    1. Author response:

      Public Reviews:

      Reviewer #1:

      Summary:

      Casas-Tinto et al. present convincing data that injury of the adult Drosophila CNS triggers transdifferentiation of glial cells and even the generation of neurons from glial cells. This observation opens up the possibility of getting a handle on the molecular basis of neuronal and glial generation in the vertebrate CNS after traumatic injury caused by stroke or crush injury. The authors use an array of sophisticated tools to follow the development of glial cells at the injury site in very young and mature adults. The results in mature adults reveal a remarkable plasticity in the fly CNS and dispel the notion that repair after injury may be only possible in nerve cords which are still developing. The observation of so-called VC cells which do not express the glial marker repo could point to the generation of neurons by former glial cells.

      Conclusion:

      The authors present an interesting story that is technically sound and could form the basis for an in-depth analysis of the molecular mechanism driving repair after brain injury in Drosophila and vertebrates.

      Strengths:

      The evidence for transdifferentiation of glial cells is convincing. In addition, the injury to the adult CNS shows an inherent plasticity of the mature ventral nerve cord which is unexpected.

      Weaknesses:

      Traumatic brain injury in Drosophila has been previously reported to trigger mitosis of glial cells and generation of neural stem cells in the larval CNS and the adult brain hemispheres. Therefore this report adds to but does not significantly change our current understanding. The origin and identity of VC cells is unclear.

      The Reviewer correctly points out that traumatic brain injury has been reported to trigger the generation of neural stem cells. However, according to previous reports, those cells were quiescent Dpn+ neuroblasts. We now report that already differentiated adult neuropil glia transdifferentiate into neurons, which is a new mechanism not previously reported.

      We agree with the reviewer regarding the identity of VC neurons, although, according to the results of the G-TRACE experiments, their origin is clear: they originate from neuropil glia (i.e., astrocyte-like glia and ensheathing glia). We will use a battery of antibodies previously reported to identify specific subtypes of neurons to characterize these newly generated neurons.

      Reviewer #2:

      Summary:

      Casas-Tinto et al. provide new insight into glial plasticity using a crush injury paradigm in the ventral nerve cord (VNC) of adult Drosophila. The authors find that both astrocyte-like glia (ALG) and ensheathing glia (EG) divide under homeostatic conditions in the adult VNC and identify ALG as the glial population that specifically ramps up proliferation in response to injury, whereas the number of EGs decreases following the insult. Using lineage-tracing tools, the authors interestingly observe the interconversion of glial subtypes, especially of EGs into ALGs, which occurs independent of injury and is dependent on the availability of the transcription factor Prospero in EGs, adding to the plasticity observed in the system. Finally, when tracing the progeny of differentiated glia, Casas-Tinto and colleagues detect cells of neuronal identity and provide evidence that such glia-derived neurogenesis is specifically favored following ventral nerve cord injury, which puts forward a remarkable way in which glia can respond to neuronal damage.

      Numerous experiments have been carried out in 7-day-old flies, showing that the observed plasticity is not due to residual developmental remodeling or a still immature VNC.

      By elegantly combining different genetic tools, the authors show glial divisions with mitotic-dependent tracing and find that the number of generated glia is refined by apoptosis later on.

      The work identifies Prospero in glia as an important coordinator of glial cell fate, from development to the adult context, which draws further attention to the upstream regulatory mechanisms.

      We express our gratitude to the reviewer for their keen appreciation of our efforts and their enthusiasm for the outcomes of this research.

      Weaknesses:

      Although the authors do use a variety of methods to show glial proliferation, the EdU data (Figure 1B) could be more informative by displaying images of non-injured animals and providing quantifications or the mention of these numbers based on results previously acquired in the system.

      We appreciate the Reviewer’s comment. We believed that images of non-injured animals would not add new information, as we already quantified the increase of glial proliferation upon injury in Losada-Perez et al. 2021. Besides, the purpose of this experiment was to determine whether the dividing cells were astrocyte-like glia, rather than to count the dividing cells. Comparing independent experiments can be tricky, but if we compare the quantifications of G2-M glia (repo>fly-Fucci) in Losada-Perez et al. 2021 (fig 1C) with the quantifications of G2-M neuropil glia in this work (fig 1C), the numbers are comparable.

      The experiments relying on the FUCCI cell cycle reporter suggested considerable baseline proliferation for EGs and ALGs, but when using an independent method (Twin Spot MARCM), mitotic marking was only detected for ALGs. This discrepancy could be addressed by assessing the co-localization of the different glia subsets using the identified driver lines with mitotic markers such as PH3.

      In our understanding, this discrepancy could be explained by the magnitude of proliferation. The lower proliferation rate of EG (as indicated by the fly-Fucci experiments), combined with the incomplete efficiency of MARCM clone induction, considerably reduces the chances of finding EG MARCM clones. PH3 is a mitotic marker, but it is also found in apoptotic cells (Kim and Park 2012. DOI: 10.1371/journal.pone.0044307); nevertheless, we can do the suggested experiment and quantify the results.

      The data in Figure 1C would be more convincing in combination with images of the FUCCI Reporter as it can provide further information on the location and proportion of glia that enter the cell cycle versus the fraction that remains quiescent.

      We will add the suggested images.

      The analyses of inter-glia conversion in Figure 3 are complicated by the fact that Prospero RNAi is used both to suppress EG-to-ALG conversion and as a marker to establish ALG nature. Clarification of whether the GFP+ cells still expressed Pros or were classified as NP-like GFP cells is required here.

      As described in the text, Pros is a marker for ALG and the results suggest that Prospero expression is required for the EG to ALG transition. We will clarify these concepts in the text accordingly. In Figure 3 we showed images of NP-like cells originating from EG that are Prospero+, which supports the transdifferentiation from EG to ALG.

      The conclusion that ALG and EG glial cells can give rise to cells of neuronal lineage is based on glial lineage information (GFP+ cells from glial G-trace) and staining for the neuronal marker Elav. The use of other neuronal markers apart from Elav or morphological features would provide a more compelling case that GFP+ cells are mature neurons.

      We completely agree with the reviewer's observation regarding the identity of VC neurons. We will try to determine the identity of these cells using previously described antibodies that label neuronal populations. We would also appreciate any suggestions regarding the antibodies we can use.

      Although the text discusses in which contexts, glial plasticity is observed or increased upon injury, the figures are less clear regarding this aspect. A more systematic comparison of injured VNCs versus homeostatic conditions, combined with clear labelling of the injury area would facilitate the understanding of the panels.

      We appreciate the Reviewer’s observation. We will carefully check all figures in order to increase their clarity.

      Context/Discussion

      The study finds that glia in the ventral cord of flies have latent neurogenic potential. Such observations have not been made regarding glia in the fly brain, where injury is reported to drive glial divisions or the proliferation of undifferentiated progenitor cells with neurogenic potential.

      Discussing this different strategy for cell replacement adopted by glia in the VNC and pointing out differences to other modes seems fascinating. Highlighting differences in the reactiveness of glia in the VNC compared to the brain also seems highly relevant as they may point to different properties to repair damage.

      Based on the assays employed, the study points to a significant amount of glial "identity" changes or interconversions, which is surprising under homeostatic conditions. The significance of this "baseline" plasticity remains undiscussed, although glia unarguably show extensive adaptations during nervous system development.

      It would be interesting to know if the "interconversion" of glia is determined by the needs in the tissue or would shift in the context of selective ablation/suppression of a glial type.

      We deeply appreciate the Reviewer’s enthusiasm for this subject; it is indeed fascinating. We kept the discussion brief in order to fit the eLife Short Report requirements, but the specific conditions that trigger glial interconversion are of great interest to us. Compromising EG or ALG viability and evaluating the behaviour of glial cells is of great interest for developmental biology and regeneration, but the precise scenario in which to carry out these experiments is not well defined. In this report, we aim to reproduce an injury in Drosophila brain and this model should serve to analyze cellular behaviours. The scenario in which we deplete one specific subpopulation of glial cells is conceptually attractive, but far beyond the scope of this report.

      Reviewer #3:

      In this manuscript, Casas-Tintó et al. explore the role of glial cells in the response to a neurodegenerative injury in the adult brain. They used Drosophila melanogaster as a model organism and found that glial cells are able to generate new neurons through the mechanism of transdifferentiation in response to injury.

      This paper provides a new mechanism in regeneration and gives an understanding of the role of glial cells in the process.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      In this manuscript, Huang and colleagues explored the role of iron in bacterial therapy for cancer. Using proteomics, they revealed the upregulation of bacterial genes that uptake iron, and reasoned that such regulation is an adaptation to the iron-deficient tumor microenvironment. Logically, they engineered E. coli strains with enhanced iron-uptake efficiency, and showed that these strains, together with iron scavengers, suppress tumor growth in a mouse model. Lastly, they reported that the tumor suppression by IroA-E. coli provides immunological memory via CD8+ T cells. In general, I find the findings in the manuscript novel and the evidence convincing.

      (1) Although the genetic and proteomic data are convincing, would it be possible to directly quantify the iron concentration in (1) E. coli in different growth environments, and (2) tumor microenvironment? This will provide the functional consequences of upregulating genes that import iron into the bacteria.

      We appreciate the reviewer’s comment regarding the precise quantification of iron concentrations. In our study, we attempted various experimental approaches, including immunohistochemistry utilizing a Fe3+ probe, an iron assay kit (ab83366), and Inductively Coupled Plasma Mass Spectrometry (ICP-MS). Despite these attempts, the quantification of oxidized Fe3+ concentrations proved challenging due to the inherently low levels of Fe ions and the difficulty of distinguishing Fe2+ from Fe3+. We observed measurements below the detection threshold of even the sensitive ICP-MS technique. To circumvent this limitation, we designed an experiment wherein bacteria were cultured in a medium supplemented with Chrome Azurol S (CAS) reagent, which colorimetrically detects siderophore activity. We compared WT bacteria and IroA-expressing bacteria at varying levels of Lcn2 proteins. The outcome, as depicted in the updated Fig. 3b, reveals an enhanced iron acquisition capability in IroA-E. coli in the presence of Lcn2 proteins, in comparison to the wild-type E. coli strains. In addition to the Lcn2 study, the proteomic study in Figure 4 highlights the competitive landscape between cancer cells and bacteria. We observed that IroA-E. coli showed reduced stress responses and exerted elevated iron-associated stress on cancer cells, thus further supporting the IroA-E. coli’s iron-scavenging capability against nutritional immunity.

      (2) Related to 1, the experiment to study the synergistic effect of CDG and VLX600 (lines 139-175) is very nice and promising, but one flaw here is a lack of the measurement of iron concentration. Therefore, a possible explanation could be that CDG acts in another manner, unrelated to iron uptake, that synergizes with VLX600's function to deplete iron from cancer cells. Here, a direct measurement of iron concentration will show the effect of CDG on iron uptake, thus complementing the missing link.

      We appreciate the reviewer’s comment and would like to point the reviewer to our results in Figure S3, which shows that the expression of CDG enhances bacteria survival in the presence of LCN2 proteins, which reflects the competitive relationship between CDG and enterobactin for LCN2 proteins as previously shown by Li et al. [Nat Commun 6:8330, 2015]. We regret to inform the reviewer that direct measurement of iron concentration was attempted to no avail due to the limited sensitivity of iron detecting assays. We do acknowledge that CDG may exert different effects in addition to enhancing iron uptake, particularly the potentiation of the STING pathway. We pointed out such effect in Fig 2c that shows enhanced macrophage stimulation by the CDG-expressing bacteria. We would like to accentuate, however, that a primary objective of the experiment is to show that the manipulation of nutritional immunity for promoting anticancer bacterial therapy can be achieved by combining bacteria with iron chelator VLX600. The multifaceted effects of CDG prompted us to focus on IroA-E. coli in subsequent experiments to examine the role of nutritional immunity on bacterial therapy. We have updated the associated text to better convey our experimental design principle.

      Lines 250-268: Although statistically significant, I would recommend the authors characterize the CD8+ T cells a little more, as the mechanism now seems quite elusive. What signals or memories do CD8+ T cells acquire after IroA-E. coli treatment to confer their long-term immunogenicity?

      We apologize for the overinterpretation of the immune memory response in our previous manuscript and appreciate the reviewer’s recommendation to further characterize CD8+ T cells post-IroA-E. coli treatment. Our findings, which show robust tumor inhibition in rechallenge studies, indicate establishment of anticancer adaptive immune responses. As the scope of the present work is aimed at demonstrating the value of engineered bacteria for overcoming nutritional immunity, expounding on the memory phenotypes of the resulting cellular immunity is beyond the scope of the study. We do acknowledge that our initial writing overextended our claims and have revised the manuscript accordingly. The revised manuscript highlights induction of anticancer adaptive immunity, attributable to CD8+ T cells, following the bacterial therapy.

      (3) Perhaps this goes beyond the scope of the current manuscript, but how broadly applicable is the observed iron-transport phenomenon in other tumor models? I would recommend the authors to either experimentally test it in another model or at least discuss this question.

      We highly appreciate the reviewer’s suggestion regarding the generalizability of the iron-transport phenomenon in diverse tumor models. To address this, we extended our investigations beyond the initial model, employing B16-F10 melanoma and E0771 breast cancer in mouse subcutaneous models. The results, as depicted in Figures 3g to 3j and Figure S5, demonstrate the superiority of IroA-E. coli over WT bacteria in tumor inhibition. These findings support the broad implication of nutritional immunity as well as the potential of iron-scavenging bacteria for different solid tumor treatments.

      Reviewer #2 (Public Review):

      Summary:

      The authors provide strong evidence that bacteria, such as E. coli, compete with tumor cells for iron resources and consequently reduce tumor growth. When sequestration between LCN2 and bacterobactin is blocked by upregulating CDG(DGC-E. coli) or salmochelin(IroA-E.coli), E. coli increase iron uptake from the tumor microenvironment (TME) and restrict iron availability for tumor cells. Long-term remission in IroA-E.coli treated mice is associated with enhanced CD8+ T cell activity. Additionally, systemic delivery of IroA-E.coli shows a synergistic effect with chemotherapy reagent oxaliplatin to reduce tumor growth.

      Strengths:

      It is important to identify the iron-related crosstalk between E. coli and TME. Blocking lcn2-bacterobactin sequestration by different strategies consistently reduces tumor growth.

      Weaknesses:

      As engineered E.coli upregulate their function to uptake iron, they may increase the likelihood of escaping from nutritional immunity (LCN2 becomes insensitive to sequester iron from the bacteria). Would this raise the chance of developing sepsis? Do authors think that it is safe to administrate these engineered bacteria in mice or humans?

      We appreciate the reviewer’s comment on the safety evaluation of the iron-scavenging bacteria. To address the concern, we assessed the potential risk of sepsis development by measuring the bacterial burden and performing whole blood cell analyses following intravenous injection of the engineered bacteria. As illustrated in Figures 3k and 3l, our findings indicate that the administration of these engineered bacteria does not elevate the risk of sepsis. The blood cell analysis suggests that mice treated with the bacteria eventually return to baseline levels comparable to untreated mice, supporting the safety of this approach in our experimental models.

      Reviewer #3 (Public Review):

      Summary:

      Based on their observation that tumor has an iron-deficient microenvironment, and the assumption that nutritional immunity is important in bacteria-mediated tumor modulation, the authors postulate that manipulation of iron homeostasis can affect tumor growth. They show that iron chelation and engineered DGC-E. coli have synergistic effects on tumor growth suppression. Using engineered IroA-E. coli that presumably have more resistance to LCN2, they show improved tumor suppression and survival rate. They also conclude that the IroA-E. coli treated mice develop immunological memory, as they are resistant to repeat tumor injections, and these effects are mediated by CD8+ T cells. Finally, they show synergistic effects of IroA-E. coli and oxaliplatin in tumor suppression, which may have important clinical implications.

      Strengths:

      This paper uses straightforward in vitro and in vivo techniques to examine a specific and important question of nutritional immunity in bacteria-mediated tumor therapy. They are successful in showing that manipulation of iron regulation during nutritional immunity does affect the virulence of the bacteria, and in turn the tumor. These findings open future avenues of investigation, including the use of different bacteria, different delivery systems for therapeutics, and different tumor types.

      Weaknesses:

      • There is no discussion of the cancer type and why this cancer type was chosen. Colon cancer is not one of the more prominently studied cancer types for LCN2 activity. While this is a proof-of-concept paper, there should be some recognition of the potential different effects on different tumor types. For example, this model is dependent on significant LCN production, and different tumors have variable levels of LCN expression. Would the response of the tumor depend on the role of iron in that cancer type? For example, breast cancer aggressiveness has been shown to be influenced by FPN levels and labile iron pools.

      We highly appreciate the reviewer’s insightful comment on the varying LCN2 activities across different tumor types. In light of the reviewer’s suggestion, we extended our investigations beyond the initial colon cancer model, employing B16-F10 melanoma and E0771 breast cancer in mouse subcutaneous models. The results, as depicted in Figures 3g to 3j and Figure S5, demonstrate that IroA-E. coli consistently outperforms WT bacteria in tumor inhibition. We acknowledge the reviewer’s comment regarding LCN2 being more prominently examined in breast cancer and have highlighted this aspect in the revised manuscript. For colon and melanoma cancers, several reports have pointed out the correlation of LCN2 expression and the aggressiveness of these cancers [Int J Cancer. 2021 Oct 1;149(7):1495-1511][Nat Cancer. 2023 Mar;4(3):401-418], albeit to a lesser extent. These findings support the broad implication of nutritional immunity as well as the potential of iron-scavenging bacteria for different solid tumor treatments. The manuscript has been revised to reflect the reviewer’s insightful comment.

      • Are the effects on tumor suppression assumed to be from E. coli virulence, i.e. Does the higher number of bacteria result in increased immune-mediated tumor suppression? Or are the effects partially from iron status in the tumor cells and the TME?

      We appreciate the reviewer’s question regarding the therapeutic mechanism of IroA-E. coli. Bacterial therapy exerts its anticancer action through several different mechanisms, including bacterial virulence, nutrient and ecological competition, and immune stimulation. Decoupling one mechanism from another would be technically challenging and beyond the scope of the present work. With the objective of demonstrating that iron-scavenging bacteria can elevate anticancer activity by circumventing nutritional immunity, we highlight our data in Fig. S6, which shows that IroA-E. coli administration resulted in higher bacterial colonization within solid tumors compared to WT-E. coli on Day 15. This increased bacterial presence supports our iron-scavenging bacteria design, and we highlight a few anticancer mechanisms mediated by the engineered bacteria. Firstly, as shown in Fig. 4d, IroA-E. coli induces an elevated iron stress response in tumor cells, as the treated tumor cells show increased expression of transferrin receptors. Secondly, our experiments involving CD8+ T cell depletion indicate that IroA-E. coli establishes a more robust anticancer CD8+ T cell response than WT bacteria. Both immune-mediated responses and alterations in iron status within the tumor microenvironment are demonstrated to contribute to the enhanced anticancer activity of IroA-E. coli in the present study.

      • If the effects are iron-related, could the authors provide some quantification of iron status in tumor cells and/or the TME? Could the proteomic data be queried for this data?

      We appreciate the reviewer’s query regarding the quantification of iron concentrations. In our study, we attempted various experimental approaches, including immunohistochemistry utilizing a Fe3+ probe, an iron assay kit (ab83366), and Inductively Coupled Plasma Mass Spectrometry (ICP-MS). Despite these attempts, the quantification of oxidized Fe3+ concentrations proved challenging due to the inherently low levels of Fe ions and the difficulty of distinguishing Fe2+ from Fe3+. We observed measurements below the detection threshold of even the sensitive ICP-MS technique. Consequently, to circumvent this limitation, we designed an experiment wherein bacteria were cultured in a medium supplemented with Chrome Azurol S (CAS) reagent, which colorimetrically detects siderophore activity. We compared WT bacteria and IroA-expressing bacteria at varying levels of Lcn2 proteins. The outcome, as depicted in the updated Fig. 3b, reveals an enhanced iron acquisition capability in IroA-E. coli in the presence of Lcn2 proteins, in comparison to the wild-type E. coli strains. In addition to the Lcn2 study, the proteomic study in Figure 4 highlights the competitive landscape between cancer cells and bacteria. We observed that IroA-E. coli showed reduced stress responses and exerted elevated iron-associated stress on cancer cells, thus further supporting the IroA-E. coli’s iron-scavenging capability against nutritional immunity.

      Reviewing Editor:

      The authors provide compelling technically sound evidence that bacteria, such as E. coli, can be engineered to sequester iron to potentially compete with tumor cells for iron resources and consequently reduce tumor growth. Long-term remission in IroA-E.coli treated mice is associated with enhanced CD8+ T cell activity and a synergistic effect with chemotherapy reagent oxaliplatin is observed to reduce tumor growth. The following additional assessments are needed to fully evaluate the current work for completeness; please see individual reviews for further details.

      We appreciate the editor’s positive comment.

      (1) The premise is one of translation yet the authors have not demonstrated that manipulating bacteria to sequester iron does not provide a potential for sepsis or other evidence that this does not increase the competitiveness of bacteria relative to the host. Only tumor volume was provided rather than animal survival and cause of death, but bacterial virulence is enhanced including the possibility of septic demise. Alternatively, postulated by the authors, that tumor volume is decreased due to iron sequestration but they do not directly quantify the iron concentration in (1) E. coli in different growth environments, and (2) tumor microenvironment. These important endpoints will provide the functional consequences of upregulating genes that import iron into the bacteria.

      We appreciate the editor’s comment and have added substantial data to support the translational potential of the iron-scavenging bacteria. In particular, we added evidence that the iron-scavenging bacteria do not increase the risk of sepsis (Fig. 3k, l), evidence of increased bacterial competitiveness and survival in tumors (Fig. S6), and evidence of the iron-scavenging bacteria’s superior anticancer ability and survival benefit across 3 different tumor models (Fig. 3e-j; Fig. S5). While direct measurement of iron concentration in the tumor environment is technically difficult due to the challenge of differentiating Fe2+ and Fe3+ with available techniques, we added a colorimetric CAS assay to demonstrate that the iron-scavenging bacteria can utilize Fe more effectively than WT bacteria in the presence of LCN2 (Fig. 3b). These results substantiate the translational relevance of the engineered bacteria.

      (2) There is no discussion of the cancer type and why this cancer type was chosen. If the current tumor modulation system is dependent on LCN2 activity, there would need to be some recognition that different tumors have variable levels of LCN expression. Would the response of the tumor depend on the role of iron in that cancer type?

      We appreciate the comment and added relevant text and citations describing clinical relevance of LCN2 expression associated with the tumor types used in the study (breast cancer, melanoma, and colon cancer). Elevated LCN2 has been associated with higher aggressiveness for all three cancer types.

      (3) To demonstrate long-term anti-cancer memory was established through enhancement of CD8+ T cell activity (Fig 5c), the "2nd seeding tumor cells" experiment may need to be done in CD8 antibody-treated IroA mice since CD8+ T cells may play a role in tumor suppression regardless of whether or not iron regulation is being manipulated. It appears that the control group for this experiment is naive mice (and not WT-E. coli treated mice), in which case the immunologic memory could be from having had tumor/E. coli rather than the effect of IroA-E. coli.

      We acknowledge that our prior writing may have overstated our claim on immunological memory. Our intention is to show that upon treatment and tumor eradication by iron-scavenging bacteria, adaptive immunity mediated by CD8 T cells can be elicited. We also did not consider a WT-E. coli control as no WT-E. coli treated group achieved complete tumor regression. We have modified our text to reflect our intended message.

      Reviewer #1 (Recommendations For The Authors):

      All the figures seem to be in low resolution and pixelated. Please upload high-resolution ones.

      We have updated figures to high-resolution ones.

      Reviewer #2 (Recommendations For The Authors):

      Some specific comments towards experiments:

      (1) For Fig 2 f/ Fig 3f/ Fig 5d/Fig6c, the survival rate is based on the tumor volume (the mouse was considered dead when the tumor volume exceeded 1,500 mm3). Did the mice die from the experiment (how many from each group)? If it only reflects the tumor size, do these figures deliver the same information as the tumor growth figure?

      We appreciate the reviewer’s comment. The survival rate is indeed based on tumor volume, and we used a cutoff of 1500 mm3. No death event was observed prior to the tumors reaching 1500 mm3. Although the survival figures overlap with some of the information conveyed by the tumor volume tracking, they offer additional temporal resolution of tumor progression. Presenting both tumor volume and survival tracking is common practice for depicting tumor progression. We have added the protocol regarding survival monitoring to the Materials and Methods section.

      (2) Fig 3a, not sure if entE is a good negative control for this experiment. Neg. Ctrl should maintain its CFU/ml at a certain level regardless of Lcn2 conc. However, entE conc. is at 100 CFU/ml throughout the experiment suggesting there is no entE in media or if it is supersensitive to Lcn2 that bacteria die at the dose of 0.1nM?

      We appreciate the reviewer’s comment. The △entE-E. coli was indeed observed to be highly sensitive to LCN2. We included the control to highlight the competitive relationship between entE and LCN2 for iron chelation, which was previously reported in the literature [Biometals 32, 453–467 (2019)].
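
      As an illustration only, the volume-based survival endpoint described in this response can be sketched as follows. The 1500 mm3 cutoff comes from the response; the function names, the data layout (a list of (day, volume) pairs per mouse), and the treatment of mice that never cross the cutoff as censored are assumptions made for this sketch and are not taken from the authors' protocol.

```python
from typing import Optional, Sequence, Tuple

# Cutoff stated in the author response; everything else here is hypothetical.
VOLUME_CUTOFF_MM3 = 1500.0

Measurement = Tuple[int, float]  # (day, tumor volume in mm3)


def endpoint_day(measurements: Sequence[Measurement],
                 cutoff: float = VOLUME_CUTOFF_MM3) -> Optional[int]:
    """Return the first day on which tumor volume exceeds the cutoff.

    Returns None if the cutoff is never exceeded (treated as censored).
    """
    for day, volume in sorted(measurements):
        if volume > cutoff:
            return day
    return None


def fraction_surviving(cohort: Sequence[Sequence[Measurement]],
                       day: int,
                       cutoff: float = VOLUME_CUTOFF_MM3) -> float:
    """Fraction of mice whose tumors have not exceeded the cutoff by `day`."""
    if not cohort:
        raise ValueError("cohort must contain at least one mouse")
    alive = 0
    for measurements in cohort:
        event = endpoint_day(measurements, cutoff)
        if event is None or event > day:
            alive += 1
    return alive / len(cohort)
```

      Because each mouse contributes only the day its tumor crosses the cutoff, a curve built this way is a time-to-threshold summary of the same volume measurements, which is the overlap the reviewer asks about.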

      (3) Fig 4, the authors harvested bacteria from the tumor by centrifuging homogenized samples at different speeds. Internal controls confirming sample purity (positive for bacteria and negative for cells for panels a,b,c; or vice versa for panel d) may be necessary. This comment may also apply to samples from Fig 1.

      We acknowledge the reviewer’s concern and would like to point out that the proteomic analysis was performed using a highly cited protocol that provides reference and normalization standards for E. coli proteins [Mol Cell Proteomics. 2014 Sep; 13(9): 2513–2526]. The reference is cited in the Materials and Method section associated with the proteomic analysis.

      (4) To demonstrate long-term anti-cancer memory was established through enhancement of CD8+ T cell activity, the "2nd seeding tumor cells" experiment may need to be done in CD8 antibody-treated IroA mice.

      We have modified our claims to highlight that the tumor eradication by iron scavenging bacteria can establish adaptive anticancer immunity through the elicitation of CD8 T cells. We apologize for overstating our claim in the previous manuscript draft.

      Minor suggestions:

      (1) Please include the tumor re-challenge experiment in the method section.

      The re-challenge experiment has been added to the method section as instructed.

      (2) Please cite others' and your previous work. E.g. line 281, 282, line 306-307.

      We have added the citations as instructed.

      (3) Line 448, BL21 is bacteria, not cells.

      We have made the correction accordingly.

      Reviewer #3 (Recommendations For The Authors):

      • The authors postulate that IroA-E. coli is more potent than DGC-E. coli in resisting LCN2 activity, and that this potency is the cause of the increased tumor suppression of this engineered strain. If so, Fig 3a should include DGC-E. coli for direct comparison.

      We thank the reviewer for the comment and would like to clarify that we intended to construct IroA-E. coli as a more specific iron-scavenging strategy, which can aid the discussion of nutritional immunity and minimize confounding factors from the immune-stimulatory effect of CDG. We have modified our text to clarify our stance.

      • The data refers to the effects of WT bacteria-mediated tumor suppression, e.g. Figure 3e shows that even WT bacteria have a significant suppressive effect on tumor growth. Could the authors provide background on what is known about the mechanism of this tumor suppression, outside of tumor targeting and engineerability? They only reference "immune system stimulation."

      We appreciate the reviewer’s comment and would like to refer the reviewer to our recently published article [Lim et al., EMBO Molecular Medicine 2024; DOI: 10.1038/s44321-023-00022-w], which shows that, in addition to immune system stimulation, WT bacteria can also be perceived as an invading species in the tumor that can exert differential selective pressure against cancer cells. Competition for nutrients is highlighted as a major contributor to containing tumor growth. In fact, the nutrient competition that we observed in the prior article inspired the design of the iron-scavenging bacteria to overcome nutritional immunity. We have cited this recently published article in the revised manuscript to enrich the background.

      • The authors claim that there is immunologic memory because of tumor resistance in re-challenged mice after IroA-E. coli treatment (Fig 5c). It appears that the control group for this experiment is naive mice (and not WT-E. coli treated mice), in which case the immunologic memory could be from having had tumor/E. coli rather than the effect of IroA-E. coli.

      We have modified our claims to highlight that the tumor eradication by iron scavenging bacteria can establish adaptive anticancer immunity through the elicitation of CD8 T cells. We did not intend to claim that the adaptive immunity stemmed from IroA-E. coli only; rather, we intend to build upon current literature that has reported CD8+ T cell elicitation by bacterial therapy. The IroA-E. coli is shown to enhance adaptive immunity. We also did not consider a WT-E. coli control, as no WT-E. coli-treated group achieved complete tumor regression.

      • The authors claim that CD8+ T cells are mechanistically important in the effects of iron status manipulation in E. coli-mediated tumor suppression (Fig 5). In order to show this, it seems that Fig 5c should include WT-E. coli and WT-E. coli+CD8 ab groups, as it may be that CD8+ T cells play a role in tumor suppression regardless of whether or not iron regulation is being manipulated.

      We apologize for the confusion from our prior writing. We have modified our claims to highlight that the tumor eradication by iron scavenging bacteria can establish adaptive anticancer immunity through the elicitation of CD8 T cells. We did not intend to convey that CD8+ T cells are mechanistically important in the effects of iron status manipulation.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This useful manuscript challenges the utility of current paradigms for estimating brain-age with magnetic resonance imaging measures, but presents inadequate evidence to support the suggestion that an alternative approach focused on predicting cognition is more useful. The paper would benefit from a clearer explication of the methods and a more critical evaluation of the conceptual basis of the different models. This work will be of interest to researchers working on brain-age and related models.

      Thank you so much for providing high-quality reviews on our manuscript. We revised the manuscript to address all of the reviewers’ comments and provided full responses to each of the comments below. Importantly, in this revision, we clarified that we did not intend to use Brain Cognition as an alternative approach. This is because, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. Such quantification is the third aim of this study.

      Public Reviews:

      Reviewer 1 (Public Review):

      In this paper, the authors evaluate the utility of brain-age-derived metrics for predicting cognitive decline by performing a 'commonality' analysis in a downstream regression that enables the different contribution of different predictors to be assessed. The main conclusion is that brain-age-derived metrics do not explain much additional variation in cognition over and above what is already explained by age. The authors propose to use a regression model trained to predict cognition ("brain-cognition") as an alternative suited to applications of cognitive decline. While this is less accurate overall than brain age, it explains more unique variance in the downstream regression.

      (1) I thank the authors for addressing many of my concerns with this revision. However, I do not feel they have addressed them all. In particular I think the authors could do more to address the concern I raised about the instability of the regression coefficients and about providing enough detail to determine that the stacked regression models do not overfit.

      Thank you Reviewer 1 for the comment. We addressed them in our response to Reviewer 1 Recommendations For The Authors #1 and #2 (see below).

      (2) In considering my responses to the authors' revision, I also must say that I agree with Reviewer 3 about the limitations of the brain age and brain cognition methods conceptually. In particular, the regression model used to predict fluid cognition will by construction explain more variance in cognition than a brain age model that is trained to predict age. To be fair, these conceptual problems are more widespread than this paper alone, so I do not believe the authors should be penalised for that. However, I would recommend making these concerns more explicit in the manuscript.

      Thank you Reviewer 1 for the comment. We addressed them in our response to Reviewer 1 Recommendations For The Authors #3 (see below).

      Reviewer 2 (Public Review):

      In this study, the authors aimed to evaluate the contribution of brain-age indices in capturing variance in cognitive decline and proposed an alternative index, brain-cognition, for consideration.

      The study employs suitable methods and data to address the research questions, and the methods and results sections are generally clear and easy to follow.

      I appreciate the authors' efforts in significantly improving the paper, including some considerable changes, from the original submission. While not all reviewer points were tackled, the majority of them were adequately addressed. These include additional analyses, more clarity in the methods and a much richer and nuanced discussion. While recognising the merits of the revised paper, I have a few additional comments.

      (1) Perhaps it would help the reader to note that it might be expected for brain-cognition to account for a significantly larger variance (11%) in fluid cognition, in contrast to brain-age. This stems from the fact that the authors specifically trained brain-cognition to predict fluid cognition, the very variable under consideration. In line with this, the authors later recommend that researchers considering the use of brain-age should evaluate its utility using a regression approach. The latter involves including a brain index (e.g. brain-cognition) previously trained to predict the regression's target variable (e.g. fluid cognition) alongside a brain-age index (e.g., corrected brain-age gap). If the target-trained brain index outperforms the brain-age metric, it suggests that relying solely on brain-age might not be the optimal choice. Although not necessarily the case, is it surprising for the target-trained brain index to demonstrate better performance than brain-age? This harks back to the broader point raised in the initial review: while brain-age may prove useful (though sometimes with modest effect sizes) across diverse outcomes as a generally applicable metric, a brain index tailored for predicting a specific outcome, such as brain-cognition in this case, might capture a considerably larger share of variance in that specific context but could lack broader applicability. The latter aspect needs to be empirically assessed.

      Thank you so much for raising this point. Reviewer 1 (Public Review #2/Recommendations For The Authors #3) and Reviewer 3 (Recommendations for the Authors #1) made a similar observation. We now made changes to the introduction and discussion to address this concern (please see our responses to Reviewer 1 Recommendations For The Authors #3 below).

      Briefly, as in our 2nd revision, we did not intend to compare Brain Age with Brain Cognition since, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. Such quantification is the third aim of this study.

      (2) Furthermore, the discussion pertaining to training brain-age models on healthy populations for subsequent testing on individuals with neurological or psychological disorders seems somewhat one-sided within the broader debate. This one-sidedness might potentially confuse readers. It is worth noting that the choice to employ healthy participants in the training model is likely deliberate, serving as a norm against which atypical populations are compared. To provide a more comprehensive understanding, referencing Tim Hahn's counterargument to Bashyam's perspective could offer a more complete view (https://academic.oup.com/brain/article/144/3/e31/6214475?login=false).

      Thank you Reviewer 2 for bringing up this issue. We have now revised the paragraph in question and added nuances on the usage of Brain Age for normative vs. case-control studies. We also cited Tim Hahn’s article that explained the conceptual foundation of the use of Brain Age in case-control studies. Please see below. Additionally, we also made a statement about our study not being able to address issues about the case-control studies directly in the newly written conclusion (see Reviewer 3 Recommendations for the Authors #3).

      Discussion:

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020, 2020; Jirsaraie et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We consider the former as a normative type of study and the latter as a case-control type of study (Insel et al., 2010; Marquand et al., 2016). Those case-control Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. On the one hand, this means that case-control studies treat Brain Age as a method to detect anomalies in the neurological/psychological group (Hahn et al., 2021). On the other hand, this also means that case-control studies have to ignore under-fitted models when applying prediction models built from largely healthy participants to participants with neurological/psychological disorders (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). On the contrary, our study and other normative studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning in normative studies, while not allowing us to detect group-level anomalies, do not suffer from being under-fitted. This unfortunately might limit the generalisability of our study into just the normative type of study. Future work is still needed to test the utility of brain age in the case-control case.”

      (3) Overall, this paper makes a significant contribution to the field of brain-age and related brain indices and their utility.

      Thank you for the encouragement.

      Reviewer 3 (Public Review):

      The main question of this article is as follows: "To what extent does having information on brain-age improve our ability to capture declines in fluid cognition beyond knowing a person's chronological age?" This question is worthwhile, considering that there is considerable confusion in the field about the nature of brain-age.

      (1) Thank you to the authors for addressing so many of my concerns with this revision. There are a few points that I feel still need addressing/clarifying related to 1) calculating brain cognition, 2) the inevitability of their results, and 3) their continued recommendation to use brain-age metrics.

      Thank you Reviewer 3 for the comment. We addressed them in our response to Reviewer 3 Recommendations For The Authors #1-3 (see below).

      Recommendations for the authors:

      Reviewer 1 (Recommendations For The Authors):

      (1) I do not feel the authors have fully addressed the concern I raised about the stacked regression models. Despite the new figure, it is still not entirely clear what the authors are using as the training set in the final step. To be clear, the problem occurs because of the parameters, not the hyperparameters (which the authors now state that they are optimising via nested grid search). In other words, given a regression model y = X*beta, if the X are taken to be predictions from a lower level regression model, then they contain information that is derived from both the training set and the test set for the model that this was trained on. If the split is the same (i.e. the predictions are derived on the same test set as is being used at the second level), then this can lead to overfitting. It is not clear to me whether the authors have done this or not. Please provide additional detail to clarify this point.

      Thank you for allowing us an opportunity to clarify our stacked model. We wanted to confirm that we did not use test sets to build a stacked model in both lower and higher levels of the Elastic Net models. Test sets were there just for testing the performance of the models. We made additional clarification to make this clearer (see below). Let us explain what we did and provide the rationales below.

      From Methods:

      “We used nested cross-validation (CV) to build these prediction models (see Figure 7). We first split the data into five outer folds, leaving each outer fold with around 100 participants. This number of participants in each fold is to ensure the stability of the test performance across folds. In each outer-fold CV loop, one of the outer folds was treated as an outer-fold test set, and the rest was treated as an outer-fold training set. Ultimately, looping through the nested CV resulted in a) prediction models from each of the 18 sets of features as well as b) prediction models that drew information across different combinations of the 18 separate sets, known as “stacked models.” We specified eight stacked models: “All” (i.e., including all 18 sets of features), “All excluding Task FC”, “All excluding Task Contrast”, “Non-Task” (i.e., including only Rest FC and sMRI), “Resting and Task FC”, “Task Contrast and FC”, “Task Contrast” and “Task FC”. Accordingly, there were 26 prediction models in total for both Brain Age and Brain Cognition.

      To create these 26 prediction models, we applied three steps for each outer-fold loop. The first step aimed at tuning prediction models for each of 18 sets of features. This step only involved the outer-fold training set and did not involve the outer-fold test set. Here, we divided the outer-fold training set into five inner folds and applied inner-fold CV to tune hyperparameters with grid search. Specifically, in each inner-fold CV, one of the inner folds was treated as an inner-fold validation set, and the rest was treated as an inner-fold training set. Within each inner-fold CV loop, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters and applied the estimated model to the inner-fold validation set. After looping through the inner-fold CV, we, then, chose the prediction models that led to the highest performance, reflected by coefficient of determination (R2), on average across the inner-fold validation sets. This led to 18 tuned models, one for each of the 18 sets of features, for each outer fold.

      The second step aimed at tuning stacked models. Same as the first step, the second step only involved the outer-fold training set and did not involve the outer-fold test set. Here, using the same outer-fold training set as the first step, we applied tuned models, created from the first step, one from each of the 18 sets of features, resulting in 18 predicted values for each participant. We then re-divided this outer-fold training set into five new inner folds. In each inner fold, we treated different combinations of the 18 predicted values from separate sets of features as features to predict the targets in separate “stacked” models. Same as the first step, in each inner-fold CV loop, we treated one out of five inner folds as an inner-fold validation set, and the rest as an inner-fold training set. Also as in the first step, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters from our grid. We tuned the hyperparameters of stacked models using grid search by selecting the models with the highest R2 on average across the inner-fold validation sets. This led to eight tuned stacked models.

      The third step aimed at testing the predictive performance of the 18 tuned prediction models from each of the set of features, built from the first step, and eight tuned stacked models, built from the second step. Unlike the first two steps, here we applied the already tuned models to the outer-fold test set. We started by applying the 18 tuned prediction models from each of the sets of features to each observation in the outer-fold test set, resulting in 18 predicted values. We then applied the tuned stacked models to these predicted values from separate sets of features, resulting in eight predicted values.

      To demonstrate the predictive performance, we assessed the similarity between the observed values and the predicted values of each model across outer-fold test sets, using Pearson’s r, coefficient of determination (R2) and mean absolute error (MAE). Note that for R2, we used the sum of squares definition (i.e., R2 = 1 – (sum of squares residuals/total sum of squares)) per a previous recommendation (Poldrack et al., 2020). We considered the predicted values from the outer-fold test sets of models predicting age or fluid cognition, as Brain Age and Brain Cognition, respectively.”
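      The three performance measures named above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code; the function names and the toy values are assumptions made for the example. It shows why the sum-of-squares definition of R2 matters: predictions that are perfectly correlated with the observations but biased still receive a low R2.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (sd_o * sd_p)


def r2_sum_of_squares(observed, predicted):
    """R2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical toy data: predictions track the observations exactly but are
# shifted upward by one unit.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]

print(pearson_r(observed, predicted))
print(r2_sum_of_squares(observed, predicted))
print(mean_absolute_error(observed, predicted))
```

      For these toy values the residual sum of squares is 4 and the total sum of squares is 5, so R2 is 1 - 4/5 = 0.2 even though the correlation is perfect and the MAE is 1.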

      Author response image 1.

      Diagram of the nested cross-validation used for creating predictions for models of each set of features as well as predictions for stacked models.

      Note some previous research, including ours (Tetereva et al., 2022), splits the observations in the outer-fold training set into layer 1 and layer 2 and applies the first and second steps to layers 1 and 2, respectively. Here we decided against this approach and used the same outer-fold training set for both first and second steps in order to avoid potential bias toward the stacked models. This is because, when the data are split into two layers, predictive models built for each separate set of features only use the data from layer 1, while the stacked models use the data from both layers 1 and 2. In practice with large enough data, these two approaches might not differ much, as we demonstrated previously (Tetereva et al., 2022).
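      The three steps described above can be sketched in code. This is a simplified illustration under stated assumptions, not the authors' pipeline: closed-form ridge regression stands in for Elastic Net, the hyperparameter grid, the synthetic data and all function names are hypothetical, and refitting the tuned model on the whole outer-fold training set after the grid search is our assumption. The point it illustrates is that the outer-fold test set is only touched in the third step.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge with an unpenalised intercept (stand-in for Elastic Net)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return coef, y_mean - x_mean @ coef


def ridge_predict(model, X):
    coef, intercept = model
    return X @ coef + intercept


def r2(y, pred):
    """Sum-of-squares definition of R2."""
    return 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)


def tune(X, y, alphas, n_inner, rng):
    """Inner-fold CV grid search; refits on all rows of X with the best alpha."""
    folds = np.array_split(rng.permutation(len(y)), n_inner)

    def mean_r2(alpha):
        scores = []
        for val in folds:
            train = np.setdiff1d(np.arange(len(y)), val)
            model = ridge_fit(X[train], y[train], alpha)
            scores.append(r2(y[val], ridge_predict(model, X[val])))
        return np.mean(scores)

    return ridge_fit(X, y, max(alphas, key=mean_r2))


def nested_stacked_cv(feature_sets, y, alphas=(0.1, 1.0, 10.0), n_outer=5, n_inner=5, seed=0):
    """Return out-of-sample stacked predictions for every observation."""
    rng = np.random.default_rng(seed)
    stacked_pred = np.empty(len(y))
    for test in np.array_split(rng.permutation(len(y)), n_outer):
        train = np.setdiff1d(np.arange(len(y)), test)
        # Step 1: tune one model per feature set on the outer-fold training set only.
        base = [tune(X[train], y[train], alphas, n_inner, rng) for X in feature_sets]
        # Step 2: apply the tuned models to the same training set, then tune the
        # stacked model on those predicted values using newly drawn inner folds.
        Z_train = np.column_stack([ridge_predict(m, X[train]) for m, X in zip(base, feature_sets)])
        stacked = tune(Z_train, y[train], alphas, n_inner, rng)
        # Step 3: only now apply the already tuned models to the outer-fold test set.
        Z_test = np.column_stack([ridge_predict(m, X[test]) for m, X in zip(base, feature_sets)])
        stacked_pred[test] = ridge_predict(stacked, Z_test)
    return stacked_pred


# Hypothetical synthetic data: two feature sets, each carrying some signal.
data_rng = np.random.default_rng(1)
n = 200
X1 = data_rng.normal(size=(n, 5))
X2 = data_rng.normal(size=(n, 3))
y = X1[:, 0] + 0.5 * X2[:, 0] + data_rng.normal(scale=0.5, size=n)

pred = nested_stacked_cv([X1, X2], y)
print(round(float(r2(y, pred)), 2))
```

      Because both the base models and the stacked model are fitted without reference to the outer-fold test rows, the printed R2 is an out-of-sample estimate for the stacked model.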

      (2) I also do not feel the authors have fully addressed the concern I raised about stability of the regression coefficients over splits of the data. I wanted to see the regression coefficients, not the predictions. The predictions can be stable when the coefficients are not.

      The focus of this article is on the predictions. Still, as pointed out by Reviewer 1, it is informative for readers to understand how stable the feature importance (i.e., Elastic Net coefficients) is. To demonstrate the stability of feature importance, we now examined the rank stability of feature importance using Spearman’s ρ (see Figure 4). Specifically, we correlated the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, we computed 10 Spearman’s ρ for each prediction model of the same features. We found Spearman’s ρ to vary dramatically in both age-prediction (range=.31-.94) and fluid cognition-prediction (range=.16-.84) models. This means that some prediction models were much more stable in their feature importance than others. This is probably due to various factors such as a) the collinearity of features in the model, b) the number of features (e.g., 71,631 features in functional connectivity, which were further reduced to 75 PCAs, as compared to 19 features in subcortical volume based on the ASEG atlas), c) the penalisation of coefficients either with ‘Ridge’ or ‘Lasso’ methods, which resulted in reduction as a group of features or selection of a feature among correlated features, respectively, and d) the predictive performance of the models. Understanding the stability of feature importance is beyond the scope of the current article. As mentioned by Reviewer 1, “The predictions can be stable when the coefficients are not,” and we chose to focus on the prediction in the current article.

      Author response image 2.

      Stability of feature importance (i.e., Elastic Net Coefficients) of prediction models. Each dot represents rank stability (reflected by Spearman’s ρ) in the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, there were 10 Spearman’s ρs for each prediction model. The numbers to the right of the plots indicate the mean of Spearman’s ρ for each prediction model.
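      The rank-stability computation described above can be sketched as follows. This is an illustrative sketch in plain Python, not the authors' code; the coefficient values are made up, and ranking the raw (signed) coefficients rather than their absolute values is an assumption. With five outer folds, every pair of folds is compared, which gives the 10 Spearman’s ρ values per prediction model.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


def rank_stability(coefs_per_fold):
    """Spearman's rho for every pair of outer folds."""
    return [spearman_rho(a, b) for a, b in combinations(coefs_per_fold, 2)]


# Hypothetical coefficients of one prediction model (four features),
# estimated separately in each of five outer folds.
coefs = [
    [0.50, 0.10, -0.30, 0.00],
    [0.45, 0.12, -0.25, 0.02],
    [0.40, -0.05, -0.35, 0.10],
    [0.55, 0.08, -0.20, 0.01],
    [0.30, 0.20, -0.40, 0.05],
]

rhos = rank_stability(coefs)
print(len(rhos), sum(rhos) / len(rhos))
```

      Average ranks are used for ties because Lasso-type penalties can set many coefficients to exactly zero, which would otherwise make the ranking arbitrary.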

      (3) I also must say that I agree with Reviewer 3 about the limitations of the brain-age and brain-cognition methods conceptually. In particular that the regression model used to predict fluid cognition will by construction explain more variance in cognition than a brain-age model that is trained to predict age. This suffers from the same problem the authors raise with brain-age and I agree that this would probably disappear if the authors had a separate measure of cognition against which to validate and were then to regress this out as they do for age correction. I am aware that these conceptual problems are more widespread than this paper alone (in fact throughout the brain-age literature), so I do not believe the authors should be penalised for that. However, I do think they can make these concerns more explicit and further tone down the comments they make about the utility of brain-cognition.

      Thank you so much for raising this point. Reviewer 2 (Public Review #1) and Reviewer 3 (Recommendations for the Authors #1) made a similar observation. We now made changes to the introduction and discussion to address this concern (see below).

      Briefly, we made it explicit that, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. That is, the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. More importantly, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. This is the third goal of the present study.

      From Introduction:

      “Third and finally, certain variation in fluid cognition is related to brain MRI, but to what extent does Brain Age not capture this variation? To estimate the variation in fluid cognition that is related to the brain MRI, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data. Previous studies found reasonable predictive performances of these cognition-prediction models, built from certain MRI modalities (Dubois et al., 2018; Pat et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). Analogous to Brain Age, we called the predicted values from these cognition-prediction models, Brain Cognition. The strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in fluid cognition that is related to the brain MRI and, therefore, indicates the upper limit of Brain Age’s capability in capturing fluid cognition. That is, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Consequently, if we included Brain Cognition, Brain Age and chronological age in the same model to explain fluid cognition, we would be able to examine the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age. These unique effects of Brain Cognition, in turn, would indicate the amount of co-variation between brain MRI and fluid cognition that is missed by Brain Age.”

      From Discussion:

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation in fluid cognition that is related to brain MRI. More specifically, using Brain Cognition allowed us to gauge the variation in fluid cognition that is related to the brain MRI, and thereby, to estimate the upper limit of what Brain Age can do. Moreover, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      From our results, Brain Cognition, especially from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We then examined Brain Cognition using commonality analyses (Nimon et al., 2008) in multiple regression models having a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition. Similar to Brain Age indices, Brain Cognition exhibited large common effects with chronological age. But more importantly, unlike Brain Age indices, Brain Cognition showed large unique effects, up to around 11%. As explained above, the unique effects of Brain Cognition indicated the amount of co-variation between brain MRI and fluid cognition that was missed by a Brain Age index and chronological age. This missing amount was relatively high, considering that Brain Age and chronological age together explained around 32% of the total variation in fluid cognition. Accordingly, if a Brain Age index was used as a biomarker along with chronological age, we would have missed an opportunity to improve the performance of the model by around one-third of the variation explained.”
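      The unique and common effects discussed above can be sketched with a small commonality analysis. This is a generic illustration, not the authors' implementation or their exact equations: it uses NumPy ordinary least squares, the data are synthetic, and the variable and function names are assumptions. Each component is obtained by inclusion-exclusion over the R2 values of all subsets of regressors, so the unique effect of a regressor is the drop in R2 when that regressor alone is removed from the full model.

```python
from itertools import combinations

import numpy as np


def ols_r2(X, y):
    """R2 of an OLS fit with intercept; an empty regressor set gives 0."""
    ones = np.ones((len(y), 1))
    design = np.hstack([ones, X]) if X.shape[1] else ones
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def subsets(items):
    """All subsets of a tuple, including the empty one, as sorted tuples."""
    return [c for k in range(len(items) + 1) for c in combinations(items, k)]


def commonality(X, y, names):
    """Unique and common effects for every non-empty subset of regressors."""
    idx = tuple(range(len(names)))
    r2 = {s: ols_r2(X[:, list(s)], y) for s in subsets(idx)}
    parts = {}
    for s in subsets(idx):
        if not s:
            continue
        rest = tuple(i for i in idx if i not in s)
        # Inclusion-exclusion over subsets t of s, each joined with the
        # regressors outside s; the empty t contributes -R2(rest).
        parts[" & ".join(names[i] for i in s)] = sum(
            (-1) ** (len(t) + 1) * r2[tuple(sorted(rest + t))] for t in subsets(s)
        )
    return parts, r2[idx]


# Hypothetical synthetic data in which the three regressors are correlated.
rng = np.random.default_rng(0)
n = 500
age = rng.normal(size=n)
brain_age = age + rng.normal(scale=0.5, size=n)
brain_cog = -0.5 * age + rng.normal(size=n)
fluid = -0.6 * age + 0.4 * brain_cog + rng.normal(size=n)

X = np.column_stack([brain_age, age, brain_cog])
parts, r2_full = commonality(X, fluid, ["Brain Age", "age", "Brain Cognition"])
for label, value in parts.items():
    print(f"{label}: {value:.3f}")
print(f"total R2: {r2_full:.3f}")
```

      The seven components sum to the R2 of the full model. Applied to the figures quoted above, a unique effect of around 11% set against the roughly 32% explained by Brain Age and chronological age together gives 11/32 ≈ 0.34, which is the "around one-third" in the text.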

      Reviewer #3 (Recommendations For The Authors):

      Thank you to the authors for addressing so many of my concerns with this revision. There are a few points that I feel still need addressing/clarifying related to: 1) calculating brain cognition, 2) the inevitability of their results, and 3) their continued recommendation to use brain age metrics.

      (1) I understand your point here. I think the distinction is that it is fine to build predictive models, but then there is no need to go through this intermediate step of "brain-cognition". Just say that brain features can predict cognition XX well, and brain-age (or some related metric) can predict cognition YY well. It creates a confusing framework for the reader that can lead them to believe that "brain-cognition" is not just a predicted value of fluid cognition from a model using brain features to predict cognition. While you clearly state that that is in fact what it is in the text, which is a huge improvement, I do not see what is added by going through brain-cognition instead of simply just obtaining a change in R2 where the first model uses brain features alone to predict cognition, and the second adds on brain-age (or related metrics), or vice versa, depending on the question. Please do this analysis, and either compare and contrast it with going through "brain-cognition" in your paper, or switch to this analysis, as it more directly addresses the question of the incremental predictive utility of brain-age above and beyond brain features.

      Thank you so much for raising this point. Reviewer 1 (Public Review #2/Recommendations For The Authors #3) and Reviewer 2 (Public Review #1) made a similar observation. We now made changes to the introduction and discussion to address this concern (see our responses to Reviewer 1 Recommendations For The Authors #3 above).

      Briefly, as in our 2nd revision, we made it explicitly clear that we did not intend to compare Brain Age with Brain Cognition since, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. By examining what was captured by Brain Cognition over and above Brain Age and chronological age, via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      We have thought about changing the name Brain Cognition into something along the lines of “predicted values of prediction models predicting fluid cognition based on brain MRI.” However, this made the manuscript hard to follow, especially with the commonality analyses. For instance, the sentence, “Here, we tested Brain Cognition’s unique effects in multiple regression models with a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition” would become “Here, we tested predicted values of prediction models predicting fluid cognition based on brain MRI unique effects in multiple regression models with a Brain Age index, chronological age and predicted values of prediction models predicting fluid cognition based on brain MRI as regressors to explain fluid cognition.” We believe, given our additional explanation (see our responses to Reviewer 1 Recommendations For The Authors #3 above), readers should understand what Brain Cognition is, and that we did not intend to compare Brain Age and Brain Cognition directly.

      As for the suggested analysis, “obtaining a change in R2 where the first model uses brain features alone to predict cognition, and the second adds on brain-age (or related metrics), or vice versa,” we have already done this in the form of commonality analysis (Nimon et al., 2008) (see Figure 7 below). That is, to obtain the unique and common effects of the regressors, we need to look at all of the possible changes in R2 when each possible subset of regressors is excluded or included (see equations 12 and 13 below).

      From Methods:

      “Similar to the above multiple regression model, we had chronological age, each Brain Age index and Brain Cognition as the regressors for fluid cognition:

      $$\text{Fluid Cognition}_i = \beta_0 + \beta_1\,\text{Chronological Age}_i + \beta_2\,\text{Brain Age Index}_{i,j} + \beta_3\,\text{Brain Cognition}_i + \varepsilon_i, \tag{12}$$

      Applying the commonality analysis here allowed us, first, to investigate the additive, unique effects of Brain Cognition, over and above chronological age and Brain Age indices. More importantly, the commonality analysis also enabled us to test the common, shared effects that Brain Cognition had with chronological age and Brain Age indices in explaining fluid cognition. We calculated the commonality analysis as follows (Nimon et al., 2017):

      $$
      \begin{aligned}
      \text{Unique Effect}_{\text{chronological age}} &= \Delta R^2_{\text{chronological age}} = R^2_{\text{chronological age, Brain Age index, Brain Cognition}} - R^2_{\text{Brain Age index, Brain Cognition}} \\
      \text{Unique Effect}_{\text{Brain Age index}} &= \Delta R^2_{\text{Brain Age index}} = R^2_{\text{chronological age, Brain Age index, Brain Cognition}} - R^2_{\text{chronological age, Brain Cognition}} \\
      \text{Unique Effect}_{\text{Brain Cognition}} &= \Delta R^2_{\text{Brain Cognition}} = R^2_{\text{chronological age, Brain Age index, Brain Cognition}} - R^2_{\text{chronological age, Brain Age index}} \\
      \text{Common Effect}_{\text{chronological age, Brain Age index}} &= R^2_{\text{chronological age, Brain Cognition}} + R^2_{\text{Brain Age index, Brain Cognition}} - R^2_{\text{Brain Cognition}} - R^2_{\text{chronological age, Brain Age index, Brain Cognition}} \\
      \text{Common Effect}_{\text{chronological age, Brain Cognition}} &= R^2_{\text{chronological age, Brain Age index}} + R^2_{\text{Brain Age index, Brain Cognition}} - R^2_{\text{Brain Age index}} - R^2_{\text{chronological age, Brain Age index, Brain Cognition}} \\
      \text{Common Effect}_{\text{Brain Age index, Brain Cognition}} &= R^2_{\text{chronological age, Brain Age index}} + R^2_{\text{chronological age, Brain Cognition}} - R^2_{\text{chronological age}} - R^2_{\text{chronological age, Brain Age index, Brain Cognition}} \\
      \text{Common Effect}_{\text{chronological age, Brain Age index, Brain Cognition}} &= R^2_{\text{chronological age}} + R^2_{\text{Brain Age index}} + R^2_{\text{Brain Cognition}} - R^2_{\text{chronological age, Brain Age index}} - R^2_{\text{chronological age, Brain Cognition}} \\
      &\quad - R^2_{\text{Brain Age index, Brain Cognition}} + R^2_{\text{chronological age, Brain Age index, Brain Cognition}}
      \end{aligned}
      \tag{13}
      $$”
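
      For readers who prefer code, the decomposition can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the analysis pipeline used in the manuscript: the variable names, the simulated effect sizes, and the hand-rolled least-squares helper `r_squared` are all assumptions made for the example. The letters A, B and C stand for chronological age, a Brain Age index and Brain Cognition.

```python
import itertools
import random


def r_squared(y, columns):
    """R^2 of an ordinary least-squares fit of y on `columns` plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations: (X'X | X'y).
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    y_hat = [sum(b * x for b, x in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def commonality(y, age, brain_age, brain_cog):
    """Unique and common effects for three regressors (A, B, C)."""
    preds = {"A": age, "B": brain_age, "C": brain_cog}
    r2 = {}
    for size in (1, 2, 3):
        for combo in itertools.combinations("ABC", size):
            r2["".join(combo)] = r_squared(y, [preds[k] for k in combo])
    return {
        "unique_A": r2["ABC"] - r2["BC"],
        "unique_B": r2["ABC"] - r2["AC"],
        "unique_C": r2["ABC"] - r2["AB"],
        "common_AB": r2["AC"] + r2["BC"] - r2["C"] - r2["ABC"],
        "common_AC": r2["AB"] + r2["BC"] - r2["B"] - r2["ABC"],
        "common_BC": r2["AB"] + r2["AC"] - r2["A"] - r2["ABC"],
        # The full-model R^2 enters with a plus sign here, so that the
        # seven components add up to the full-model R^2.
        "common_ABC": (r2["A"] + r2["B"] + r2["C"]
                       - r2["AB"] - r2["AC"] - r2["BC"] + r2["ABC"]),
        "total": r2["ABC"],
    }


# Synthetic data (hypothetical effect sizes, for illustration only).
rng = random.Random(0)
n = 500
age = [rng.gauss(0, 1) for _ in range(n)]
brain_age = [a + rng.gauss(0, 0.5) for a in age]        # tracks age closely
brain_cog = [0.5 * a + rng.gauss(0, 1) for a in age]    # partly age-related
fluid = [-0.6 * a + 0.5 * c + rng.gauss(0, 1) for a, c in zip(age, brain_cog)]

result = commonality(fluid, age, brain_age, brain_cog)
for name, value in result.items():
    print(f"{name:>11s}: {value: .4f}")
```

      The three unique effects are exactly the changes in R2 obtained by adding one regressor to a model that already contains the other two, which is the comparison requested by the reviewer.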

      (2) I agree that the solution is not to exclude age as a covariate, and that there is a big difference between inevitable and obvious. I simply think a further discussion of the inevitability of the results would be clarifying for the readers. There is a big opportunity in the brain-age literature to be as direct as possible about why you are finding what you are finding. People need to know not only what you found, but why you found what you found.

      Thank you. We agree that we need to make this point more explicit and direct. In the revised manuscript, we have included statements in both the Introduction and Discussion (see below) about the tight relationship between Brain Age and chronological age by design, which makes the small unique effects of Brain Age inevitable.

      Introduction:

      “Accordingly, by design, Brain Age is closely tied to chronological age. Because chronological age usually has a strong relationship with fluid cognition to begin with, it is unclear how much Brain Age adds to what is already captured by chronological age.”

      Discussion:

      “First, Brain Age itself did not add much more information to help us capture fluid cognition than what we had already known from a person’s chronological age. This can clearly be seen from the small unique effects of Brain Age indices in the multiple regression models having Brain Age and chronological age as the regressors. While the unique effects of some Brain Age indices from certain age-prediction models were statistically significant, they were all relatively small. Without Brain Age indices, chronological age by itself already explained around 32% of the variation in fluid cognition. Including Brain Age indices only added around 1.6% at best. We believe the small unique effects of Brain Age were inevitable because, by design, Brain Age is closely tied to chronological age. Therefore, chronological age and Brain Age captured mostly a similar variation in fluid cognition.

      Investigating the simple regression models and the commonality analysis between each Brain Age index and chronological age provided additional insights….”

      (3) I believe it is very important to critically examine the use of brain-age and related metrics. As part of this process, I think we should be asking ourselves the following questions (among others): Why go through age prediction? Wouldn't the predictions of cognition (or another variable) using the same set of brain features always be as good or better? You still have not justified the use of brain-age. As I said before, if you are going to continue to recommend the use of brain-age, you need a very strong argument for why you are recommending this. What does it truly add? Otherwise, temper your statements to indicate possible better paths forward.

      Thank you, Reviewer 3, for making an argument against the use of Brain Age. We largely agree with you. However, our work focuses on only one phenotype, fluid cognition, and on the normative situation (i.e., without a case vs. control group). As Reviewer 2 pointed out, Brain Age might still have utility in other cases not studied here. Still, future studies that focus on other phenotypes may consider using our approach as a template to test the utility of Brain Age in other situations. We have added a concluding statement to reflect this.

      From Discussion:

      “Altogether, we examined the utility of Brain Age as a biomarker for fluid cognition. Here are the three conclusions. First, Brain Age failed to add substantially more information over and above chronological age. Second, a higher ability to predict chronological age did not correspond to a higher utility to capture fluid cognition. Third, Brain Age missed up to around one-third of the variation in fluid cognition that could have been explained by brain MRI. Yet, given our focus on fluid cognition, future empirical research is needed to test the utility of Brain Age on other phenotypes, especially when Brain Age is used for anomaly detection in case-control studies (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We hope that future studies may consider applying our approach (i.e., using the commonality analysis that includes predicted values from a model that directly predicts the phenotype of interest) to test the utility of Brain Age as a biomarker for other phenotypes.”

      References

      Bashyam, V. M., Erus, G., Doshi, J., Habes, M., Nasrallah, I. M., Truelove-Hill, M., Srinivasan, D., Mamourian, L., Pomponio, R., Fan, Y., Launer, L. J., Masters, C. L., Maruff, P., Zhuo, C., Völzke, H., Johnson, S. C., Fripp, J., Koutsouleris, N., Satterthwaite, T. D., … on behalf of the ISTAGING Consortium, the P. A. disease C., ADNI, and CARDIA studies. (2020). MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain, 143(7), 2312–2324. https://doi.org/10.1093/brain/awaa160

      Butler, E. R., Chen, A., Ramadan, R., Le, T. T., Ruparel, K., Moore, T. M., Satterthwaite, T. D., Zhang, F., Shou, H., Gur, R. C., Nichols, T. E., & Shinohara, R. T. (2021). Pitfalls in brain age analyses. Human Brain Mapping, 42(13), 4092–4101. https://doi.org/10.1002/hbm.25533

      Cole, J. H. (2020). Multimodality neuroimaging brain-age in UK biobank: Relationship to biomedical, lifestyle, and cognitive factors. Neurobiology of Aging, 92, 34–42. https://doi.org/10.1016/j.neurobiolaging.2020.03.014

      Dubois, J., Galdi, P., Paul, L. K., & Adolphs, R. (2018). A distributed brain network predicts general intelligence from resting-state human neuroimaging data. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1756), 20170284. https://doi.org/10.1098/rstb.2017.0284

      Hahn, T., Fisch, L., Ernsting, J., Winter, N. R., Leenings, R., Sarink, K., Emden, D., Kircher, T., Berger, K., & Dannlowski, U. (2021). From ‘loose fitting’ to high-performance, uncertainty-aware brain-age modelling. Brain, 144(3), e31–e31. https://doi.org/10.1093/brain/awaa454

      Insel, T., Cuthbert, B., Garvey, M., Heinssen, R., Pine, D. S., Quinn, K., Sanislow, C., & Wang, P. (2010). Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. American Journal of Psychiatry, 167(7), 748–751. https://doi.org/10.1176/appi.ajp.2010.09091379

      Jirsaraie, R. J., Kaufmann, T., Bashyam, V., Erus, G., Luby, J. L., Westlye, L. T., Davatzikos, C., Barch, D. M., & Sotiras, A. (2023). Benchmarking the generalizability of brain age models: Challenges posed by scanner variance and prediction bias. Human Brain Mapping, 44(3), 1118–1128. https://doi.org/10.1002/hbm.26144

      Marquand, A. F., Rezek, I., Buitelaar, J., & Beckmann, C. F. (2016). Understanding Heterogeneity in Clinical Cohorts Using Normative Models: Beyond Case-Control Studies. Biological Psychiatry, 80(7), 552–561. https://doi.org/10.1016/j.biopsych.2015.12.023

      Nimon, K., Lewis, M., Kane, R., & Haynes, R. M. (2008). An R package to compute commonality coefficients in the multiple regression case: An introduction to the package and a practical example. Behavior Research Methods, 40(2), 457–466. https://doi.org/10.3758/BRM.40.2.457

      Pat, N., Wang, Y., Anney, R., Riglin, L., Thapar, A., & Stringaris, A. (2022). Longitudinally stable, brain‐based predictive models mediate the relationships between childhood cognition and socio‐demographic, psychological and genetic factors. Human Brain Mapping, hbm.26027. https://doi.org/10.1002/hbm.26027

      Poldrack, R. A., Huckins, G., & Varoquaux, G. (2020). Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry, 77(5), 534–540. https://doi.org/10.1001/jamapsychiatry.2019.3671

      Rasero, J., Sentis, A. I., Yeh, F.-C., & Verstynen, T. (2021). Integrating across neuroimaging modalities boosts prediction accuracy of cognitive ability. PLOS Computational Biology, 17(3), e1008347. https://doi.org/10.1371/journal.pcbi.1008347

      Rokicki, J., Wolfers, T., Nordhøy, W., Tesli, N., Quintana, D. S., Alnæs, D., Richard, G., de Lange, A.-M. G., Lund, M. J., Norbom, L., Agartz, I., Melle, I., Nærland, T., Selbæk, G., Persson, K., Nordvik, J. E., Schwarz, E., Andreassen, O. A., Kaufmann, T., & Westlye, L. T. (2021). Multimodal imaging improves brain age prediction and reveals distinct abnormalities in patients with psychiatric and neurological disorders. Human Brain Mapping, 42(6), 1714–1726. https://doi.org/10.1002/hbm.25323

      Sripada, C., Angstadt, M., Rutherford, S., Taxali, A., & Shedden, K. (2020). Toward a “treadmill test” for cognition: Improved prediction of general cognitive ability from the task activated brain. Human Brain Mapping, 41(12), 3186–3197. https://doi.org/10.1002/hbm.25007

      Tetereva, A., Li, J., Deng, J. D., Stringaris, A., & Pat, N. (2022). Capturing brain‐cognition relationship: Integrating task‐based fMRI across tasks markedly boosts prediction and test‐retest reliability. NeuroImage, 263, 119588. https://doi.org/10.1016/j.neuroimage.2022.119588

      Vieira, B. H., Pamplona, G. S. P., Fachinello, K., Silva, A. K., Foss, M. P., & Salmon, C. E. G. (2022). On the prediction of human intelligence from neuroimaging: A systematic review of methods and reporting. Intelligence, 93, 101654. https://doi.org/10.1016/j.intell.2022.101654

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We thank the editorial team and reviewers for their continued contributions to improve our work.

      Below, we address the final recommendations to the authors.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I asked previously why the suppression depth should vary based on the contrast change speed. I now understand that the authors expect this variation from a working model based on neural adaptation (lines 274-277 and 809-820). I suggest the authors specify this prediction also on lines 473-479, where there is room for improved clarity (the words/phrases 'impact,' 'be sensitive to,' and 'covary' are non-directional).

      We have now specified this prediction to improve clarity:

      Line 475 – 486

      “In the context of the tCFS method, the steady increases and decreases in the target’s actual strength (i.e., its contrast) should, respectively, boost its emergence from suppression (bCFS) and facilitate its reversion to suppression (reCFS) as it competes against the mask. Whether construed as a consequence of neural adaptation or error signal, we surmise that these cycling state transitions defining suppression depth should be sensitive to the rate of contrast change of the monocular target. Specifically, the slower the contrast change, the greater the amount of accrued adaptation, which will contract the range between breakthrough and suppression thresholds according to an adapting reciprocal inhibition model. For fast contrast change, there will be less accrual of adaptation meaning that the range between breakthrough and suppression thresholds will exhibit less contraction. Expressed in operational terms, the depth of suppression should be positively related to the rate of target change. Experiment 3 tested this supposition using three rates of contrast change.”

      Line 108: 'By comparing the thresholds for a target to transition into (reCFS) and out of awareness (bCFS)'-are 'into' and 'out of' reversed?

      They were, thank you; these have now been corrected.

      Lines 696-698 read, 'Figure 3 shows that polar patterns tend to emerge from suppression at slightly lower contrasts than do gratings.' In the same paragraph, lines 716-171 read, 'Figure 3 shows that bCFS and reCFS thresholds are very similar for all image categories.' There is a statistically significant effect of category in these results; meanwhile, the differences among categories are arguably small. Which side do the authors intend to emphasize? Are the readers meant to interpret this as a glass-half-full, half-empty situation?

      We have now revised this paragraph. We emphasise that the small differences do not support ‘preferential processing’ of the magnitude that would be expected from category specific neural CRFs.

      From Line 702

      “Next we turn to another question raised about our conclusion concerning invariant depth of suppression. If a certain image type had overall lower bCFS and reCFS contrast thresholds relative to another image type (despite equivalent suppression depth), would that imply the former image enjoyed “preferential processing” relative to the latter? And, what would determine the differences in bCFS and reCFS thresholds? Figure 3 shows that polar patterns tend to emerge from suppression at slightly lower contrasts than do gratings and that polar patterns, once dominant, tend to maintain dominance to lower contrasts than do gratings and this happens even though the rate of contrast change is identical for both types of stimuli. But while rate of contrast change is identical, the neural responses to those contrast changes may not be the same: neural responses to changing contrast will depend on the neural contrast response functions (CRFs) of the cells responding to each of those two types of stimuli, where the CRF defines the relationship between neural response and stimulus contrast. CRFs rise monotonically with contrast and typically exhibit a steeply rising initial response as stimulus contrast rises from low to moderate values, followed by a reduced growth rate for higher contrasts. CRFs can vary in how steeply they rise and at what contrast they achieve half-max response. CRFs for neurons in mid-level vision areas such as V4 and FFA (which respond well to polar stimuli and faces, respectively) are generally steeper and shifted towards lower contrasts than CRFs for neurons in primary visual cortex (which respond well to gratings). Therefore, the effective strength of the contrast changes in our tCFS procedure will depend on the shape and position of the underlying CRF, an idea we develop in more detail in Supplementary Appendix 1, comparing the case of V1 and V4 CRFs. 

      The comparison of V1 and V4 CRFs yields two predictions: (i) that V4 CRFs should produce much lower bCFS and reCFS thresholds than V1 CRFs, and (ii) that V4 CRFs should produce much more suppression than V1 CRFs. Our data do not support either prediction: bCFS and reCFS thresholds for the polar shape are not ‘much lower’ than those for gratings (Fig. 3) and neither is there ‘much more’ suppression depth for the polar form. There is no room in these results to support the claim that certain images are special and receive “preferential processing” or processing outside of awareness. Instead, the similar data patterns for all image types are most parsimoniously explained by a single mechanism processing all images (see Appendix 1), although there are many other kinds of images still to be tested in tCFS and exceptions may yet be found. As a first step in exploring this idea, one could use standard psychophysical techniques (e.g., Ling & Carrasco, 2006) to derive CRFs for different categories of patterns and then measure suppression depth associated with those patterns using tCFS.”
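
      The CRF shapes described in the passage above can be illustrated with the Naka-Rushton function, a common parameterisation of contrast response. The sketch below is only illustrative: the passage does not state which functional form Appendix 1 uses, and the `V1_LIKE` and `V4_LIKE` parameter values are hypothetical, chosen solely to show a steeper curve shifted towards lower contrasts.

```python
def naka_rushton(contrast, r_max=1.0, c50=0.3, n=2.0):
    """Naka-Rushton contrast response: r_max * c^n / (c^n + c50^n).

    c50 is the contrast giving the half-maximal response;
    n controls how steeply the response rises.
    """
    return r_max * contrast ** n / (contrast ** n + c50 ** n)


# Hypothetical parameter sets (assumptions, not fitted to any data).
V1_LIKE = dict(c50=0.30, n=2.0)  # shallower, half-max at higher contrast
V4_LIKE = dict(c50=0.10, n=3.0)  # steeper, shifted towards lower contrasts

contrasts = [0.02, 0.05, 0.1, 0.2, 0.4, 0.8]
v1 = [naka_rushton(c, **V1_LIKE) for c in contrasts]
v4 = [naka_rushton(c, **V4_LIKE) for c in contrasts]

for c, a, b in zip(contrasts, v1, v4):
    print(f"contrast {c:4.2f}   V1-like {a:5.3f}   V4-like {b:5.3f}")
```

      With these example values, the same physical change in contrast produces a larger change in the V4-like response at low contrasts, which is why identical rates of contrast change need not be equivalent in neural terms.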

    1. Author response:

      The following is the authors’ response to the original reviews.

      The reviewers praised multiple aspects of our study. Reviewer 1 noted that “the work aligns well with current research trends and will greatly interest researchers in the field.” Reviewer 2 highlighted the unique capability of our imaging approach, which “allows for investigation of the heterogeneity of response across individual dopamine axons, unlike other common approaches such as fiber photometry.” Reviewer 3 commented that “the experiments are beautifully executed” and “are revealing novel information about how aversive and rewarding stimuli is encoded at the level of individual axons, in a way that has not been done before.”

      In addition to the positive feedback, the reviewers also provided useful criticisms and suggestions, some of which may not be fully addressed in a single study. For instance, questions regarding whether dopamine axons encode the valence or specific identity of the stimuli, or the most salient aspects of the environment, remain open. At the same time, as all the reviewers agreed, our report on the diversity of dopamine axonal responses using a novel imaging design introduces significant new insights to the neuroscience community. Following the reviewers’ recommendations, we have refrained from making interpretations that could be perceived as overinterpretation, such as concluding that “dopamine axons are involved in aversive processing.” This has necessitated extensive revisions, including modifying the title of our manuscript to make clear that the novelty of our work is revealing ‘functional diversity’ using our new imaging approach.

      Below, we respond to the reviewers’ comments point by point.

      eLife assessment

      This valuable study shows that distinct midbrain dopaminergic axons in the medial prefrontal cortex respond to aversive and rewarding stimuli and suggests that they are biased toward aversive processing. The use of innovative microprism-based two-photon calcium imaging to study single axon heterogeneity is solid, although the experimental design could be optimized to distinguish aversive valence from stimulus salience and identity in this dopamine projection. This work will be of interest to neuroscientists working on neuromodulatory systems, cortical function and decision making.

      Reviewer #1

      Summary:

      In this manuscript, Abe and colleagues employ in vivo 2-photon calcium imaging of dopaminergic axons in the mPFC. The study reveals that these axons primarily respond to unconditioned aversive stimuli (US) and enhance their responses to initially-neutral stimuli after classical association learning. The manuscript is well-structured and presents results clearly. The utilization of a refined prism-based imaging technique, though not entirely novel, is well-implemented. The study's significance lies in its contribution to the existing literature by offering single-axon resolution functional insights, supplementing prior bulk measurements of calcium or dopamine release. Given the current focus on neuromodulator neuron heterogeneity, the work aligns well with current research trends and will greatly interest researchers in the field.

      However, I would like to highlight that the authors could further enhance their manuscript by addressing study limitations more comprehensively and by providing essential details to ensure the reproducibility of their research. In light of this, I have a number of comments and suggestions that, if incorporated, would significantly contribute to the manuscript's value to the field.

      Strengths:

      • Descriptive.

      • Utilization of a well-optimized prism-based imaging method.

      • Provides valuable single-axon resolution functional observations, filling a gap in existing literature.

      • Timely contribution to the study of neuromodulator neuron heterogeneity.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      (1) It's important to fully discuss the fact that the measurements were carried out only on superficial layers (30-100um), while major dopamine projections target deep layers of the mPFC as discussed in the cited literature (Vander Weele et al., 2018) and as illustrated in FigS1B,C. This limitation should be explicitly acknowledged and discussed in the manuscript, especially given the potential functional heterogeneity among dopamine neurons in different layers. This potential across-layer heterogeneity could also be the cause of discrepancy among past recording studies with different measurement modalities. Also, mentioning technical limitations would be informative. For example: how deep the authors can perform 2p-imaging through the prism? was the "30-100um" maximum depth the authors could get?

      Thank you for pointing out this important issue about layer differences.

      It is possible that the mesocortical pathway has layer-specific channels, with some neurons targeting supragranular layers and others targeting infragranular ones. Alternatively, it is also plausible that the axons of the same neurons branch into both superficial and deep layers. This is a critical issue that has not been investigated in anatomical studies and will require single-cell labeling of dopamine neurons (Matsuda et al., 2009; Aransay et al., 2015). We now discuss this issue in the Discussion.

      As for the imaging depth of 30–100 µm, we were unable to visualize deeper axons in a live view mode. Our imaging system has already been optimized to detect weak signals (e.g., we have employed an excitation wavelength of 980 nm, dispersion compensation, and a hybrid photodetector). It is possible that future studies using improved imaging approaches may be able to visualize deeper layers. Importantly, sparse axons in the supragranular layers are advantageous in detecting weak signals; dense labeling of axons would increase the background fluorescence relative to signals. We now reference this layer issue in the Results and Discussion sections.

      (2) In the introduction, it seems that the authors intended to refer to Poulin et al. 2018 regarding molecular/anatomical heterogeneity of dopamine neurons, but they inadvertently cited Poulin et al. 2016 (a general review on scRNAseq). Additionally, the statement that "dopamine neurons that project to the PFC show unique genetic profiles (line 85)" requires clarification, as Poulin et al. 2018 did not specifically establish this point. Instead, they found at least the Vglut2/Cck+ population projects into mPFC, and they did not reject the possibility of other subclasses projecting to mPFC. Rather, they observed denser innervation with DAT-cre, suggesting that non-Vglut2/Cck populations would also project to mPFC. Discuss the potential molecular heterogeneity among mPFC dopamine axons in light of the sampling limitation mentioned earlier.

      We thank the reviewer for pointing this out. Genetic profiles of PFC-projecting DA neurons are still being investigated, so describing them as “unique” was misleading. We have edited the Introduction accordingly, and now discuss this issue in detail in the Discussion.

      (3) I find the data presented in Figure 2 to be odd. Firstly, the latency of shock responses in the representative axons (right panels of G, H) is consistently very long - nearly 500ms. It raises a query whether this is a biological phenomenon or if it stems from a potential technical artifact, possibly arising from an issue in synchronization between the 2-photon imaging and stimulus presentation. My reservations are compounded by the notable absence of comprehensive information concerning the synchronization of the experimental system in the method section.

      The synchronization of the stimulus and data acquisition is accomplished at a sub-millisecond resolution. We use a custom-made MATLAB program that sends TTL commands to standard imaging software (ThorImage or ScanImage) and a stimulator for electrical shocks. All events are recorded as analogue inputs to a different DAQ to ensure synchronization. We have provided additional details regarding the configuration in the Methods section.

      We consider that the long latency of shock response is biological. For instance, a similar long latency was found after electrical shock in a photometry imaging study (Kim, …, Deisseroth, 2016).

      Secondly, there appear to be irregularities in Panel J. While the authors indicate that "Significant axons were classified as either reward-preferring (cyan) or aversive-preferring (magenta), based on whether the axons are above or below the unity line of the reward/aversive scatter plot (Line 566)," a cyan dot slightly but clearly deviates above the unity line (around coordinates (x, y) = (20, 21)). This needs clarification. Lastly, when categorizing axons for analysis of conditioning data in Fig3 (not Fig2), the authors stated "The color-coded classification (cyan/magenta) was based on k-means clustering, using the responses before classical conditioning (Figure 2J)". I do not understand why the authors used different classification methods for two almost identical datasets.

      We thank the reviewer for pointing out these insufficient descriptions. We classified the axons using k-means clustering, and the separation of the two clusters happened to roughly coincide with the unity line of the reward/aversive scatter plot in Fig 2J. In other words, we did not use the unity line to classify the data points (which is why the color separation of the histogram is not at 45 degrees). We have clarified this point in the Methods section.
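
      To illustrate why a k-means boundary need not coincide with the unity line, here is a minimal sketch on made-up response values. It is not our analysis code: the data points, the two-cluster helper `kmeans_two_clusters`, and its deterministic initialisation are assumptions for the example. The point (20, 21) lies just above the unity line yet is assigned to the reward-preferring cluster because it is much closer to that cluster's centroid.

```python
def kmeans_two_clusters(points, max_iter=100):
    """Plain two-cluster k-means on 2-D points (Lloyd's algorithm)."""
    # Deterministic start: first and last point (an arbitrary choice).
    centroids = [points[0], points[-1]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min((0, 1), key=lambda k: (x - centroids[k][0]) ** 2
                                      + (y - centroids[k][1]) ** 2)
            for x, y in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return labels, centroids


# Hypothetical (reward response, aversive response) pairs, one per axon.
points = [(20, 5), (22, 8), (25, 10), (18, 6), (20, 21),
          (5, 40), (8, 45), (3, 35), (6, 50), (10, 42)]

labels, centroids = kmeans_two_clusters(points)

# Unity-line rule: 1 if the aversive response exceeds the reward response.
unity_labels = [1 if aversive > reward else 0 for reward, aversive in points]

disagreements = [p for p, a, b in zip(points, labels, unity_labels) if a != b]
print("k-means labels:   ", labels)
print("unity-line labels:", unity_labels)
print("disagreements:    ", disagreements)
```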

      (4) In connection with Point 3, conducting separate statistical analyses for aversive and rewarding stimuli would offer a fairer approach. This could potentially reveal a subset of axons that display responses to both aversive and appetitive stimuli, aligning more accurately with the true underlying dynamics. Moreover, the characterization of Figure 2J as a bimodal distribution while disregarding the presence of axons responsive to both aversive and appetitive cues seems somewhat arbitrary and circular logic. A more inclusive consideration of this dual-responsive population could contribute to a more comprehensive interpretation.

      We also attempted k-means clustering with additional dimensions (e.g., temporal domains as shown in Fig. 3I, J), but no additional clusters were evident. We note that the lack of other clusters does not exclude the possibility of their existence, which may only become apparent with a substantial increase in the number of samples. In the current report, we present the clusters that were the easiest/simplest for us to identify.

      Additionally, we have revised our manuscript to reflect that many axons respond to both reward and aversive stimuli, and that aversive-preferring axons do not exclusively respond to the aversive stimulus.

      (5) The contrast in initialization to novel cues between aversive and appetitive axons mirrors findings in other areas, such as the tail-of-striatum (TS) and ventral striatum (VS) projecting dopamine neurons (Menegas et al., 2017, not 2018). You might consider citing this very relevant study and discussing potential collateral projections between mPFC and TS or VS.

      Thank you for pointing this out. We have now included Menegas et al., 2017, and also discuss the possibility of collaterals to these areas. In addition, we now cite Azcorra et al., 2023, which was published after our initial submission.

      (6) The use of correlation values (here >0.65) to group ROIs into axons is common but should be justified based on axon density in the FOV and imaging quality. It's important to present the distribution of correlation values and demonstrate the consistency of results with varying cut-off values. Also, provide insights into the reliability of aversive/appetitive classifications for individual ROIs with high correlations. Importantly, if you do the statistical testing and aversive/appetitive classifications for individual ROIs with above-threshold high correlation (to be grouped into the same axon), do they always fall into the same category? How many false positives/false negatives are observed?


      "Our results remained similar for different correlation threshold values (Line 556)" (data not shown) is obsolete.

      We have conducted additional analyses using correlation thresholds of 0.5 and 0.3, which resulted in a smaller number of axon terminals. In essence, the relationship between reward responses and aversive responses remained very similar to Fig. 2J, K.

      Author response image 1.
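
The threshold-based grouping discussed in point (6) can be illustrated with a minimal sketch. The authors' actual grouping procedure is not given in this response, so the single-linkage merge below, the function names, and the toy traces are all assumptions made for illustration. It shows why a lower cut-off can only merge more ROIs and therefore yields the same number of putative axons or fewer.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length traces."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def group_rois(traces, threshold=0.65):
    """Merge ROIs whose pairwise correlation exceeds `threshold`.

    Hypothetical single-linkage grouping (union-find); not the authors' code.
    Returns a sorted list of ROI-index groups, one per putative axon.
    """
    parent = list(range(len(traces)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(traces)), 2):
        if pearson(traces[i], traces[j]) > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(traces)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy fluorescence traces: ROIs 0 and 1 co-vary, ROI 2 is anti-correlated.
toy_traces = [
    [0.0, 1.0, 0.0, 2.0, 0.0],
    [0.0, 2.0, 0.0, 4.0, 0.1],
    [1.0, 0.0, 1.0, 0.0, 1.0],
]
axons = group_rois(toy_traces, threshold=0.65)
```

Running `group_rois` over a range of thresholds and re-plotting the reward versus aversive responses for each grouping is one way to check the robustness the reviewer asks about.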

      Reviewer #2 (Public Review):

      Summary:

      This study aims to address existing differences in the literature regarding the extent of reward versus aversive dopamine signaling in the prefrontal cortex. To do so, the authors chose to present mice with both a reward and an aversive stimulus during different trials each day. The authors used high spatial resolution two-photon calcium imaging of individual dopaminergic axons in the medial PFC to characterize the response of these axons to determine the selectivity of responses in unique axons. They also paired the reward (water) and an aversive stimulus (tail shock) with auditory tones and recorded across 12 days of associative learning.

      The authors find that some axons respond to both reward and aversive unconditioned stimuli, but overall, there is a strong preference to respond to aversive stimuli, consistent with expectations from prior studies that used other recording methods. The authors find that both of their two auditory stimuli initially drive responses in axons, but that with training axons develop more selective responses for the shock-associated tone, indicating that associative learning led to changes in these axons' responses. Finally, the authors use anticipatory behaviors during the conditioned stimuli and facial expressions to determine stimulus discrimination and relate dopamine axon signals to this behavioral evidence of discrimination. This study takes advantage of cutting-edge imaging approaches to resolve the extent to which dopamine axons in PFC respond to appetitive or aversive stimuli. They conclude that there is a strong bias to respond to the aversive tail shock in most axons and a weaker, sparser representation of water reward.

      Strengths:

      The strength of this study is the imaging approach that allows for investigation of the heterogeneity of response across individual dopamine axons, unlike other common approaches such as fiber photometry which provide a measure of the average population activity. The use of appetitive and aversive stimuli to probe responses across individual axons is another strength.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      A weakness of this study is the design of the associative conditioning paradigm. The use of only a single reward and single aversive stimulus makes it difficult to know whether these results are specific to the valence of the stimuli versus the specific identity of the stimuli. Further, the reward presentations are more numerous than the aversive trials making it unclear how much novelty and habituation account for results. Moreover, the training seems somewhat limited by the low number of trials and did not result in strong associative conditioning. The lack of omission responses reported may reflect weak associative conditioning. Finally, the study provides a small advance in our understanding of dopamine signaling in the PFC and lacks evidence for if and what might be the consequence of these axonal responses on PFC dopamine concentrations and PFC neuron activity.

      We thank the reviewer for the suggestions.

      We agree that interpreting the response change during classical conditioning is not straightforward. Although the reward and aversive stimuli we employed are commonly used in the field, future studies with more sophisticated paradigms will be necessary to address whether dopamine axons encode the valence of the stimuli, the specific identity of the stimuli, or novelty and habituation. In our current manuscript, we refrain from concluding that distinct groups of neurons encode different valences. In fact, many axons respond to both stimuli, at different ratios. We have removed descriptions that may suggest exclusive coding of reward or aversive processing. Additionally, we have extensively discussed possible interpretations.

      In terms of the strength of the conditioning association, behavioral results indicated that the learning plateaued – anticipatory behaviors did not increase during the last two phases when the conditioned span was divided into six phases (Figure 3–figure supplement 1).

      Our goal in the current manuscript is to provide new insight into the functional diversity of dopamine axons in the mPFC. Investigating the impact of dopamine axons on local dopamine concentration and neural activity in the mPFC is important but falls beyond the scope of our current study. In particular, given the functional diversity of dopamine axons, interpreting bulk optogenetic or chemogenetic axonal manipulation experiments would not be straightforward. As suggested, measuring the dopamine concentration through two-photon imaging of dopamine sensors and monitoring the activity of dopamine recipient neurons (e.g., D1R- or D2R-expressing neurons) is a promising approach that we plan to undertake in the near future.

      Reviewer #3 (Public Review):

      Summary:

      The authors image dopamine axons in medial prefrontal cortex (mPFC) using microprism-mediated two-photon calcium imaging. They image these axons as mice learn that two auditory cues predict two distinct outcomes, tailshock or water delivery. They find that some axons show a preference for encoding of the shock and some show a preference for encoding of water. The authors report a greater number of dopamine axons in mPFC that respond to shock. Across time, the shock-preferring axons begin to respond preferentially to the cue predicting shock, while there is a less pronounced increase in the water-responsive axons that acquire a response to the water-predictive cue (these axons also increase non-significantly to the shock-predictive cue). These data lead the authors to argue that dopamine axons in mPFC preferentially encode aversive stimuli.

      Strengths:

      The experiments are beautifully executed and the authors have mastered an impressively complex technique. Specifically, they are able to image and track individual dopamine axons in mPFC across days of learning. This technique is used the way it should be: the authors isolate distinct dopamine axons in mPFC and characterize their encoding preferences and how these evolve across learning of cue-shock and cue-water contingencies. Thus, these experiments reveal novel information about how aversive and rewarding stimuli are encoded at the level of individual axons, in a way that has not been done before. This is timely and important.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      The overarching conclusion of the paper is that dopamine axons preferentially encode aversive stimuli. This is prevalent in the title, abstract, and throughout the manuscript. This is fundamentally confounded. As the authors point out themselves, the axonal response to stimuli is sensitive to outcome magnitude (Supp Fig 3). That is, if you increase the magnitude of water or shock that is delivered, you increase the change in fluorescence that is seen in the axons. Unsurprisingly, the change in fluorescence that is seen to shock is considerably higher than water reward.

      We agree that the interpretation of our results is not straightforward. Our current manuscript now focuses on our strength, which is reporting the functional diversity of dopamine axons. Therefore, we avoid using the word ‘encode’ when describing the response.

      We believe that our results could reconcile the apparent discrepancy as to why some previous studies reported only aversive responses while others reported reward responses. In particular, if the reward volume were very small, the reward response could go undetected.

      Further, when the mice are first given unexpected water delivery and have not yet experienced the aversive stimuli, over 40% of the axons respond [yet just a few lines below the authors write: "Previous studies have demonstrated that the overall dopamine release at the mPFC or the summed activity of mPFC dopamine axons exhibits a strong response to aversive stimuli (e.g., tail shock), but little to rewards", which seems inconsistent with their own data].

      We always recorded the reward and aversive response together, which might have confused the reviewer. Therefore, there is no inconsistency in our data. We have clarified our methods and reasoning accordingly.

      Given these aspects of the data, it could be the case that the dopamine axons in mPFC encode different types of information and delegate preferential processing to the most salient outcome across time.

      This is certainly an exciting interpretation, so we have included it in our discussion. Meanwhile, ‘the most salient outcome’ alone cannot fully capture the diverse response patterns of the dopaminergic axons, particularly reward-preferring axons. We discuss our findings in more detail in the revised manuscript.

      The use of two similar-sounding tones (9 kHz and 12 kHz) for the reward- and aversive-predicting cues is likely to enhance this, as it requires a fine-grained distinction between the two cues in order to learn effectively. There is considerable literature on mPFC function across species that would support such a view. Specifically, theories of mPFC function (in particular prelimbic cortex, which is where the axon images are mostly taken) generally center around resolution of conflict in what to respond to, learn about, and attend to. That is, mPFC is important for devoting the most resources (learning, behavior) to the most relevant outcomes in the environment. These data, then, provide a mechanism for this to occur in mPFC. That is, dopamine axons signal to the mPFC the most salient aspects of the environment, which should be preferentially learned about and responded towards. This is also consistent with the absence of a negative prediction error during omission: the dopamine axons show increases in responses during receipt of unexpected outcomes, but do not encode negative errors. This supports a role for this projection in helping to allocate resources to the most salient outcomes and their predictors, and not learning per se. Below are just a few references from the rich literature on mPFC function (some consider rodent mPFC analogous to DLPFC, some mPFC), which advocate for a role for this region in allocating attention and cognitive resources to the most relevant stimuli, and do not indicate preferential processing of aversive stimuli.

      Distinguishing between 9 kHz and 12 kHz sound tones may not be that difficult, considering anticipatory licking and running are differentially manifested. In addition, previous studies have shown that mice can distinguish between two sound tones when they are separated by 7% (de Hoz and Nelken 2014). Nonetheless, we agree with the attractive interpretation that “the mPFC devotes the most resources (learning, behavior) to the most relevant outcomes in the environment” and that dopamine is a mechanism for this. Therefore, we discuss this interpretation in the revised text.

      References:

      (1) Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual review of neuroscience, 24(1), 167-202.

      (2) Bissonette, G. B., Powell, E. M., & Roesch, M. R. (2013). Neural structures underlying set-shifting: roles of medial prefrontal cortex and anterior cingulate cortex. Behavioural brain research, 250, 91-101.

      (3) Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual review of neuroscience, 18(1), 193-222.

      (4) Sharpe, M. J., Stalnaker, T., Schuck, N. W., Killcross, S., Schoenbaum, G., & Niv, Y. (2019). An integrated model of action selection: distinct modes of cortical control of striatal decision making. Annual review of psychology, 70, 53-76.

      (5) Ridderinkhof, K. R., Ullsperger, M., Crone, E. A., & Nieuwenhuis, S. (2004). The role of the medial frontal cortex in cognitive control. Science, 306(5695), 443-447.

      (6) Nee, D. E., Kastner, S., & Brown, J. W. (2011). Functional heterogeneity of conflict, error, task-switching, and unexpectedness effects within medial prefrontal cortex. Neuroimage, 54(1), 528-540.

      (7) Isoda, M., & Hikosaka, O. (2007). Switching from automatic to controlled action by monkey medial frontal cortex. Nature neuroscience, 10(2), 240-248.

      Reviewer #1 (Recommendations For The Authors):

      Specific Suggestions and Questions on the Methods Section:

      In general, the methods part is not well documented and sometimes confusing. Thus, as it stands, it hinders reproducible research. Specific suggestions/questions are listed in the following section.

      (1) Broussard et al. 2018 introduced axon-GCaMP6 instead of axon-jGCaMP8m. The authors should provide details about the source of this material. If it was custom-made, a description of the subcloning process would be appreciated. Additionally, consider depositing sequence information or preferably the plasmid itself. Furthermore, the introduction of the jGCaMP8 series by Zhang, Rozsa, et al. 2023 should be acknowledged and referenced in your manuscript.

      We thank the reviewer for pointing this out. We have now included details on how we prepared the axon-jGCaMP8m, which was based on plasmids available at Addgene. Additionally, we have deposited our construct to Addgene ( https://www.addgene.org/216533/ ). We have also cited Janelia’s report on jGCaMP8, Zhang et al.

      (2) The authors elaborate on the approach taken for experimental synchronization. Specifically, how was the alignment achieved between 2-photon imaging, treadmill recordings, aversive/appetitive stimuli, and videography? It would be important to document the details of the software and hardware components employed for generating TTLs that trigger the pump, stimulator, cameras, etc.

      We have now included a more detailed explanation about the timing control. We utilize a custom-made MATLAB program that sends TTL square waves and analogue waves via a single National Instruments board (USB-6229) to control two-photon image acquisition, behavior camera image acquisition, water syringe movement, current flow from a stimulator, and sound presentation. We also continuously recorded at 30 kHz, via a separate National Instruments board (PCIe-6363), the frame timing of two-photon imaging, the frame timing of a behavior camera, copies of command waves (sent to the syringe pump, the stimulator, and the speaker), and signals from the treadmill corresponding to running speed.
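
A continuous 30 kHz recording of frame-timing pulses can be turned into timestamps by detecting rising edges, after which stimulus events can be assigned to imaging frames. The sketch below is illustrative only: the authors used a custom MATLAB program that is not reproduced here, and the 2.5 V threshold, the function names, and the toy signal are assumptions.

```python
from bisect import bisect_right


def rising_edge_times(samples, sample_rate_hz=30_000, threshold=2.5):
    """Return the times (s) at which the signal crosses `threshold` upwards.

    `threshold` assumes a nominal 0/5 V TTL line; adjust for the real signal.
    """
    return [
        k / sample_rate_hz
        for k in range(1, len(samples))
        if samples[k - 1] < threshold <= samples[k]
    ]


def frame_index_for_events(event_times, frame_times):
    """Index of the latest frame starting at or before each event (-1 if none).

    `frame_times` must be sorted in ascending order.
    """
    return [bisect_right(frame_times, t) - 1 for t in event_times]


# Toy example sampled at 1 kHz: two pulses, starting at samples 2 and 6.
ttl = [0, 0, 5, 5, 0, 0, 5, 0]
frame_onsets = rising_edge_times(ttl, sample_rate_hz=1000)
```

Because all signals are recorded against the same sample clock, the same edge detection applied to the camera, pump, stimulator, and speaker channels places every event on a common time base.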

      (3) The information regarding the cameras utilized in the study presents some confusion. In one instance, you mention, "To monitor licking behavior, the face of each mouse was filmed with a camera at 60 Hz (CM3-U3-13Y3M-CS, FLIR)" (Line 488). However, there's also a reference to filming facial expressions using an infrared web camera (Line 613). Could you clarify whether the FLIR camera (which is an industrial CMOS not a webcam) is referred to as a webcam? Alternatively, if it's a different camera being discussed, please provide product details, including pixel numbers and frame rate for clarity.

      We thank the reviewer for pointing this out. This was a mistake on our end. The camera used in the current project was a CM3-U3-13Y3M-CS, not a web camera. We have now corrected this.

      (4) Please provide more information about the methodology employed for lick detection. Specifically, did the authors solely rely on videography for this purpose? If so, why was an electrical (or capacitive) detector not used? It would provide greater accuracy in detecting licking.

      Lick detection was performed offline based on videography, using DeepLabCut. As licking occurs at a frequency of ~6.5 Hz (Xu, …, O’Connor Nature Neurosci, 2022), the movement can be detected at a frame rate of 60 Hz. Initially, we used both a lick sensor and videography. However, we favored videography because it could potentially provide non-binary information.
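
The sampling argument above can be checked with simple arithmetic. The helper names below are hypothetical, and the Nyquist criterion is used only as a rough sufficiency check for a roughly periodic movement.

```python
def frames_per_cycle(frame_rate_hz, event_rate_hz):
    """Average number of video frames captured per movement cycle."""
    return frame_rate_hz / event_rate_hz


def satisfies_nyquist(frame_rate_hz, event_rate_hz):
    """True if the frame rate exceeds twice the movement frequency."""
    return frame_rate_hz > 2 * event_rate_hz


# Values quoted in the response: 60 Hz video, licking at about 6.5 Hz.
frames_per_lick = frames_per_cycle(60, 6.5)
```

At 60 Hz, each lick cycle spans roughly nine frames, which is comfortably above the two-samples-per-cycle minimum.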

      Other Minor Points:

      (5) Ensure consistency in the citation format; Vander Weele et al. 2018 and Weele et al. 2019 share the same first author.

      Thank you for pointing this out. Endnote processes the first author’s name differently depending on the journal. We fixed the error manually. The first paper (2018) is an original research paper, and the second one (2019) is a review about how dopamine modulates aversive processing in the mPFC. We cited the second one in three instances where we mentioned review papers.

      (6) The distinction between "dashed vs dotted lines" in Figure 3K and 3M appears to be very confusing. Please consider providing a clearer visualization/labeling to mitigate this confusion.

      We have now changed the line styles.

      (7) Additionally plotting mean polar angles of aversive/appetitive axons as vectors in the Cartesian scatter plots (2J, 3I,J) would make interpretation easier.

      We have now made this change to Figures 2, 3, 4.
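
One way to compute such a mean polar vector is sketched below. It assumes that each axon is a point whose x-coordinate is its reward response and whose y-coordinate is its aversive response, so that the polar angle is `atan2(aversive, reward)`. That axis convention, the function names, and the toy points are assumptions and are not taken from the manuscript.

```python
import math


def polar_angle_deg(reward, aversive):
    """Polar angle of one axon: 0 deg = reward axis, 90 deg = aversive axis."""
    return math.degrees(math.atan2(aversive, reward))


def mean_polar_vector(points):
    """Circular mean of polar angles for (reward, aversive) points.

    Returns (mean angle in degrees, mean resultant length in [0, 1]).
    Each point contributes a unit vector, so response magnitude is ignored.
    """
    angles = [math.atan2(aversive, reward) for reward, aversive in points]
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    return math.degrees(math.atan2(mean_sin, mean_cos)), math.hypot(mean_cos, mean_sin)


# Toy example: one purely reward-responsive and one purely aversive-responsive axon.
mean_angle, resultant = mean_polar_vector([(1.0, 0.0), (0.0, 1.0)])
```

A circular mean is used rather than an arithmetic mean of angles so that points near the wrap-around at ±180° do not cancel each other. The resultant length indicates how tightly a cluster is concentrated around its mean direction.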

      (8) Data and codes should be shared in a public database. This is important for reproducible research and we believe that "available from the corresponding author upon reasonable request" is outdated language.

      We have uploaded the data to GitHub, https://github.com/pharmedku/2024-elife-da-axon.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors don't show which mouse each axon's data come from, making it hard to know whether differences arise from inter-mouse differences or differences between axons. The best way to address this point is to show plots similar to Figure 2J & K but broken down by mouse, to show whether each mouse had evidence of these two clusters.

      We have now made this change to Figure 2-figure supplement 3.

      (2) Line 166: Should this sentence point to panels 2F, G, H rather than 2I which doesn't show a shock response?

      We thank the reviewer for pointing this out. We have fixed the incorrect labels.

      (3) Line 195: The population-level bias to aversive stimuli was shown previously using photometry, so it is not justified to say "for the first time" regarding this statement.

      We have adjusted this sentence so that the claim of “for the first time” is not associated with the population-level bias.

      (4) The paper lacks a discussion of the potential role that novelty plays in the amplitude of the responses, given that tail shocks occur less often than rewards. Is the amplitude of the first reward of the day larger than subsequent rewards? Would tail shock responses decay if they occurred in sequential trials?

      Following the reviewer's suggestion, we conducted a comparison of individual axonal responses to both conditioned and unconditioned stimuli across the first trial and subsequent trials. Our findings reveal a notable trend: aversive-preferring axons exhibited attenuation in response to CSreward, yet enhancement in response to CSaversive. Conversely, the response of these axons to USreward was attenuated, with no significant change observed for USaversive. In contrast, reward-preferring axons displayed an invariable activity pattern from the initial trial, highlighting the functional diversity present within dopamine axons. This analysis has been integrated into Figure 3-figure supplement 4 and is elaborated upon in the Discussion section.

      (5) Fix typo in Figure 1 - supplement 1. Shift

      We have now corrected this. Thank you.

      (6) The methods section needs information about trial numbers. Please indicate how many trials were presented to each mouse per day.

      We have now added the information about trial numbers to the Methods section.

      Reviewer #3 (Recommendations For The Authors):

      In line with the public review, my recommendation is for the authors to remain as objective about their data as possible. There are many points in the manuscript where the authors seem to directly contradict their own data. For example, they first detail that dopamine axons respond to unexpected water rewards. Indeed, they find that there are 40% of dopamine axons that respond in this way. Then, a few paragraphs later they state: "Previous studies have demonstrated that the overall dopamine release at the mPFC or the summed activity of mPFC dopamine axons exhibits a strong response to aversive stimuli (e.g., tail shock), but little to rewards". As detailed above, I do not think these data support an idea that dopamine axons in mPFC preferentially encode aversive outcomes. If the authors wanted to examine a role for mPFC in preferential encoding of aversive stimuli, you would first have to equate the outcomes by magnitude and then compare how the axons acquire preferences across time. Alternatively, a prediction of a more general process that I detail above would predict that you could give mice two rewards that differ in magnitude (e.g., lots of food vs. small water) and you would see the same results that the authors have seen here (i.e., a preference for the food, which is the larger and more salient outcome). Without other tests of how dopamine axons in mPFC respond to situations like this, I don't think any conclusion around mPFC in favoring aversive stimuli can be made.

      As suggested, we have made the current manuscript as objective as possible, removing interpretive statements about what dopamine axons encode and emphasizing their functional diversity. In particular, we have removed the word ‘encode’ when describing the response of dopamine axons.

      Although it may have appeared unclear, there was no contradiction within our data regarding the response to reward and aversive stimuli. We have now improved the readability of the Results and Methods sections. Concerning the interpretation of what exactly the mPFC dopamine axons encode, we have rewritten the discussion to be as objective about our data as possible, as suggested. We also have edited our title and abstract accordingly. Meanwhile, we wish to emphasize that our reward and aversive stimuli are standard paradigms commonly used in the field. We believe, and all the reviewers agreed, that reporting the diversity of dopamine axonal responses with a novel imaging design constitutes new insight for the neuroscience community. Therefore, we have decided to leave the introduction of new behavioral tasks for future studies and instead expanded our discussion.

      As mentioned, I think the experiments are executed really well and the technological aspects of the authors' methods are impressive. However, there are also some aspects of the data presentation that would be improved. Some of the graphs took a considerable amount of effort to unpack. For example, Figure 4 is hard going. Is there a way to better illustrate the main points that this figure wants to convey? Some of this might be helped by a more complete description in the figure captions about what the data are showing. It would also be great to see how the response of dopamine axons changes across trial within a session to the shock and water-predictive cues. Supp Figure 1 should be in the main text with standard error and analyses across time. Clarifying these aspects of the data would make the paper more relevant and accessible to the field.

      We thank the reviewer for pointing out that the legend of Figure 4 was incomplete. We have fixed it, along with improving the presentation of the figure. We have also prepared a new figure (Figure 3–figure supplement 4) to compare CSaversive and CSreward signals for the first and rest of the trials within daily sessions, revealing further functional diversity in dopamine axons. We have decided to keep Figure 1–figure supplement 2 as a figure supplement with an additional analysis, as another reviewer pointed out that the design is not completely new. Furthermore, as eLife readers can easily access figure supplements, we believe it is appropriate to maintain it in this way.

      Minor points:

      (1) What is the control period for the omission test? Was omission conducted for the shock?

      The control period for reward omission is a 2-second period just before the CS onset. We did not include shock omission, because a sufficient number of trials (> 6 trials) for the rare omission condition could not be achieved within a single day.

      (2) The authors should mention how similar the tones were that predicted water and shock.

      According to de Hoz and Nelken (2014), a frequency difference of 4–7% is enough for mice to discriminate between tones. In addition, anticipatory licking and running confirmed that the mice could discriminate between the frequencies. We have now included this information in the Discussion.
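
The separation between the two cue tones can be compared with the quoted discrimination limit using a short calculation. The function names are hypothetical, and expressing the difference relative to the lower tone is an assumed convention.

```python
import math


def relative_difference(f_low_hz, f_high_hz):
    """Frequency difference as a fraction of the lower tone."""
    return (f_high_hz - f_low_hz) / f_low_hz


def octave_separation(f_low_hz, f_high_hz):
    """Frequency separation in octaves."""
    return math.log2(f_high_hz / f_low_hz)


# Cue tones used in the study: 9 kHz and 12 kHz.
tone_difference = relative_difference(9_000, 12_000)
tone_octaves = octave_separation(9_000, 12_000)
```

Under this convention the tones differ by about 33% (roughly 0.4 octave), several times the 4–7% difference cited above.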

      (3) I realize the viral approach used in the current studies may not allow for an idea of where in VTA dopamine neurons are that project to mPFC- is there data in the literature that speak to this? Particularly important as we now know that there is considerable heterogeneity in dopamine neuronal responses, which is often captured by differences in medial/lateral position within VTA.

      Some studies have suggested that mesocortical dopamine neurons are located in the medial posterior VTA (e.g., Lammel et al., 2008). However, in mouse anterograde tracing, it is not possible to spatially confine the injection of conventional viruses/tracers. We now refer to Lammel et al., 2008 in the Introduction.

    1. Author response:

      eLife assessment

      This study provides valuable information on the mechanism of PepT2 through enhanced-sampling molecular dynamics, backed by cell-based assays, highlighting the importance of protonation of selected residues for the function of a proton-coupled oligopeptide transporter (hsPepT2). The molecular dynamics approaches are convincing, but with limitations that could be addressed in the manuscript, including lack of incorporation of a protonation coordinate in the free energy landscape, possibility of protonation of the substrate, errors with the chosen constant pH MD method for membrane proteins, dismissal of hysteresis emerging from the MEMENTO method, and the likelihood of other residues being affected by peptide binding. Some changes to the presentation could be considered, including a better description of pKa calculations and the inclusion of error bars in all PMFs. Overall, the findings will appeal to structural biologists, biochemists, and biophysicists studying membrane transporters.

      We would like to express our gratitude to the reviewers for providing their feedback on our manuscript, and also for recognising the variety of computational methods employed, the amount of sampling collected and the experimental validation undertaken. Following the individual reviewer comments, as addressed point-by-point below, we will shortly prepare a revised version of this paper. Intended changes to the revised manuscript are marked up in bold font in the detailed responses below, but before that we address some of the comments made above in the general assessment:

      • “lack of incorporation of a protonation coordinate in the free energy landscape”. We acknowledge that it would be highly desirable to treat protonation state changes explicitly and fully coupled to conformational changes. However, at this point in time, evaluating such a free energy landscape is not computationally feasible (especially considering that the non-reactive approach taken here already amounts to almost 1 ms of total sampling time). Previous reports in the literature tend to focus on either simpler systems or a reduced subset of a larger problem. As we were trying to obtain information on the whole transport cycle, we decided to focus here on non-reactive methods.

      • “possibility of protonation of the substrate”. The reviewers are correct in pointing out this possibility, which we had not discussed explicitly in our manuscript. Briefly, while we describe a mechanism in which protonation of only protein residues (with an unprotonated ligand) can account for driving all the necessary conformational changes of the transport cycle, there is some evidence for a further intermediate protonation site in our data (as we commented on in the first version of the manuscript as well), which may or may not be the substrate itself. A future explicit treatment of the proton movements through the transporter, when it will become computationally tractable to do so, will have to include the substrate as a possible protonation site; for the present moment, we will amend our discussion to alert the reader to the possibility that the substrate could be an intermediate to proton transport. This has repercussions for our study of the E56 pKa value, where – if protons reside with a significant population at the substrate C-terminus – our calculated shift in pKa upon substrate binding could be an overestimate, although we would qualitatively expect the direction of shift to be unaffected. However, we also anticipate that treating this potential coupling explicitly would make convergence of any CpHMD calculation impractical to achieve and thus it may be the case that for now only a semi-quantitative conclusion is all that can be obtained.

      • “errors with the chosen constant pH MD method for membrane proteins”. We acknowledge that – as reviewer #1 has reminded us – the AMBER implementation of hybrid-solvent CpHMD is not rigorous for membrane proteins, and as such we will add a cautionary note to our paper. We will also explain how the use of the ABFE thermodynamic cycle calculations helps to validate the CpHMD results in a completely orthogonal manner (we will promote this validation which was in the supplementary figures into the main text in the revised version). We therefore remain reasonably confident in the results presented with regards to the reported pKa shift of E56 upon substrate binding, and suggest that if the impact of neglecting the membrane in the implicit-solvent stage of CpHMD is significant, then there is likely an error cancellation when considering shifts induced by the incoming substrate.

      • “dismissal of hysteresis emerging from the MEMENTO method”. We have shown in our method design paper how the use of the MEMENTO method drastically reduces hysteresis compared to steered MD and metadynamics for path generation, and find this improvement again for PepT2 in this study. We will address reviewer #3’s concern about our presentation on this point by revising our introduction of the MEMENTO method, as detailed in the response below.

      • “the likelihood of other residues being affected by peptide binding”. In this study, we have investigated in detail the involvement of several residues in proton-coupled di-peptide transport by PepT2. Short of the potential intermediate protonation site mentioned above, the set of residues we investigate form a minimal set of sorts within which the important driving forces of alternating access can be rationalised. We have not investigated in substantial detail here the residues involved in holding the peptide in the binding site, as they are well studied in the literature and ligand promiscuity is not the problem of interest here. It remains entirely possible that further processes contribute to the mechanism of driving conformational changes by involving other residues not considered in this paper. We will make our speculation that an ensemble of different processes may be contributing simultaneously more explicit in our revision, but do not believe any of our conclusions would be affected by this.
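
The orthogonal validation mentioned in the CpHMD point above rests on a standard thermodynamic cycle: a residue's pKa shift on substrate binding equals the difference between the binding free energies to its deprotonated and protonated states, divided by RT ln 10. The sketch below illustrates that relation only. The 300 K temperature, the function names, and all numerical inputs are assumptions, not values from the study.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def pka_shift_from_binding(dg_bind_prot, dg_bind_deprot, temperature_k=300.0):
    """pKa(bound) - pKa(apo) from two binding free energies (kcal/mol).

    Follows from closing the cycle apo/bound x protonated/deprotonated:
    RT ln(10) * delta_pKa = dG_bind(deprotonated) - dG_bind(protonated).
    """
    return (dg_bind_deprot - dg_bind_prot) / (R_KCAL * temperature_k * math.log(10))


def protonated_fraction(ph, pka):
    """Henderson-Hasselbalch fraction of the protonated form at a given pH."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


# Hypothetical inputs: the substrate binds 2 kcal/mol more tightly
# when the residue is protonated.
example_shift = pka_shift_from_binding(dg_bind_prot=-10.0, dg_bind_deprot=-8.0)
```

With these made-up numbers the pKa rises by about 1.5 units on binding, since one pKa unit corresponds to roughly 1.4 kcal/mol near room temperature.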

      As for the additional suggested changes in presentation, we will provide the requested details on the CpHMD analysis. Furthermore, we will use the convergence data presented separately in figures S12 and S16 to include error bars on our 1D-reprojections of the 2D-PMFs in figures 3, 4 and 5. (Note that we will opt to not do so in figures S10 and S15 which collate all 1D PMF reprojections for the OCC ↔ OF and OCC ↔ IF transitions in single reference plots, respectively, to avoid overcrowding those necessarily busy figures). We are also changing the colour schemes of these plots in our revision to improve accessibility.

      Reviewer #1 (Public Review):

      The authors have performed all-atom MD simulations to study the working mechanism of hsPepT2. It is widely accepted that conformational transitions of proton-coupled oligopeptide transporters (POTs) are linked with gating hydrogen bonds and salt bridges involving protonatable residues, whose protonation triggers gate openings. Through unbiased MD simulations, the authors identified extra-cellular (H87 and D342) and intra-cellular (E53 and E622) triggers. The authors then validated these triggers using free energy calculations (FECs) and assessed the engagement of the substrate (Ala-Phe dipeptide). The linkage of substrate release with the protonation of the ExxER motif (E53 and E56) was confirmed using constant-pH molecular dynamics (CpHMD) simulations and cell-based transport assays. An alternating-access mechanism was proposed. The study was largely conducted properly, and the paper was well-organized. However, I have a couple of concerns for the authors to consider addressing.

      We would like to note here that it may be slightly misleading to the reader to state that “The linkage of substrate release with the protonation of the ExxER motif (E53 and E56) was confirmed using constant-pH molecular dynamics (CpHMD) simulations and cell-based transport assays.” The cell-based transport assays confirmed the importance of the extracellular gating trigger residues H87, S321 and D342 (as mentioned in the preceding sentence), not of the substrate-protonation link as this line might be understood to suggest.

      (1) As a proton-coupled membrane protein, the conformational dynamics of hsPepT2 are closely coupled to protonation events of gating residues. Instead of using semi-reactive methods like CpHMD or reactive methods such as reactive MD, where the coupling is accounted for, the authors opted for extensive non-reactive regular MD simulations to explore this coupling. Note that I am not criticizing the choice of methods, and I think those regular MD simulations were well-designed and conducted. But I do have two concerns.

      a) Ideally, proton-coupled conformational transitions should be modelled using a free energy landscape with two or more reaction coordinates (or CVs), with one describing the protonation event and the other describing the conformational transitions. The minimum free energy path then illustrates the reaction progress, such as OCC/H87D342- → OCC/H87HD342H → OF/H87HD342H as displayed in Figure 3.

      We concur with the reviewer that the ideal way of describing the processes studied in our paper would be as higher-dimensional free energy landscapes obtained from a simulation method that can explicitly model proton-transfer processes. Indeed, it would have been particularly interesting and potentially informative with regards to the movement of protons down into the transporter in the OF → OCC → IF sequence of transitions. As we note in our discussion on the H87→E56 proton transfer:

      “This could be investigated using reactive MD or QM/MM simulations (both approaches have been employed for other protonation steps of prokaryotic peptide transporters, see Parker et al. (2017) and Li et al. (2022)). However, the putative path is very long (≈ 1.7 nm between H87 and E56) and may or may not involve a large number of intermediate protonatable residues, in addition to binding site water. While such an investigation is possible in principle, it is beyond the scope of the present study.”

      Since even sampling the proton transfer step itself in an essentially static protein conformation would be pushing the boundaries of what has been achieved in the field, we believe that, given the current state of the art, a fully coupled investigation of large-scale conformational changes and proton-transfer reactions is not yet feasible in a practical time frame. We also note this limitation already when we say that:

      “The question of whether proton binding happens in OCC or OF warrants further investigation, and indeed the co-existence of several mechanisms may be plausible here”.

      Nonetheless, we are actively exploring approaches to treat uptake and movement of protons explicitly for future work.

      In our revision, we will expand on our discussion of the reasoning behind employing a non-reactive approach and the limitations that imposes on what questions can be answered in this study.

      Without including the protonation as a CV, the authors tried to model the free energy changes from multiple FECs using different charge states of H87 and D342. This is a practical workaround, and the conclusion drawn (the OCC→ OF transition is downhill with protonated H87 and D342) seems valid. However, I don't think the OF states with different charge states (OF/H87D342-, OF/H87HD342-, OF/H87D342H, and OF/H87HD342H) are equally stable, as plotted in Figure 3b. The concern extends to other cases like Figures 4b, S7, S10, S12, S15, and S16. While it may be appropriate to match all four OF states in the free energy plot for comparison purposes, the authors should clarify this to ensure readers are not misled.

      The reviewer is correct in their assessment that the aligning of PMFs in these figures is arbitrary; no relative free energies of the PMFs to each other can be estimated without explicit free energy calculations at least of protonation events at the end state basins. The PMFs in our figures are merely superimposed for illustrating the differences in shape between the obtained profiles in each condition, as discussed in the text, and we will make this clear in the appropriate figure captions in our revision.
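      To make the arbitrariness of this superposition concrete, the following is a minimal Python sketch, assuming each PMF is stored as a list of free energies on a shared CV grid. The function name `align_pmfs`, the condition labels and all numbers are hypothetical illustrations and are not taken from the paper. Shifting each profile by a constant changes nothing about its shape (barrier heights and basin differences within one profile), but it also means the vertical offsets between profiles carry no information.

```python
def align_pmfs(cv_grid, pmfs, reference_cv):
    """Shift every PMF so its value at the grid point nearest reference_cv is zero.

    The shift is a per-profile constant: within-profile differences are kept,
    between-profile offsets are set by convention only.
    """
    ref_index = min(range(len(cv_grid)), key=lambda i: abs(cv_grid[i] - reference_cv))
    return {
        label: [g - profile[ref_index] for g in profile]
        for label, profile in pmfs.items()
    }


# Made-up profiles (arbitrary units) on a made-up CV grid.
cv_grid = [0.0, 0.5, 1.0, 1.5, 2.0]
pmfs = {
    "apo": [0.0, 12.0, 20.0, 15.0, 10.0],
    "H87-prot": [5.0, 9.0, 11.0, 4.0, -3.0],
}

# Align all profiles at the basin located at CV = 2.0.
aligned = align_pmfs(cv_grid, pmfs, reference_cv=2.0)
```

      Choosing a different `reference_cv` would move the curves relative to each other without changing any individual profile, which is why only the shapes should be compared.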

      b) Regarding the substrate impact, it appears that the authors assumed fixed protonation states. I am afraid this is not necessarily the case. Variations in PepT2 stoichiometry suggest that substrates likely participate in proton transport, like the Phe-Ala (2:1) and Phe-Gln (1:1) dipeptides mentioned in the introduction. And it is not rigorous to assume that the N- and C-termini of a peptide do not protonate/deprotonate when transported. I think the authors should explicitly state that the current work and the proposed mechanism (Figure 8) are based on the assumption that the substrates do not uptake/release proton(s).

      This is indeed an assumption inherent in the current work. While we do “speculate that the proton movement processes may happen as an ensemble of different mechanisms, and potentially occur contemporaneously with the conformational change” we do not in the current version indicate explicitly that this may involve the substrate. We will make clear the assumption and this possibility in the revised version of our paper. Indeed, as we discuss, there is some evidence in our PMFs of an additional protonation site not considered thus far, which may or may not be the substrate. We will make note of this point in the revised manuscript.

      As for what information can be drawn from the given experimental stoichiometries, we note in our paper that “a 2:1 stoichiometry was reported for the neutral di-peptide D-Phe-L-Ala and 3:1 for anionic D-Phe-L-Glu. (Chen et al., 1999) Alternatively, Fei et al. (1999) have found 1:1 stoichiometries for either of D-Phe-L-Gln (neutral), D-Phe-L-Glu (anionic), and D-Phe-L-Lys (cationic).”

      We do not assume that it is our place to arbitrate among the apparent discrepancies in the experimental data here, although we believe that our assumed 2:1 stoichiometry is “motivated also by our computational results that indicate distinct and additive roles played by two protons in the conformational cycle mechanism”.

      (2) I have more serious concerns about the CpHMD employed in the study.

      a) The CpHMD in AMBER is not rigorous for membrane simulations. The underlying generalized Born model fails to consider the membrane environment when updating charge states. In other words, the CpHMD places a membrane protein in a water environment to judge if changes in charge states are energetically favorable. While this might not be a big issue for peripheral residues of membrane proteins, it is likely unphysical for internal residues like the ExxER motif. As I recall, the developers have never used the method to study membrane proteins themselves. The only CpHMD variant suitable for membrane proteins is the membrane-enabled hybrid-solvent CpHMD in CHARMM. While I do not expect the authors to redo their CpHMD simulations, I do hope the authors recognize the limitations of their method.

      We will discuss the limitations of the AMBER CpHMD implementation in the revised version. However, despite that, we believe we have in fact provided sufficient grounds for our conclusion that substrate binding affects ExxER motif protonation in the following way:

      In addition to CpHMD simulations, we establish the same effect via ABFE calculations, where the substrate affinity is different at the E56 deprotonated vs protonated protein. This is currently figure S20, though in the revised version we will move this piece of validation into a new panel of figure 6 in the main text, since it becomes more important with the CpHMD membrane problem in mind. Since the ABFE calculations are conducted with an all-atom representation of the lipids and the thermodynamic cycle closes well, it would appear that if the chosen CpHMD method has a systematic error of significant magnitude for this particular membrane protein system, there may be the benefit of error cancellation. While the calculated absolute pKa values may not be reliable, the difference made by substrate binding appears to be so, as judged by the orthogonal ABFE technique.
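      The logic of this cross-check can be sketched with the standard thermodynamic cycle linking binding and protonation: if the substrate binds with free energy ΔG_bind(deprot) to the deprotonated protein and ΔG_bind(prot) to the protonated protein, cycle closure gives ΔpKa = [ΔG_bind(deprot) − ΔG_bind(prot)] / (RT ln 10). The Python snippet below is a minimal illustration under that assumption; the function name and the input values are hypothetical and are not results from the paper.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)


def pka_shift_from_binding(dg_bind_deprot, dg_bind_prot, temperature=300.0):
    """pKa shift (holo minus apo) implied by two binding free energies.

    dg_bind_deprot: binding free energy to the deprotonated protein (kcal/mol)
    dg_bind_prot:   binding free energy to the protonated protein (kcal/mol)
    A positive result means substrate binding favours the protonated residue.
    """
    return (dg_bind_deprot - dg_bind_prot) / (R_KCAL * temperature * math.log(10))


# Hypothetical numbers: the substrate binds 2 kcal/mol more tightly
# when the residue is protonated.
shift = pka_shift_from_binding(dg_bind_deprot=-8.0, dg_bind_prot=-10.0)
```

      Because only the difference of two binding free energies enters, a systematic error that affects both absolute pKa values equally would cancel in the shift, which is the error-cancellation argument made above.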

      Although the reviewer does “not expect the authors to redo their CpHMD simulations”, we consider that it may be helpful to the reader to share in this response some results from trials using the continuous, all-atom constant pH implementation that has recently become available in GROMACS (Aho et al 2022, https://pubs.acs.org/doi/10.1021/acs.jctc.2c00516) and can be used rigorously with membrane proteins, given its all-atom lipid representation.

      Unfortunately, when trying to titrate E56 in this CpHMD implementation, we found few protonation-state transitions taking place, and the system often got stuck in protonation state–local conformation coupled minima (which need to interconvert through rearrangements of the salt bridge network involving slow side-chain dihedral rotations in E53, E56 and R57). Author response image 1 shows this for the apo OF state, and Author response image 2 shows how noisy attempts at pKa estimation from this data turn out to be, necessitating the use of a hybrid-solvent method.

      Author response image 1.

      All-atom CpHMD simulations of apo-OF PepT2. Red indicates protonated E56, blue is deprotonated.

      Author response image 2.

      Difficulty in calculating the E56 pKa value from the noisy all-atom CpHMD data shown in Author response image 1
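      For readers unfamiliar with how a pKa is typically extracted from titration data, the following is a minimal Python sketch of a generic Hill-type fit, assuming one has a deprotonated fraction per simulated pH value. It is not the analysis script used for the images above; the function names and the synthetic data are hypothetical. The fit linearises f = 1 / (1 + 10^(n (pKa − pH))) as log10(f / (1 − f)) = n (pH − pKa), which also shows why sparse protonation-state transitions (fractions stuck near 0 or 1) make the estimate noisy.

```python
import math


def hill_fraction(ph, pka, n=1.0):
    """Deprotonated fraction predicted by the Hill equation."""
    return 1.0 / (1.0 + 10 ** (n * (pka - ph)))


def fit_hill(ph_values, deprotonated_fractions):
    """Least-squares fit of log10(f / (1 - f)) = n * (pH - pKa); returns (pKa, n)."""
    xs, ys = [], []
    for ph, f in zip(ph_values, deprotonated_fractions):
        if 0.0 < f < 1.0:  # the logit is undefined when no transitions were sampled
            xs.append(ph)
            ys.append(math.log10(f / (1.0 - f)))
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct pH values with 0 < f < 1")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                    # Hill coefficient (slope)
    intercept = mean_y - n * mean_x  # equals -n * pKa
    return -intercept / n, n


# Synthetic, noise-free titration curve with pKa = 6.5 and n = 1.
ph_grid = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0]
synthetic = [hill_fraction(ph, pka=6.5) for ph in ph_grid]
pka_fit, n_fit = fit_hill(ph_grid, synthetic)
```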

      b) It appears that the authors did not make the substrate (Ala-Phe dipeptide) protonatable in holo-simulations. This oversight prevents a complete representation of ligand-induced protonation events, particularly given that the substrate ion pairs with hsPepT2 through its N- & C-termini. I believe it would be valuable for the authors to acknowledge this potential limitation.

      In this study, we implicitly assumed from the outset that the substrate does not get protonated, which – as noted in our response to the comment above – we will acknowledge explicitly in revision. This potential limitation for the available mechanisms for proton transfer also applies to our investigation of the ExxER protonation states. In particular, a semi-grand canonical ensemble that takes into account the possibility of substrate C-terminus protonation may also sample states in which the substrate is protonated and oriented away from R57, thus leaving the ExxER salt bridge network in an apo-like state. The consequence would be that while the direction of shift in E56 pKa value will be the same, our CpHMD may overestimate its magnitude. It would thus be interesting to make the C-terminus protonatable for obtaining better quantitative estimates of the E56 pKa shift (as is indeed true in general for any other protein protonatable residue, though the effects are usually assumed to be negligible). We do note, however, that convergence of the CpHMD simulations would be much harder if the slow degree of freedom of substrate reorientation (which in our experience takes 10s to 100s of ns in this binding pocket) needs to be implicitly equilibrated upon protonation state transitions. We will discuss such considerations in the revision.

      Reviewer #2 (Public Review):

      This is an interesting manuscript that describes a series of molecular dynamics studies on the peptide transporter PepT2 (SLC15A2). They examine, in particular, the effect on the transport cycle of protonation of various charged amino acids within the protein. They then validate their conclusions by mutating two of the residues that they predict to be critical for transport in cell-based transport assays. The study suggests a series of protonation steps that are necessary for transport to occur in PepT2. Comparison with bacterial proteins from the same family shows that while the overall architecture of the proteins and likely mechanism are similar, the residues involved in the mechanism may differ.

      Strengths:

      This is an interesting and rigorous study that uses various state-of-the-art molecular dynamics techniques to dissect the transport cycle of PepT2 with nearly 1 ms of sampling. It gives insight into the transport mechanism, investigating how the protonation of selected residues can alter the energetic barriers between various states of the transport cycle. The authors have, in general, been very careful in their interpretation of the data.

      Weaknesses:

      Interestingly, they suggest that there is an additional protonation event that may take place as the protein goes from occluded to inward-facing but they have not identified this residue.

      We have indeed suggested that there may be an additional protonation site involved in the conformational cycle that we have not been able to capture, which – as we discuss in our paper – might be indicated by the shapes of the OCC ↔ IF PMFs given in Figure S15. One possibility is for this to be the substrate itself (see the response to reviewer #1 above) though within the scope of this study the precise pathway by which protons move down the transporter and the exact ordering of conformational change and proton transfer reactions remains a (partially) open question. We acknowledge this and denote it with question marks in the mechanistic overview we give in Figure 8, and also “speculate that the proton movement processes may happen as an ensemble of different mechanisms, and potentially occur contemporaneously with the conformational change”.

      Some things are a little unclear. For instance, where does the state that they have defined as occluded sit on the diagram in Figure 1a? - is it truly the occluded state as shown on the diagram or does it tend to inward- or outward-facing?

      Figure 1a is a simple schematic overview intended to show which structures of PepT2 homologues are available to use in simulations. This was not meant to be a quantitative classification of states. Nonetheless, we can note that the OCC state we derived has extra- and intracellular gate opening distances (as measured by the simple CVs defined in the methods and illustrated in Figure 2a) that indicate full gate closure at both sides. In particular, although it was derived from the IF state via biased sampling, the intracellular gate opening distance in the OCC state used for our conformational change enhanced sampling was comparable to that of the OF state (ie, full closure of the gate), see Figure S2b and the grey bars therein. Therefore, we would schematically classify the OCC state to lie at the center of the diagram in Figure 1a. Furthermore, it is largely stable over triplicates of 1 μs-long unbiased MD, where in 2/3 replicates the gates remain stable, and in the remaining replicate there is partial opening of the intracellular gate (as shown in Figure 2 b/c under the “apo standard” condition). We comment on this in the main text by saying that “The intracellular gate, by contrast, is more flexible than the extracellular gate even in the apo, standard protonation state”, and link it to the lower barrier for transition to IF than to OF. We did this by saying that “As for the OCC↔OF transitions, these results explain the behaviour we had previously observed in the unbiased MD of Figure 2c.” We acknowledge this was not sufficiently clear and will add details to the latter sentence in revision to help clarify better the nature of the occluded state.

      The pKa calculations and their interpretation are a bit unclear. Firstly, it is unclear whether they are using all the data in the calculations of the histograms, or just selected data and if so on what basis was this selection done. Secondly, they dismiss the pKa calculations of E53 in the outward-facing form as not being affected by peptide binding but say that E56 is when there seems to be a similar change in profile in the histograms.

      In our manuscript, we have provided two distinct analyses of the raw CpHMD data. Firstly, we analysed the data by the replicates in which our simulations were conducted (Figure 6, shown as bar plots with mean from triplicates +/- standard deviation), where we found that only the effect on E56 protonation was distinct as lying beyond the combined error bars. This analysis uses the full amount of sampling conducted for each replicate. However, since we found that the range of pKa values estimated from 10ns/window chunks was larger than the error bars obtained from the replicate analysis (Figures S17 and S18), we sought to verify our conclusion by pooling all chunk estimates and plotting histograms (Figure S19). We recover from those the effect of substrate binding on the E56 protonation state on both the OF and OCC states. However, as the reviewer has pointed out (something we did not discuss in our original manuscript), there is a shift in the pKa of E53 of the OF state only. In fact, the trend is also apparent in the replicate-based analysis of Figure 6, though here the larger error bars overlap. In our revision, we will add more details of these analyses for clarity (including more detailed figure captions regarding the data used in Figure 6) as well as a discussion of the partial effect on the E53 pKa value.

      We do not believe, however, that our key conclusions are negatively affected. If anything, a further effect on the E53 pKa which we had not previously commented on (since we saw the evidence as weaker, pertaining to only one conformational state) would strengthen the case for an involvement of the ExxER motif in ligand coupling.

      Reviewer #3 (Public Review):

      Summary:

      Lichtinger et al. have used an extensive set of molecular dynamics (MD) simulations to study the conformational dynamics and transport cycle of an important member of the proton-coupled oligopeptide transporters (POTs), namely SLC15A2 or PepT2. This protein is one of the most well-studied mammalian POT transporters that provides a good model with enough insight and structural information to be studied computationally using advanced enhanced sampling methods employed in this work. The authors have used microsecond-level MD simulations, constant-pH MD, and alchemical binding free energy calculations along with cell-based transport assay measurements; however, the most important part of this work is the use of enhanced sampling techniques to study the conformational dynamics of PepT2 under different conditions.

      The study attempts to identify links between conformational dynamics and chemical events such as proton binding, ligand-protein interactions, and intramolecular interactions. The ultimate goal is of course to understand the proton-coupled peptide and drug transport by PepT2 and homologous transporters in the solute carrier family.

      Some of the key results include:

      (1) Protonation of H87 and D342 initiates the occluded (Occ) to the outward-facing (OF) state transition.

      (2) In the OF state, through engaging R57, substrate entry increases the pKa value of E56 and thermodynamically facilitates the movement of protons further down.

      (3) E622 is not only essential for peptide recognition but also its protonation facilitates substrate release and contributes to the intracellular gate opening. In addition, cell-based transport assays show that mutation of residues such as H87 and D342 significantly decreases transport activity as expected from simulations.

      Strengths:

      (1) This is an extensive MD-based study of PepT2, which is beyond the typical MD studies both in terms of the sheer volume of simulations as well as the advanced methodology used. The authors have not limited themselves to one approach and have appropriately combined equilibrium MD with alchemical free energy calculations, constant-pH MD, and geometry-based free energy calculations. Each of these 4 methods provides a unique insight regarding the transport mechanism of PepT2.

      (2) The authors have not limited themselves to computational work and have performed experiments as well. The cell-based transport assays clearly establish the importance of the residues that have been identified as significant contributors to the transport mechanism using simulations.

      (3) The conclusions made based on the simulations are mostly convincing and provide useful information regarding the proton pathway and the role of important residues in proton binding, protein-ligand interaction, and conformational changes.

      Weaknesses:

      (1) Some of the statements made in the manuscript are not convincing and do not abide by the standards that are mostly followed in the manuscript. For instance, on page 4, it is stated that "the K64-D317 interaction is formed in only ≈ 70% of MD frames and therefore is unlikely to contribute much to extracellular gate stability." I do not agree that 70% is negligible. Particularly, Figure S3 does not include the time series so it is not clear whether the 30% of the time where the salt bridge is broken is in the beginning or the end of simulations. For instance, it is likely that the salt bridge is not initially present and then it forms very strongly. Of course, this is just one possible scenario but the point is that Figure S3 does not rule out the possibility of a significant role for the K64-D317 salt bridge.

      The reviewer is right to point out that the statement and Figure S3 as they stand do not adequately support our decision to exclude the K64-D317 salt-bridge in our further investigations. The violin plot shown in Figure S3, visualised as pooled data from unbiased 1 μs triplicates, does indeed not rule out a scenario where the salt bridge only formed late in our simulations (or only in some replicates), but then is stable. Therefore, in our revision, we will include the appropriate time-series of the salt bridge distances, showing how K64-D317 is initially stable but then falls apart in replicate 1, and is transiently formed and disengaged across the trajectories in replicates 2 and 3. We will also remake the data for this plot as we discovered a bug in the relevant analysis script that meant the D170-K642 distance was not calculated accurately. The results are however almost identical, and our conclusions remain.
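      The distinction the reviewer draws can be illustrated with a minimal Python sketch, assuming a salt-bridge distance time series per replicate and a hypothetical contact cutoff of 0.4 nm (the cutoff, function names and synthetic traces are illustrative assumptions, not values from the paper). Two traces with the same pooled occupancy of about 70% can look very different once the occupancy is resolved in time blocks.

```python
def occupancy(distances, cutoff=0.4):
    """Fraction of frames in which the distance is at or below the cutoff."""
    return sum(d <= cutoff for d in distances) / len(distances)


def block_occupancy(distances, n_blocks, cutoff=0.4):
    """Occupancy in consecutive, equally sized time blocks (trailing frames dropped)."""
    block_len = len(distances) // n_blocks
    return [
        occupancy(distances[i * block_len:(i + 1) * block_len], cutoff)
        for i in range(n_blocks)
    ]


# Synthetic traces (nm), 100 frames each, both formed in 70% of frames.
late_forming = [0.8] * 30 + [0.3] * 70          # absent at first, then stable
intermittent = ([0.3] * 7 + [0.8] * 3) * 10     # forms and breaks throughout

pooled = (occupancy(late_forming), occupancy(intermittent))
blocks_late = block_occupancy(late_forming, n_blocks=4)
blocks_intermittent = block_occupancy(intermittent, n_blocks=4)
```

      A pooled histogram or violin plot corresponds to `pooled` and cannot separate the two cases, whereas the per-block values (or the full time series) can.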

      (2) Similarly, on page 4, it is stated that "whether by protonation or mutation - the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed (Figure S5)." I do not agree with this assessment. The authors need to be aware of the limitations of this approach. Consider "WT H87-prot" and "D342A H87-prot": when D342 residue is mutated, in one out of 3 simulations, we see the opening of the gate within 1 us. When D342 residue is not mutated we do not see the opening in any of the 3 simulations within 1 us. It is quite likely that if rather than 3 we have 10 simulations or rather than 1 us we have 10 us simulations, the 0/3 to 1/3 changes significantly. I do not find this argument and conclusion compelling at all.

      If the conclusions were based on that alone, then we would agree. However, this section of work covers merely the observations of the initial unbiased simulations which we go on to test/explore with enhanced sampling in the rest of the paper, and which then lead us to the eventual conclusions.

      Figure S5 shows the results from triplicate 1 μs-long trajectories as violin-plot histograms of the extracellular gate opening distance, also indicating the first and final frames of the trajectories as connected by an arrow for orientation – a format we chose for intuitively comparing 48 trajectories in one plot. The reviewer reads the plot correctly when they analyse the “WT H87-prot” vs “D342A H87-prot” conditions. In the former case, no spontaneous opening in unbiased MD is taking place, whereas when D342 is mutated to alanine in addition to H87 protonation, we see spontaneous transition in 1 out of 3 replicates. However, the reviewer does not seem to interpret the statement in question in our paper (“the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed”) in the way we intended it to be understood. We merely want to note here a correlation in the unbiased dataset we collected at this stage, and indeed the one spontaneous opening in the case comparison picked out by the reviewer is in the condition where both the H87 interaction network and D342-R206 are perturbed. In noting this we do not intend to make statistically significant statements from the limited dataset. Instead, we write that “these simulations show a large amount of stochasticity and drawing clean conclusions from the data is difficult”. We do however stand by our assessment that from this limited data we can “already appreciate a possible mechanism where protons move down the transporter pore” – a hypothesis we investigate more rigorously with enhanced sampling in the rest of the paper. We will revise the section in question to make clearer that the unbiased MD is only meant to give an initial hypothesis here to be investigated in more detail in the following sections. In doing so, we will also incorporate, as we had not done before, the case (not picked out by the reviewer here but concerning the same figure) of S321A & H87 prot. In the third replicate, this shows partial gate opening towards the end of the unbiased trajectory (despite D342 not being affected), highlighting further the stochastic nature that makes even clear correlative conclusions difficult to draw.
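      As a rough illustration of why no statistical significance can be attached to the 0/3 versus 1/3 comparison discussed above, the following minimal Python sketch applies a two-sided Fisher exact test to the corresponding 2×2 table. Treating the replicates as independent binary outcomes (gate opened or not within 1 μs) is a simplifying assumption made here for illustration, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x "successes" in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum all tables that are at most as probable as the observed one.
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Rows: "WT H87-prot" (0 opened, 3 closed) and "D342A H87-prot" (1 opened, 2 closed).
p_value = fisher_exact_two_sided(0, 3, 1, 2)
```

      With three replicates per condition, the two observed outcomes are statistically indistinguishable, consistent with treating the unbiased MD only as a source of hypotheses.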

      (3) While the MEMENTO methodology is novel and interesting, the method is presented as flawless in the manuscript, which is not true at all. It is stated on Page 5 with regards to the path generated by MEMENTO that "These paths are then by definition non-hysteretic." I think this is too big of a claim to say the paths generated by MEMENTO are non-hysteretic by definition. This claim is not even mentioned in the original MEMENTO paper. What is mentioned is that linear interpolation generates a hysteresis-free path by definition. There are two important problems here: (a) MEMENTO uses the linear interpolation as an initial step but modifies the intermediates significantly later so they are no longer linearly interpolated structures and thus the path is no longer hysteresis-free; (b) a more serious problem is the attribution of by-definition hysteresis-free features to the linearly interpolated states. This is based on conflating the hysteresis-free and unique concepts. The hysteresis in MD-based enhanced sampling is related to the presence of barriers in orthogonal space. For instance, one may use a non-linear interpolation of any type and get a unique pathway, which could be substantially different from the one coming from the linear interpolation. None of these paths will be hysteresis-free necessarily once subjected to MD-based enhanced sampling techniques.

      We certainly do not intend to claim that the MEMENTO method is flawless. The concern the reviewer raises around the statement "These paths are then by definition non-hysteretic" is perhaps best addressed by a clarification of the language used and considering how MEMENTO is applied in this work.

      Hysteresis in the most general sense denotes the dependence of a system on its history, or – more specifically – the lagging behind of the system state with regards to some physical driver (for example the external field in magnetism, whence the term originates). In the context of biased MD and enhanced sampling, hysteresis commonly denotes the phenomenon where a path created by a biased dynamics method along a certain collective variable lags behind in phase space in slow orthogonal degrees of freedom (see Figure 1 in Lichtinger and Biggin 2023, https://doi.org/10.1021/acs.jctc.3c00140). When used to generate free energy profiles, this can manifest as starting state bias, where the conformational state that was used to seed the biased dynamics appears lower in free energy than alternative states. Figure S6 shows this effect on the PepT2 system for both steered MD (heavy atom RMSD CV) + umbrella sampling (tip CV) and metadynamics (tip CV). There is, in essence, a coupled problem: without an appropriate CV (which we did not have to start with here), path generation that is required for enhanced sampling displays hysteresis, but the refinement of CVs is only feasible when paths connecting the true phase space basins of the two conformations are available. MEMENTO helps solve this issue by reconstructing protein conformations along morphing paths which perform much better than steered MD paths with respect to giving consistent free energy profiles (see Figure S7 and the validation cases in the MEMENTO paper), even if the same CV is used in umbrella sampling.

      There are still differences between replicates in those PMFs, indicating slow conformational flexibility propagated from end-state sampling through MEMENTO. We use this to refine the CVs further with dimensionality reduction (see the Methods section and Figure S8), before moving to 2D-umbrella sampling (Figure 3). Here, we think, the reviewer’s point applies. The MEMENTO paths are ‘non-hysteretic by definition’ with respect to given end states in the sense that they connect (by definition) the correct conformations at both end-states (unlike steered MD), which in enhanced sampling manifests as the absence of the strong starting-state bias we had previously observed (Figure S7 vs S6). They are not, however, hysteresis-free with regards to how representative of the end-state conformational flexibility the structures given to MEMENTO really were, which is where the iterative CV design and combination of several MEMENTO paths in 2D-PMFs comes in.

      We also cannot make a direct claim about whether in the transition region the MEMENTO paths might be separated from the true (lower free energy) transition paths by slow orthogonal degrees of freedom, which may conceivably result in overestimated barrier heights separating two free energy basins. We cannot guarantee that this is not the case, but neither in our MEMENTO validation examples nor in this work have we encountered any indications of a problem here.

      We hope that the reviewer will be satisfied by our revision, where we will replace the wording in question by a statement that the MEMENTO paths do not suffer from hysteresis that is otherwise incurred as a consequence of not reaching the correct target state in the biased run (in some orthogonal degrees of freedom).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1

      (1) Given the low trial numbers, and the point of sequential vs clustered reactivation mentioned in the public review, it would be reassuring to see an additional sanity check demonstrating that future items that are currently not on-screen can be decoded with confidence, and if so, when in time the peak reactivation occurs. For example, the authors could show separately the decoding accuracy for near and far items in Fig. 5A, instead of plotting only the difference between them.

      We have now added the requested analysis showing the raw decoded probabilities for near and distant items separately in Figure 5A. We have also chosen to replace Figure 5B with the new figure as we think it provides more information than the previous Figure 5B. Instead, we have moved Figure 5B to the supplement. The median peak decoded accuracy for near and distant items is equivalent. We have added the following description to the figure:

      “Decoded raw probabilities for off-screen items that were up to two steps ahead of the current stimulus cue (‘near’) vs. distant items that were more than two steps away on the graph, on trials with correct answers. The median peak decoded probability for near and distant items was at the same time point for both probability categories. Note that displayed lines reflect the average probability while, to eliminate influence of outliers, the peak displays the median.”

      (2) The non-sequential reactivation analyses often use a time window of peak decodability, and it was not entirely clear to me what data this time window is determined on, e.g., was it determined based on all future reactivations irrespective of graph distance? This should be clarified in the methods.

      Thank you for raising this. We now clarify this in the relevant section to read: “First, we calculated a time point of interest by computing the peak probability estimate of decoders across all trials, i.e., the average probability for each timepoint of all trials (except previous onscreen items) of all distances, which is equivalent to the peak of the differential reactivation analysis”
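
      As an illustration only, the time point of interest described above can be sketched as follows. This is a minimal sketch, not the analysis code used in the study: the function name `peak_timepoint` and the toy probabilities are hypothetical, and the input is assumed to already exclude previous on-screen items.

```python
def peak_timepoint(trial_probs):
    """Return (index, value) of the peak of the trial-averaged probability.

    trial_probs: list of trials, each a list with one decoded probability
    per time point (hypothetical layout, assumed for this sketch).
    """
    n_trials = len(trial_probs)
    n_times = len(trial_probs[0])
    # average across trials separately for each time point
    mean_per_time = [
        sum(trial[t] for trial in trial_probs) / n_trials
        for t in range(n_times)
    ]
    # time point with the highest trial-averaged probability
    peak_idx = max(range(n_times), key=lambda t: mean_per_time[t])
    return peak_idx, mean_per_time[peak_idx]


# hypothetical toy data: 3 trials x 5 time points
toy = [
    [0.10, 0.12, 0.20, 0.15, 0.10],
    [0.10, 0.14, 0.30, 0.12, 0.10],
    [0.10, 0.10, 0.25, 0.18, 0.10],
]
peak_idx, peak_val = peak_timepoint(toy)
```

      The returned index would then be converted to milliseconds using the sampling rate of the decoded time series.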

      (3) Fig 4 shows evidence for forward and backward sequential reactivation, suggesting that both forward and backward replay peak at a lag of 40-50msec. It would be helpful if this counterintuitive finding could be picked up in the discussion, explaining how plausible it is, physiologically, to find forward and backward replay at the same lag, and whether this could be an artifact of the TDLM method.

      This is an important point and we agree that it appears counterintuitive. However, we would highlight that this exact time range has been reported in previous studies, though never for both forward and backward replay. We now include a discussion of this finding. The section now reads:

      “[…] Even though we primarily focused on the mean sequenceness scores across time lags, there appears to be a (non-significant) peak at 40-60 milliseconds. While simultaneous forward and backward replay is theoretically possible, we acknowledge that it is somewhat surprising and, given our paradigm, could relate to other factors such as autocorrelations (Liu, Dolan, et al., 2021).”

      (4) It is reported that participants with below 30% decoding accuracy are excluded from the main analyses. It would be helpful if the manuscript included very specific information about this exclusion, e.g., was the criterion established based on the localizer cross-validated data, the temporal generalisation to the cued item (Fig. 2), or only based on peak decodability of the future sequence items? If the latter, is it applied based on near or far reactivations, or both?

      We now clarify this point to include more specific information, which reads:

      “[…] Therefore, we decided a priori that participants with a peak decoding accuracy of below 30% would be excluded from the analysis (nine participants in all) as obtained from the cross-validation of localizer trials”

      (5) Regarding the low amount of data for the reactivation analysis, the manuscript should be explicit about the number of trials available for each participant. For example, Supplemental Fig. 1 could provide this information directly, rather than the proportion of excluded trials.

      We have adapted the plot in the supplement to show the absolute number of rejected epochs per participant, in addition to the ratio.

      (6) More generally, the supplements could include more detailed information in the legends.

      We agree and have added more extensive explanation of the plots in the supplement legends.

      (7) The choice of comparing the 2 nearest with all other future items in the clustered reactivation analysis should be better motivated, e.g., was this based on the Wimmer et al. (2020) study?

      We have added our motivation for taking the two nearest items and contrasting them with the items further away. The paragraph reads:

      “[…] We chose to combine the following two items for two reasons: First, this doubled the number of included trials; secondly, using this approach the number of trials for each category (“near” and “distant”) was more balanced. […]”
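
      The near versus distant contrast can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the study's implementation: the item names, step distances and probabilities are hypothetical, and the difference is assumed to be taken as near minus distant.

```python
from statistics import mean


def differential_reactivation(item_probs, steps_ahead, near_max=2):
    """Mean probability of near items minus mean probability of distant items.

    item_probs:  decoded probability per off-screen item at one time point
    steps_ahead: graph distance of each item from the current cue
    near_max:    items up to this many steps ahead count as "near"
    """
    near = [p for item, p in item_probs.items() if steps_ahead[item] <= near_max]
    distant = [p for item, p in item_probs.items() if steps_ahead[item] > near_max]
    return mean(near) - mean(distant)


# hypothetical toy values for four off-screen items
steps_ahead = {"item_a": 1, "item_b": 2, "item_c": 3, "item_d": 4}
item_probs = {"item_a": 0.20, "item_b": 0.16, "item_c": 0.10, "item_d": 0.08}
diff = differential_reactivation(item_probs, steps_ahead)
```

      A positive value indicates that the near items are, on average, decoded with higher probability than the distant ones.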

      Reviewer 2

      (1) Focus exclusively on retrieval data (and here just on the current image trials).

      If I understand correctly, you focus all your analyses (behavioural as well as MEG analyses) on retrieval data only and here just on the current image trials. I am surprised by that since I see some shortcomings due to that. These shortcomings can likely be addressed by including the learning data (and predecessor image trials) in your analyses.

      a) Number of trials: During each block, you presented each of the twelve edges once. During retrieval, participants then did one "single testing session block". Does that mean that all your results are based on max. 12 trials? Given that participants remembered, on average, 80% this means even fewer trials, i.e., 9-10 trials?

      This is correct and a limitation of the paper. However, while we used only correct trials for the reactivation analysis, the sequential analysis was conducted using all trials disregarding the response behaviour. To retain comparability with previous studies we mainly focused on data from after a consolidation phase. Nevertheless, despite the trial limitation we consider the results are robust and worth reporting. Additionally, based on the suggestion of the referee, we now include results from learning blocks (see below).

      b) Extend the behavioural and replay/reactivation analysis to predecessor images.

      Why do you restrict your analyses to the current image trials? Especially given that you have such a low trial number for your analyses, I was wondering why you did not include the predecessor trials (except the non-deterministic trials, like the zebra and the foot according to Figure 2B) as well.

      We agree it would be great to increase power by adding the predecessor images to the current image cue analysis, excluding the ambiguous trials. We did not do so because we consider that the underlying retrieval processes of these trial types are not the same, i.e., they cannot simply be combined. Nevertheless, we have performed the suggested analysis to check whether it increases our power. We found that the reactivation effect is robust and significant at the same time point of 220-230 ms. However, the effect size actually decreased: while peak differential reactivation was previously at 0.13, it is now at 0.07. This in fact makes conceptual sense. We suspect that the two processes that are elicited by showing a single cue and by showing a second, related, cue are distinct insofar as the predecessor image acts as a primer for the current image, potentially changing the time course/speed of retrieval. Given our concerns that the two processes are not actually the same, we consider it important to avoid mixing these data.

      We have added a statement to the manuscript discussing this point. The section reads:

      “Note that we only included data from the current image cue, and not from the predecessor image cue, as we assume the retrieval processes differ and should not be concatenated.”

      c) Extend the behavioural and replay/reactivation analysis to learning trials.

      Similar to point 1b, why did you not include learning trials in your analyses?

      Including (correct and incorrect) learning trials has the advantage that you do not have to exclude 7 participants due to ceiling performance (100%).

      Further, you could actually test the hypothesis that you outline in your discussion: "This implies that there may be a switch from sequential replay to clustered reactivation corresponding to when learned material can be accessed simultaneously without interference." Accordingly, you would expect to see more replay (and less "clustered" reactivation) in the first learning blocks compared to retrieval (after the rest period).

      To track reactivation and replay over the course of learning is a great idea. We have given a lot of thought as to how to integrate these findings but have not found a satisfying solution. Analysis of the learning data turned out to be quite tricky: we decided that each participant should perform as many blocks as necessary to reach at least 80% (with a limit of six and lower bound of two, see Supplement figure 4). Indeed, some participants learned 100% of the sequence after one block (these were mostly medical students, for whom learning things by heart is a daily task). With the benefit of hindsight, we realise our design means that different blocks are not directly comparable between participants. In theory, we would expect that replay emerges in parallel with learning and then gradually changes to clustered reactivation as memory traces become consolidated/stronger. However, it is unclear when replay should emerge and when precisely a switch to clustered reactivation would happen. For this reason, we initially decided not to include the learning trials into the paper.

      Nevertheless, to provide some insight into the learning process, and to see how consolidation impacts differential reactivation and replay, we have split our data into pre and post resting state, aggregating all learning trials of each participant. While this does not allow us to track processes on a block basis, it does offer potential (albeit limited) insight into the hypothesis we outline in the discussion.

      For reactivation, we see emergence of a clear increase, further strengthening the outlined hypothesis. However, for replay the evidence is less clear, as we do not know over how many learning blocks replay is expected.

      We calculated individual trajectories of how reactivation and replay change from learning to retrieval and related these to performance. Indeed, we see that an increase in reactivation is nominally associated with higher learning performance, while an increase in replay strength is associated with lower performance (both non-significant). However, for the above-mentioned reasons we think it would be premature to add this weak evidence to the paper.

      To mitigate problems of experiment design in relation to this question we are currently implementing a follow-up study, where we aim to normalize the learning process across participants and index how replay/reactivation changes over the course of learning and after consolidation.

      We have added plots showing clustered reactivation and sequential replay measures during learning (Figure 5D and Supplement 8).

      The added section(s) now read:

      “To provide greater detail on how the 8-minute consolidation period affected reactivation we, post-hoc, looked at relevant measures across learning trials in contrast to retrieval trials. For all learning trials, for each participant, we calculated differential reactivation for the same time point we found significant in the previous analysis (220-260 milliseconds). On average, differential reactivation probability increased from pre to post resting state (Figure 5D). […]

      Nevertheless, even though our results show a nominal increase in reactivation from learning to retrieval (see Figure 5D), due to experimental design features our data do not enable us to test for an hypothesized switch for sequential replay (see also “limitations” and Supplement 8).”

      d) Introduction (last paragraph): "We examined the relationship of graph learning to reactivation and replay in a task where participants learned a ..." If all your behavioural analyses are based on retrieval performance, I think that you do not investigate graph learning (since you exclusively focus the analyses on retrieving the graph structure). However, relating the graph learning performance and replay/reactivation activity during learning trials (i.e., during graph learning) to retrieval trials might be interesting but beyond the scope of this paper.

      We agree. We have changed the wording to be more accurate. Indeed, we do not examine graph learning but instead examine retrieval from a graph, after graph learning. The mentioned sentence now reads:

      “[…] relationship of retrieval from a learned graph structure to reactivation [...]”

      e) It is sometimes difficult to follow what phase of the experiment you refer to since you use the terms retrieval and test synonymously. Not a huge problem at all but maybe you want to stick to one term throughout the whole paper.

      Thank you for pointing this out. We have now adapted the manuscript to exclusively refer to “retrieval” and not to “test”.

      (2) Is your reactivation clustered?

      In Figure 5A, you compare the reactivation strength of the two items following the cue image (i.e., current image trials) with items further away on the graph. I do not completely understand why your results are evidence for clustered reactivation in contrast to replay.

      First, it would be interesting to see the reactivation of near vs. distant items before taking the difference (time course of item probabilities).

      (copied answer from response to Reviewer 1, as the same remark was raised)

      We have added the requested analysis showing the raw decoded probabilities for near and distant items separately in Figure 5A. We have chosen to replace Figure 5B with the new figure as we think that it offers more information than the previous Figure 5B. Instead, we have moved Figure 5B to the supplement. The median peak decoded accuracy for near and distant items is equivalent. We have added the following description to the figure:

      “Decoded raw probabilities for off-screen items that were up to two steps ahead of the current stimulus cue (‘near’) vs. distant items that were more than two steps away on the graph, on trials with correct answers. The median peak decoded probability for near and distant items was at the same time point for both probability categories. Note that displayed lines reflect the average probability while, to eliminate influence of outliers, the peak displays the median.”

      Second, could it still be that the first item is reactivated before the second item? By averaging across both items, it is not apparent what the temporal courses of probabilities of both items look like (and whether they follow a sequential pattern). Additionally, the Gaussian smoothing kernel across the time dimension might diminish sequential reactivation and favour clustered reactivation. (In the manuscript, what does a Gaussian smoothing kernel of σ = 1 refer to?) Could you please explain in more detail why you assume non-sequential clustered reactivation here and substantiate this with additional analyses?

      We apologise for the unclear description. Note the Gaussian kernel is in fact only used for the reactivation analysis and not the replay analysis, so any small temporal successions would have been picked up by the sequential analysis. We now clarify this in the respective section of the sequential analysis and also explain the parameter of σ = 1 in the reactivation analysis section. The paragraph now reads:

      “[…] As input for the sequential analysis, we used the raw probabilities of the ten classifiers corresponding to the stimuli. [...]

      […] Therefore, to address this we applied a Gaussian smoothing kernel (using scipy.ndimage.gaussian_filter with the default parameter of σ=1 which corresponds approximately to taking the surrounding timesteps in both direction with the following weighting: current time step: 40%, ±1 step: 25%, ±2 step: 5%, ±3 step: 0.5%) [...]”
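
      The approximate weights quoted above can be checked with a short calculation. The sketch below is a standalone illustration in plain Python rather than a call into SciPy: it evaluates a discrete Gaussian kernel with σ = 1, truncated at four standard deviations and normalised to sum to one, which is our understanding of how the SciPy filter builds its kernel by default. The function name `gaussian_weights` is hypothetical.

```python
import math


def gaussian_weights(sigma=1.0, truncate=4.0):
    """Normalised discrete Gaussian kernel weights, keyed by time-step offset."""
    # kernel radius in samples (assumed truncation at `truncate` std devs)
    radius = int(truncate * sigma + 0.5)
    raw = {
        k: math.exp(-0.5 * (k / sigma) ** 2)
        for k in range(-radius, radius + 1)
    }
    total = sum(raw.values())
    # divide by the sum so that all weights add up to 1
    return {k: v / total for k, v in raw.items()}


weights = gaussian_weights(sigma=1.0)
for offset in range(4):
    print(f"offset ±{offset}: {weights[offset]:.2%}")
```

      With σ = 1 this gives roughly 39.9% for the current time step, 24.2% for ±1 step, 5.4% for ±2 steps and 0.44% for ±3 steps, in line with the rounded figures in the quoted paragraph.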

      (3) Replay and/or clustered reactivation?

      The relationship between the sequential forward replay, differential reactivation, and graph reactivation analysis is not really apparent. Wimmer et al. demonstrated that high performers show clustered reactivation rather than sequential reactivation. However, you did not differentiate in your differential reactivation analysis between high vs. low performers. (You point out in the discussion that this is due to a low number of low performers.)

      We agree that a split into high vs low performers would have been preferable for our analysis. However, there is one major obstacle that made us opt for a correlational analysis instead: we employed criteria learning, rendering a categorical grouping conceptually biased. Even though not all participants reached the criterion of 80%, our sample did not naturally split between high and low performers but was biased towards higher performance, leaving the groups uneven. The median performance was 83% (mean ~81%), with six of our subjects (~1/4 of included participants) having this exact performance. This makes a median or mean split difficult, as either binning assignment choice would strongly affect the results. We have added a limitations section in which we extensively discuss this shortcoming and our reasoning for not performing a median split as in Wimmer et al. (2020). The section now reads:

      “There are some limitations to our study, most of which originate from a suboptimal study design. [...], as we performed criteria learning, a sub-group analysis as in Wimmer et al., (2020) was not feasible, as median performance in our sample would have been 83% (mean 81%), with six participants exactly at that threshold. [...]”
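
      To make the binning problem concrete, the following sketch shows how a median split behaves when several scores sit exactly on the median. The scores are hypothetical toy values chosen only to have ties at 83%; they are not the participants' data, and `median_split` is an illustrative helper.

```python
from statistics import median


def median_split(scores, ties_go_high):
    """Split scores at the median; tied scores go to one group or the other."""
    cut = median(scores)
    if ties_go_high:
        low = [s for s in scores if s < cut]
        high = [s for s in scores if s >= cut]
    else:
        low = [s for s in scores if s <= cut]
        high = [s for s in scores if s > cut]
    return low, high


# hypothetical performance scores in percent, with four ties at the median
scores = [58, 67, 75, 83, 83, 83, 83, 92, 92, 100]

low_a, high_a = median_split(scores, ties_go_high=True)
low_b, high_b = median_split(scores, ties_go_high=False)
```

      In this toy case the group sizes flip from 3 low and 7 high to 7 low and 3 high depending solely on where the tied scores are placed, which is the kind of sensitivity that made a categorical split unattractive.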

      It might be worth trying to bring the analysis together, for example by comparing sequential forward replay and differential reactivation at the beginning of graph learning (when performance is low) vs. retrieval (when performance is high).

      Thank you for the suggestion to include the learning segments, which we think improves the paper quite substantially. However, analysis of the learning data turned out to be quite tricky. We had decided that each participant should perform as many blocks as necessary to reach at least 80% accuracy (with a limit of six and lower bound of two, see Supplement figure 4). Some participants learned 100% of the sequence after one block (these were mostly medical students, for whom learning things by heart is a daily task). In hindsight, this is an unfortunate design feature in relation to learning, as it means different blocks are not directly comparable between participants.

      In theory, we would expect that replay emerges in parallel with learning and then gradually changes to clustered reactivation, as memory traces get consolidated/stronger. However, it is unclear when replay would emerge and when the switch to reactivation would happen. For this reason, we initially decided not to include the learning trials into the paper at all.

      Nevertheless, to give some insight into the learning process and to see how consolidation affects differential reactivation and replay, we have split our data into pre and post resting state, aggregating all learning trials of each participant. While this does not allow us to track measures of interest on a block basis, it gives some (albeit limited) insight into the hypothesis outlined in our discussion.

      For reactivation, we see a clear increase, further strengthening the outlined hypothesis. However, for replay the evidence is less obvious, potentially due to the fact that we do not know across how many learning blocks replay is to be expected.

      The added section(s) now read:

      “To examine how the 8-minute consolidation period affected reactivation we, post-hoc, looked at relevant measures during learning trials in contrast to retrieval trials. For all learning trials, for each participant, we calculated differential reactivation for the time point we found significant during the previous analysis (220-260 milliseconds). On average, differential reactivation probability increased from pre to post resting state (Figure 5D).

      […]

      Nevertheless, even though our results show a nominal increase in reactivation from learning to retrieval (see Figure 5D), our data does not enable us to show an hypothesized switch for sequential replay (see also “limitations” and Supplement 8).”

      Additionally, the main research question is not that clear to me. Based on the introduction, I thought the focus was on replay vs. clustered reactivation and high vs. low performance (which I think is really interesting). However, the title is more about reactivation strength and graph distance within cognitive maps. Are these two research questions related? And if so, how?

      We agree we need to be clearer on this point. We have added two sentences to the introduction, which should address this point. The section now reads:

      “[…] In particular, the question remains how the brain keeps track of graph distances for successful recall and whether the previously found difference between high and low performers also holds true within a more complex graph learning context.”

      (4) Learning the graph structure.

      I was wondering whether you have any behavioural measures to show that participants actually learn the graph structure (instead of just pairs or triplets of objects). For example, do you see that participants chose the distractor image that was closer to the target more frequently than the distractor image that was further away (close vs. distal target comparison)? It should be random at the beginning of learning but might become more biased towards the close target.

      Thanks, this is an excellent suggestion. Our analysis indeed shows that people take the near lure more often than the far lure in later blocks, while it is random in the first block.

      Nevertheless, we have decided to put these data into the supplement and reference it in the text. This is because analysis of the learning blocks is challenging and biased in general. Each participant had a different number of learning blocks based on their learning rate, and this makes it difficult to compare learning across participants. We have tried our best to accommodate and explain these difficulties in the figure legend. Nevertheless, we thank the referee for guidance here and this analysis indeed provides further evidence that participants learned the actual graph structure.

      The added section reads

      “Additionally, we have included an analysis showing how wrong answers participants provided were random in the first block and biased towards closer graph nodes in later blocks. This is consistent with participants actually learning the underlying graph structure as opposed to independent triplets (see figure and legend of Supplement 6 for details).”

      (5) Minor comments

      a) "Replay analysis relies on a successive detection of stimuli where the chance of detection exponentially decreases with each step (e.g., detecting two successive stimuli with a chance of 30% leaves a 9% chance of detecting the replay event). " Could you explain in more detail why 30% is a good threshold then?

      Thank you. We have further clarified the section. As we are working mainly with probabilities, it is useful to keep in mind that accuracy is a class metric that only provides a rough estimate of classifier ability. Alternatively, something like a Top-3-Accuracy would be preferable, but also slightly silly in the context of 10 classes.

      Nevertheless, subtle changes in probability estimates are present and can be picked up by the methods we employ. Therefore, the 30% is a rough lower bound, decided based on pilot data showing that clean MEG data from attentive participants can usually reach this threshold. The section now reads:

      “(e.g., detecting two successive stimuli with a chance of 30% leaves a 9% chance of detecting a replay event). However, one needs to bear in mind that accuracy is a “winner-takes-all” metric indicating whether the top choice also has the highest probability, disregarding subtle, relative changes in assigned probability. As the methods used in this analysis are performed on probability estimates and not class labels, one can expect that the 30% are a rough lower bound and that the actual sensitivity within the analysis will be higher. Additionally, based on pilot data, we found that attentive participants were able to reach 30% decodability, allowing us to use decodability as a data quality check.”
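
      The arithmetic behind the 9% figure can be written out as a small sketch. It assumes, for illustration only, that each stimulus in a sequence is detected independently with the same chance; the function name is hypothetical.

```python
def chance_of_detecting_sequence(per_item_chance, n_items):
    """Chance that all n_items successive stimuli are detected.

    Assumes independent detections with identical per-item chance,
    so the joint chance shrinks exponentially with sequence length.
    """
    return per_item_chance ** n_items


two_step = chance_of_detecting_sequence(0.30, 2)    # two successive stimuli
three_step = chance_of_detecting_sequence(0.30, 3)  # three successive stimuli
```

      Under this assumption a two-item sequence is detected with a chance of 0.3 × 0.3 = 9%, and a three-item sequence with only 2.7%.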

      b) Could you make explicit how your decoders were designed? Especially given that you added null data, did you train individual decoders for one class vs. all other classes (n = 9 + null data) or one class vs. null data?

      We added detail to the decoder training. The section now reads

      “Decoders were trained using a one-vs-all approach, which means that for each class, a separate classifier was trained using positive examples (target class) and negative examples (all other classes) plus null examples (data from before stimulus presentation, see below). In detail, null data was.”
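
      The one-vs-all scheme with null examples can be sketched as a labelling step. This is a minimal illustration of how the binary targets for each decoder might be assembled, not the training code from the study: the class names and the helper `one_vs_all_targets` are hypothetical, and the classifier itself is left out.

```python
def one_vs_all_targets(stimulus_labels, n_null, classes):
    """Build one binary target vector per class.

    For each class, examples of that class are positives (1); examples of
    all other classes and the appended null examples are negatives (0).
    """
    targets = {}
    for cls in classes:
        stimulus_part = [1 if label == cls else 0 for label in stimulus_labels]
        null_part = [0] * n_null  # pre-stimulus (null) data is always negative
        targets[cls] = stimulus_part + null_part
    return targets


# hypothetical example: four stimulus trials followed by two null examples
labels = ["class_a", "class_b", "class_c", "class_a"]
targets = one_vs_all_targets(
    labels, n_null=2, classes=["class_a", "class_b", "class_c"]
)
```

      Each target vector would then be paired with the same feature matrix (stimulus trials followed by null examples) to fit one binary classifier per class.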

      c) Why did you choose a ratio of 1:2 for your null data?

      Our choice of a higher ratio was based upon previous publications reporting better sensitivity of TDLM with higher ratios, as spatial sensor correlations decrease. Nevertheless, this choice was not well investigated beforehand. We have added more information about this to the manuscript.

      d) You could think about putting the questionnaire results into the supplement if they are sanity checks.

      We have added the questionnaire results. However, due to the size of the tables, we have decided to add them as Excel files to the supplementary files of the code repository. We have mentioned the existence of these files in the publication.

      e) Figure 2. There is a typo in D: It says "Precessor Image" instead of "Predecessor Image".

      Fixed typo in figure.

      f) You write "Trials for the localizer task were created from -0.1 to 0.5 seconds relative to visual stimulus onset to train the decoders and for the retrieval task, from 0 to 1.5 seconds after onset of the second visual cue image." But the Figure legend 3D starts at -0.1 seconds for the retrieval test.

      We have now clarified this. For the classifier cross-validation and transfer sanity check and clustered analysis we used trials from -0.1 to 0.5 seconds, whereas for the sequenceness analysis of the retrieval, we used trials from 0 to 1.5 seconds.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This important study advances our understanding of how past and future information is jointly considered in visual working memory by studying gaze biases in a memory task that dissociates the locations during encoding and memory tests. The evidence supporting the conclusions is convincing, with state-of-the-art gaze analyses that build on a recent series of experiments introduced by the authors. This work, with further improvements incorporating the existing literature, will be of broad interest to vision scientists interested in the interplay of vision, eye movements, and memory.

      We thank the Editors and the Reviewers for their enthusiasm and appreciation of our task, our findings, and our article. We also wish to thank the Reviewers for their constructive comments that we have embraced to improve our article. Please find below our point-by-point responses to this valuable feedback, where we also state relevant revisions that we have made to our article.

      In addition, please note that we have now also made our data and code publicly available.

      Reviewer 1, Comments:

      In this study, the authors offer a fresh perspective on how visual working memory operates. They delve into the link between anticipating future events and retaining previous visual information in memory. To achieve this, the authors build upon their recent series of experiments that investigated the interplay between gaze biases and visual working memory. In this study, they introduce an innovative twist to their fundamental task. Specifically, they disentangle the location where information is initially stored from the location where it will be tested in the future. Participants are tasked with learning a novel rule that dictates how the initial storage location relates to the eventual test location. The authors leverage participants' gaze patterns as an indicator of memory selection. Intriguingly, they observe that microsaccades are directed toward both the past encoding location and the anticipated future test location. This observation is noteworthy for several reasons. Firstly, participants' gaze is biased towards the past encoding location, even though that location lacks relevance to the memory test. Secondly, there's a simultaneous occurrence of an increased gaze bias towards both the past and future locations. To explore this temporal aspect further, the authors conduct a compelling analysis that reveals the joint consideration of past and future locations during memory maintenance. Notably, microsaccades biased towards the future test location also exhibit a bias towards the past encoding location. In summary, the authors present an innovative perspective on the adaptable nature of visual working memory. They illustrate how information relevant to the future is integrated with past information to guide behavior.

      Thank you for your enthusiasm for our article and findings as well as for your constructive suggestions for additional analyses that we respond to in detail below.

      This short manuscript presents one experiment with straightforward analyses, clear visualizations, and a convincing interpretation. For their analysis, the authors focus on a single time window in the experimental trial (i.e., 0-1000 ms after retro cue onset). While this time window is most straightforward for the purpose of their study, other time windows are similarly interesting for characterizing the joint consideration of past and future information in memory. First, assessing the gaze biases in the delay period following the cue offset would allow the authors to determine whether the gaze bias towards the future location is sustained throughout the entire interval before the memory test onset. Presumably, the gaze bias towards the past location may not resurface during this delay period, but it is unclear how the bias towards the future location develops in that time window. Also, the disappearance of the retro cue constitutes a visual transient that may leave traces on the gaze biases which speaks again for assessing gaze biases also in the delay period following the cue offset.

      Thank you for raising this important point. We initially focused on the time window during the cue given that our central focus was on gaze-biases associated with mnemonic item selection. By zooming in on this window, we could best visualize our main effects of interest: the joint selection (in time) of past and future memory attributes.

      At the same time, we fully agree that examining the gaze biases over a more extended time window yields a more comprehensive view of our data. To this end, we have now also extended our analysis to include a wider time range that includes the period between cue offset (1000 ms after cue onset) and test onset (1500 ms after cue onset). We present these data below. Because we believe our future readers are likely to be interested in this as well, we have now added this complementary visualization as Supplementary Figure 4 (while preserving the focus in our main figure on the critical mnemonic selection period of interest).

      Author response image 1.

      Supplementary Figure 4. Gaze biases in an extended time window as a complement to Figure 1 and Supplementary Figure 2. This extended analysis reveals that while the gaze bias towards the past location disappears around 600 ms after cue onset, the gaze bias towards the future location persists (panel a) and that while the early (joint) future bias occurs predominantly in the microsaccade range below 1 degree of visual angle, the later bias to the future location incorporates larger eye movements that likely involve preparing for optimally perceiving the anticipated test stimulus (panel b).

      This extended analysis reveals that while the gaze bias towards the past location disappears around 600 ms after cue onset (consistent with our prior reports of this bias), the gaze bias towards the future location persists. Moreover, as revealed by the data in panel b above, while the early (joint) future bias occurs predominantly in the microsaccade range below 1 degree of visual angle, the later bias to the future location incorporates larger eye movements that likely involve preparing for optimally perceiving the anticipated test stimulus.
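      The distinction drawn above between microsaccade-range and larger gaze shifts can be illustrated with a minimal sketch. It assumes that saccade sizes are already available in degrees of visual angle and uses the 1 degree boundary mentioned in the text; the function name and the example values are hypothetical and do not come from the study's analysis code.

```python
from typing import List, Sequence, Tuple


def split_by_magnitude(
    sizes_deg: Sequence[float], threshold_deg: float = 1.0
) -> Tuple[List[float], List[float]]:
    """Split saccade sizes into microsaccade-range and larger shifts.

    Assumption (hypothetical helper): sizes are in degrees of visual angle,
    and a shift counts as microsaccade-range when it is strictly below
    threshold_deg.
    """
    micro = [s for s in sizes_deg if s < threshold_deg]
    larger = [s for s in sizes_deg if s >= threshold_deg]
    return micro, larger


# Illustrative values only, not data from the study.
micro, larger = split_by_magnitude([0.2, 0.9, 1.0, 2.5])
```

      Computing the toward-versus-away bias separately for each of the two groups would then give one time course per magnitude range, in the spirit of panel b.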

      We now also call out these additional findings and figure in our article:

      Page 2 (Results): “Gaze biases in both axes were driven predominantly by microsaccades (Supplementary Fig. 2) and occurred similarly in horizontal-to-vertical and vertical-to-horizontal trials (Supplementary Fig. 3). Moreover, while the past bias was relatively transient, the future bias continued to increase in anticipation of the test stimulus and increasingly incorporated eye-movements beyond the microsaccade range (see Supplementary Fig. 4 for a more extended time range)”.

      Moreover, assessing the gaze bias before retro-cue onset allows the authors to further characterize the observed gaze biases in their study. More specifically, the authors could determine whether the future location is considered already during memory encoding and the subsequent delay period (i.e., before the onset of the retro cue). In a trial, participants encode two oriented gratings presented at opposite locations. The future rule indicates the test locations relative to the encoding locations. In their example (Figure 1a), the test locations are shifted clockwise relative to the encoding location. Thus, there are two pairs of relevant locations (each pair consists of one stimulus location and one potential test location) facing each other at opposite locations and therefore forming an axis (in the illustration the axis would go from bottom left to top right). As the future rule is already known to the participants before trial onset it is possible that participants use that information already during encoding. This could be tested by assessing whether more microsaccades are directed along the relevant axis as compared to the orthogonal axis. The authors should assess whether such a gaze bias exists already before retro cue onset and discuss the theoretical consequences for their main conclusions (e.g., is the future location only jointly used if the test location is implicitly revealed by the retro cue).

      Thank you – this is another interesting point. We fully agree that additional analysis looking at the period prior to retrocue onset may also prove informative. In accordance with the suggested analysis, we have therefore now also analysed the distribution of saccade directions (including in the period from encoding to retrocue) as a function of the future rule (presented below, and now also included as Supplementary Fig. 5). Complementary recent work from our lab has shown how microsaccade directions can align to the axis of memory contents during retention (see de Vries & van Ede, eNeuro, 2024). Based on this finding, one may predict that if participants retain the items in a remapped fashion, their microsaccades may align with the axis of the future rule, and this could potentially already happen prior to cue onset.

      These complementary analyses show that saccade directions are predominantly influenced by the encoding locations rather than the test locations, as seen most clearly by the saccade distribution plots in the middle row of the figure below. To obtain time-courses, we categorized saccades as occurring along the axis of the future rule or along the orthogonal axis (bottom row of the figure below). Like the distribution plots, these time course plots also did not reveal any sign of a bias along the axis of the future rule itself.
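      The axis-based categorisation described above can be sketched as follows. This is only an illustration under stated assumptions: saccade directions are given in degrees, the axis of the future rule is summarised by a single angle, and a saccade counts as lying along that axis when its direction falls within 45 degrees of either end of it (so that each axis covers two opposite quadrants). The function names and example angles are hypothetical.

```python
import math
from typing import Sequence, Tuple


def along_axis(saccade_deg: float, axis_deg: float) -> bool:
    """Return True if a saccade direction lies within 45 degrees of the axis.

    An axis has no sign, so directions 180 degrees apart belong to the same
    axis. The difference is therefore folded into [0, 180) before measuring
    the distance to the axis.
    """
    diff = (saccade_deg - axis_deg) % 180.0
    return min(diff, 180.0 - diff) < 45.0


def axis_proportions(
    saccade_dirs_deg: Sequence[float], axis_deg: float
) -> Tuple[float, float]:
    """Proportion of saccades along the given axis and along the orthogonal axis."""
    n = len(saccade_dirs_deg)
    if n == 0:
        return math.nan, math.nan
    n_along = sum(along_axis(d, axis_deg) for d in saccade_dirs_deg)
    return n_along / n, (n - n_along) / n


# Illustrative directions only: a diagonal axis at 45 degrees (hypothetical).
p_along, p_orthogonal = axis_proportions([50.0, 230.0, 135.0, 315.0], axis_deg=45.0)
```

      Applying such a categorisation within successive time bins would yield one time course per axis; a bias along the axis of the future rule would appear as a proportion reliably above 0.5 for that axis.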

      Importantly, note how this does not argue against our main findings of joint selection of past and future memory attributes, as for that central analysis we focused on saccade biases that were specific to the selected memory item, whereas the analyses we present below focus on biases in the axes in which both memory items are defined; not only the cued/selected memory item.

      Author response image 2.

      Supplementary Figure 5. Distribution of saccade directions relative to the future rule from encoding onset. (Top panel) The spatial layouts in the four future rules. (Middle panel) Polar distributions of saccades during 0 to 1500 ms after encoding onset (i.e., the period between encoding onset and cue onset). The purple quadrants represent the axis of the future rule and the grey quadrants the orthogonal axis. (Bottom panel) Time courses of saccades along the above two axes. We did not observe any sign of a bias along the axis of the future rule itself.

      We agree that these additional results are important to bring forward when we interpret our findings. Accordingly, we now mention these findings at the relevant section in our Discussion:

      Page 5 (Discussion): “First, memory contents could have directly been remapped (cf. 4,24–26) to their future-relevant location. However, in this case, one may have expected to exclusively find a future-directed gaze bias, unlike what we observed. Moreover, using a complementary analysis of saccade directions along the axis of the future rule (cf. 24), we found no direct evidence for remapping in the period between encoding and cue (Supplementary Fig. 5)”.

      Reviewer 2, Comments:

      The manuscript by Liu et al. reports a task that is designed to examine the extent to which "past" and "future" information is encoded in working memory that combines a retro cue with rules that indicate the location of an upcoming test probe. An analysis of microsaccades on a fine temporal scale shows the extent to which shifts of attention track the location of the encoded item (past) and the location of the future item (test probe). The location of the encoded grating of the test probe was always on orthogonal axes (horizontal, vertical) so that biases in microsaccades could be used to track shifts of attention to one or the other axis (or mixtures of the two). The overall goal here was then to (1) create a methodology that could tease apart memory for the past and future, respectively, (2) to look at the time course of attention to past/future, and (3) to test the extent to which microsaccades might jointly encode past and future memoranda. Finally, some remarks are made about the plausibility of various accounts of working memory encoding/maintenance based on the examination of these time courses.

      Strengths:

      This research has several notable strengths. It has a clear statement of its aims, is lucidly presented, and uses a clever experimental design that neatly orthogonalizes "past" and "future" as operationalized by the authors. Figure 1b-d shows fairly clearly that saccade directions have an early peak (around 300ms) for the past and a "ramping" up of saccades moving in the forward direction. This seems to be a nice demonstration that the method can measure shifts of attention at a fine temporal resolution and differentiate past from future-oriented saccades due to the orthogonal cue approach. The second analysis, shown in Figure 2, reveals a dependency in saccade direction such that saccades toward the probe future were more likely also to be toward the encoded location than away from the encoded direction. This suggests saccades are jointly biased by both locations "in memory".

      Thank you for your overall appreciation of our work and for highlighting the above strengths. We also thank you for your constructive comments and call for clarifications that we respond to below.

      Weaknesses:

      (1) The "central contribution" (as the authors characterize it) is that "the brain simultaneously retains the copy of both past and future-relevant locations in working memory, and (re)activates each during mnemonic selection", and that: "... while it is not surprising that the future location is considered, it is far less trivial that both past and future attributes would be retained and (re)activated together. This is our central contribution." However, to succeed at the task, participants must retain the content (grating orientation, past) and probe location (future) in working memory during the delay period. It is true that the location of the grating is functionally irrelevant once the cue is shown, but if we assume that features of a visual object are bound in memory, it is not surprising that location information of the encoded object would bias processing as indicated by microsaccades. Here the authors claim that joint representation of past and future is "far less trivial"; this needs to be evaluated from the standpoint of prior empirical data on memory decay in such circumstances, or some reference to the time-course of the "unbinding" of features in an encoded object.

      Thank you. We agree that our participants have to use the future rule – as otherwise they do not know to which test stimulus they should respond. This was a deliberate decision when designing the task. Critically, however, this does not require (nor imply) that participants have to incorporate and apply the rule to both memory items already prior to the selection cue. It is at least as conceivable that participants would initially retain the two items at their encoded (past) locations, then wait for the cue to select the target memory item, and only then consider the future location associated with the target memory item. After all, in every trial, there is only 1 relevant future location: the one associated with the cued memory item. The time-resolved nature of our gaze markers argues against such a scenario, by virtue of our observation of the joint (simultaneous) consideration of past and future memory attributes (as opposed to selection of past-before-future). These temporal dynamics are central to the insights provided by our study.

      In our view, it is thus not obvious that the rule would be applied at encoding. In this sense, we do not assume that the future location is part of both memory objects from encoding, but rather ask whether this is the case – and, if so, whether the future location takes over the role of the past location, or whether past and future locations are retained jointly.

      Our statements regarding what is “trivial” and what is “less trivial” regard exactly this point: it is trivial that the future is considered (after all, our task demanded it). However, it is less trivial that (1) the future location was already available at the time of initial item selection (as reflected in the simultaneous engagement of past and future locations), and (2) that in presence of the future location, the past location was still also present in the observed gaze biases.

      Having said that, we agree that an interesting possibility is that participants remap both memory items to their future-relevant locations ahead of the cue, but that the past location is not yet fully “unbound” by the time of the cue. This may trigger a gaze bias not only to the new future location but also to the “sticky” (unbound) past location. We now acknowledge this possibility in our discussion (also in response to comment 3 below) where we also suggest how future work may be able to tap into this:

      Page 6 (Discussion): “In our study, the past location of the memory items was technically irrelevant for the task and could thus, in principle, be dropped after encoding. One possibility is that participants remapped the two memory items to their future locations soon after encoding, and had started – but not finished – dropping the past location by the time the cue arrived. In such a scenario, the past signal is merely a residual trace of the memory items that serves no purpose but still pulls gaze. Alternatively, however, the past locations may be utilised by the brain to help individuate/separate the two memory items. Moreover, by storing items with regard to multiple spatial frames (cf. 37) – here with regard to both past and future visual locations – it is conceivable that memories may become more robust to decay and/or interference. Also, while in our task past locations were never probed, in everyday life it may be useful to remember where you last saw something before it disappeared behind an occluder. In future work, it will prove interesting to systematically vary the delay between encoding and cue to assess whether the reliance on the past location gradually dissipates with time (consistent with dropping an irrelevant feature), or whether the past trace remains preserved despite longer delays (consistent with preserving utility for working memory).”

      (2) The authors refer to "future" and "past" information in working memory and this makes sense at a surface level. However, once the retrocue is revealed, the "rule" is retrieved from long-term memory, and the feature (e.g. right/left, top/bottom) is maintained in memory like any other item representation. Consider the classic test of digit span. The digits are presented and then recalled. Are the digits of the past or future? The authors might say that one cannot know, because past and future are perfectly confounded. An alternative view is that some information in working memory is relevant and some is irrelevant. In the digit span task, all the digits are relevant. Relevant information is relevant precisely because it is thought to be necessary in the future. Irrelevant information is irrelevant precisely because it is not thought to be needed in the immediate future. In the current study, the orientation of the grating is relevant, but its location is irrelevant; and the location of the test probe is also relevant.

      Thank you for this stimulating reflection. We agree that in our set-up, past location is technically “task-irrelevant” while future location is certainly “task-relevant”. At the same time, the engagement of the past location suggests to us that the brain uses past location for the selection – presumably because the brain uses spatial location to help individuate/separate the items, even if encoded locations are never asked about. Therefore, whether something is relevant or irrelevant ultimately depends on how one defines relevance (past location may be relevant/useful for the brain even if technically irrelevant from the perspective of the task). In comparison, the use of “past” and “future” may be less ambiguous.

      It is also worth noting how we interpret our findings in relation to demands on visual working memory, inspired by dynamic situations whereby visual stimuli may be last seen at one location but expected to re-appear at another, such as a bird disappearing behind a building (the example in our introduction). Thus, past for us does not refer to the memory item per se (like in the digit span analogue) but, rather, quite specifically to the past location of a dynamic visual stimulus in memory (which, in our experiment, was operationalised by the future rule, for convenience).

      (3) It is not clear how the authors interpret the "joint representation" of past and future. Put aside "future" and "past" for a moment. If there are two elements in memory, both of which are associated with spatial bindings, the attentional focus might be a spatial average of the associated spatial indices. One might also view this as an interference effect, such that the location of the encoded location attracts spatial attention since it has not been fully deleted/removed from working memory. Again, for the impact of the encoded location to be exactly zero after the retrieval cue, requires zero interference or instantaneous decay of the bound location information. It would be helpful for the authors to expand their discussion to further explain how the results fit within a broader theoretical framework and how it fits with empirical data on how quickly an irrelevant feature of an object can be deleted from working memory.

      Thank you also for this point (that is related to the two points above). As we stated in our reply to comment 1 above, we agree that one possibility is that the past location is merely “sticky” and pulls the task-relevant future bias toward the past location. If so, our time courses suggest that such “pulling” occurs only until approximately 600 ms after cue onset, as the past bias is only transient. An alternative interpretation is that the past location may not be merely a residual irrelevant trace, but actually be useful and used by the brain.

      For example, the encoded (past) item locations provide a coordinate system in which to individuate/separate the two memory items. While the future locations also provide such a coordinate system, the brain may benefit from holding onto both coordinate systems at the same time, rendering our observation of joint selection in both frames. Indeed, in a recent VR experiment in which we had participants (rather than the items) rotate, we also found evidence for the joint use of two spatial frames, even if neither was technically required for the upcoming task (see Draschkow, Nobre, van Ede, Nature Human Behaviour, 2022). Though highly speculative at this stage, such reliance on multiple spatial frames may make our memories more robust to decay and/or interference. Moreover, while past location was never explicitly probed in our task, in daily life the past location may sometimes (unexpectedly) become relevant, hence it may be useful to hold onto it, just in case. Thus, considering the past location merely as an “irrelevant feature” (that takes time to delete) may not do sufficient justice to the potential roles of retaining past locations of dynamic visual objects held in working memory.

      As also stated in response to comment 1 above, we now added these relevant considerations to our Discussion:

      Page 5 (Discussion): “In our study, the past location of the memory items was technically irrelevant for the task and could thus, in principle, be dropped after encoding. One possibility is that participants remapped the two memory items to their future locations soon after encoding, and had started – but not finished – dropping the past location by the time the cue arrived. In such a scenario, the past signal is merely a residual trace of the memory items that serves no purpose but still pulls gaze. Alternatively, however, the past locations may be utilised by the brain to help individuate/separate the two memory items. Moreover, by storing items with regard to multiple spatial frames (cf. 37) – here with regard to both past and future visual locations – it is conceivable that memories may become more robust to decay and/or interference. Also, while in our task past locations were never probed, in everyday life it may be useful to remember where you last saw something before it disappeared behind an occluder. In future work, it will prove interesting to systematically vary the delay between encoding and cue to assess whether the reliance on the past location gradually dissipates with time (consistent with dropping an irrelevant feature), or whether the past trace remains preserved despite longer delays (consistent with preserving utility for working memory).”

      Reviewer 3, Comments:

      This study utilizes saccade metrics to explore what the authors term the "past and future" of working memory. The study features an original design: in each trial, two pairs of stimuli are presented, first a vertical pair and then a horizontal one. Between these two pairs comes the cue that points the participant to one target of the first pair and another of the second pair. The task is to compare the two cued targets. The design is novel and original but it can be split into two known tasks - the first is a classic working memory task (a post-cue informs participants which of two memorized items is the target), which the authors have used before; and the second is a classic spatial attention task (a pre-cue signal that attention should be oriented left or right), which was used by numerous other studies in the past. The combination of these two tasks in one design is novel and important, as it enables the examination of the dynamics and overlapping processes of these tasks, and this has a lot of merit. However, each task separately is not new. There are quite a few studies on working memory and microsaccades and many on spatial attention and microsaccades. I am concerned that the interpretation of "past vs. future" could mislead readers to think that this is a new field of research, when in fact it is the (nice) extension of an existing one. Since there are so many studies that examined pre-cues and post-cues relative to microsaccades, I expected the interpretation here to rely more heavily on the existing knowledge base in this field. I believe this would have provided a better context of these findings, which are not only on "past" vs. "future" but also on "working memory" vs. "spatial attention".

      Thank you for considering our findings novel and important, while at the same time reminding us of the parallels to prior tasks studying spatial attention in perception and working memory. We fully agree that our task likely engages both attention to the (past) memory item as well as spatial attention to the upcoming (future) test stimulus. At the same time, there is a critical difference in spatial attention for the future in our task compared with ample prior tasks engaging spatial cueing of attention for perception. In our task, the cue never directly cues the future location. Rather, it exclusively cues the relevant memory item. It is the memory item that is associated with the relevant future location, according to the future rule. This integration of the rule-based future location into the memory representation is distinct from classical spatial-attention tasks in which attention is cued directly to a specific location via, for example, a spatial cue such as an arrow.

      Thus, if we wish to think about our task as engaging cueing of spatial attention for perception, we have to at least also invoke the process of cueing the relevant location via the appropriate memory item. We feel it is more parsimonious to think of this as attending to both the past and future location of a dynamic visual object in working memory.

      If we return to our opening example, when we see a bird disappear behind a building, we can keep in working memory where we last saw it, while anticipating where it will re-appear to guide our external spatial attention. Here too, spatial attention is fully dependent on working-memory content (the bird itself) – mirroring the dynamic setting in our study. Thus, we believe our findings contribute a fresh perspective, while of course also extending established fields. We now contextualize our finding within the literature and clarify our unique contribution in our revised manuscript:

      Page 5 (Discussion): “Building on the above, at face value, our task may appear like a study that simply combines two established tasks: tasks using retro-cues to study attention in working memory (e.g.,2,31-33) and tasks using pre-cues to study orienting of spatial attention to an upcoming external stimulus (e.g., 31,32,34–36). A critical difference with common pre-cue studies, however, is that the cue in our task never directly informed the relevant future location. Rather, as also stressed above, the future location was a feature of the cued memory item (according to the future rule), and not of the cue itself. Note how this type of scenario may not be uncommon in everyday life, such as in our opening example of a bird flying behind a building. Here too, the future relevant location is determined by the bird – i.e. the memory content – itself.”

      Reviewer 2, Recommendations:

      It would be helpful to set up predictions based on existing working memory models. Otherwise, the claim that the joint coding of past/future is "not trivial" is simply asserted, rather than contradicting an existing model or prior empirical results. If the non-trivial aspect is simply the ability to demonstrate the joint coding empirically through a good experimental design, make it clear that this is the contribution. For example, it may be that prevailing models predict exactly this finding, but nobody has been able to demonstrate it cleanly, as the authors do here. So the non-triviality is not that the result contradicts working memory models, but rather relates to the methodological difficulty of revealing such an effect.

      Thank you for your recommendation. First, please see our point-by-point responses to the individual comments above, where we also state relevant changes that we have made to our article, and where we clarify what we meant by “non trivial”. As we currently also state in our introduction, our work took as a starting point the framework that working memory is inherently about the past while being for the future (cf. van Ede & Nobre, Annual Review of Psychology, 2023). By virtue of our unique task design, we were able to empirically demonstrate that visual contents in working memory are selected via both their past and their future-relevant locations – with past and future memory attributes being engaged together in time. With “not trivial” we merely intend to make clear that there are viable alternatives to the findings we observed. For example, past could have been replaced by the future, or it could have been that item selection (through its past location) was required before its future-relevant location could be considered (i.e. past-before-future, rather than joint selection as we reported). We outline these alternatives in the second paragraph of our Discussion:

      Page 5 (Discussion): “Our finding of joint utilisation of past and future memory attributes emerged from at least two alternative scenarios of how the brain may deal with dynamic everyday working memory demands in which memory content is encoded at one location but needed at another.

      First, [….]”

      Our work was not motivated by a particular theoretical debate and did not aim to challenge ongoing debates in the working-memory literature, such as: slot vs. resource, active vs. silent coding, decay vs. interference, and so on. To our knowledge, none of these debates makes specific claims about the retention and selection of past and future visual memory attributes – despite this being an important question for understanding working memory in dynamic everyday settings, as we hoped to make clear by our opening example.

      Reviewer 3, Recommendations:

      I recommend that the present findings be more clearly interpreted in the context of previous findings on working memory and attention. The task design includes two components - the first (post-cue) is a classic working memory task and the second (the pre-cue) is a classic spatial attention design. Both components were thoroughly studied in the past and this previous knowledge should be better integrated into the present conclusions. I specifically feel uncomfortable with the interpretation of past vs. future. I find this framework to be misleading because it reads like this paper is on a topic that is completely new and never studied before, when in fact this is a study on the interaction between working memory and spatial attention. I recommend the authors minimize this past-future framing or be more explicit in explaining how this new framework relates to the more common terminology in the field and make sure that the findings are not presented in a vacuum, as another contribution to the vibrant field that they are part of.

      Thank you for these recommendations. Please also see our point-by-point responses to the individual comments above. Here, we explained our logic behind using the terminology of past vs. future (in addition, see also our response to point 2 of Reviewer 2). Here, we also stated relevant changes that we have made to our manuscript to explain how our findings complement – but are also distinct from – prior tasks that used pre-cues to direct spatial attention to an upcoming stimulus. As we explained above, in our task, the cue itself never contained information about the upcoming test location. Rather, the upcoming test location was a property of the memory item (given the future rule). Hence, we referred to this as a “future attribute” of the cued memory item, rather than as the “cued location” for external spatial attention. Still, we agree the future bias likely (also) reflects spatial allocation to the upcoming test array, and we explicitly acknowledge this in our discussion. For example:

      Page 5 (Discussion): “This signal may reflect either of two situations: the selection of a future-copy of the cued memory content or anticipatory attention to the anticipated location of its associated test-stimulus. Either way, by the nature of our experimental design, this future signal should be considered a content-specific memory attribute for two reasons. First, the two memory contents were always associated with opposite testing locations, hence the observed bias to the relevant future location must be attributed specifically to the cued memory content. Second, we cued which memory item would become tested based on its colour, but the to-be-tested location was dependent on the item’s encoding location, regardless of its colour. Hence, consideration of the item’s future-relevant location must have been mediated by selecting the memory item itself, as it could not have proceeded via cue colour directly.”

      Page 6 (Discussion): “Building on the above, at face value, our task may appear like a study that simply combines two established tasks: tasks using retro-cues to study attention in working memory (e.g.,2,31-33) and tasks using pre-cues to study orienting of spatial attention to an upcoming external stimulus (e.g., 31,32,34–36). A critical difference with common pre-cue studies, however, is that the cue in our task never directly informed the relevant future location. Rather, as also stressed above, the future location was a feature of the cued memory item (according to the future rule), and not of the cue itself. Note how this type of scenario may not be uncommon in everyday life, such as in our opening example of a bird flying behind a building. Here too, the future relevant location is determined by the bird – i.e. the memory content – itself.”

    1. Author response:

      Factual error in the eLife assessment to be corrected:

      In the eLife assessment, "ribosomal protein H59" should be changed to "helix 59 of the 28S ribosomal RNA" to make this factually correct.

      Provisional author response

      We thank the reviewers for their thorough and thoughtful readings of the manuscript. Our responses to the four suggestions made in their public reviews are below.

      Reviewer #1 (Public Review):

      Major points:

      (1) The identification of RAMP4 is a pivotal discovery in this paper. The sophisticated AlphaFold prediction, de novo model building of RAMP4's RBD domain, and sequence analyses provide strong evidence supporting the inclusion of RAMP4 in the ribosome-translocon complex structure.

      However, it is crucial to ensure the presence of RAMP4 in the purified sample. Particularly, a validation step such as western blotting for RAMP4 in the purified samples would strengthen the assertion that the ribosome-translocon complex indeed contains RAMP4. This is especially important given the purification steps involving stringent membrane solubilization and affinity column pull-down.

      As suggested, we will revise the manuscript to include Western blots showing that RAMP4 is retained at secretory translocons (and not multipass translocons) after solubilisation, affinity purification, and recovery of ribosome-translocon complexes.

      (2) Despite the comprehensive analyses conducted by the authors, it is challenging to accept the assertion that the extra density observed in TRAP class 1 corresponds to calnexin. The additional density in TRAP class 1 appears to be less well-resolved, and the evidence for assigning it as calnexin is insufficient. The extra density there can be any proteins that bind to TRAP. It is recommended that the authors examine the density on the ER lumen side. An investigation into whether calnexin's N-globular domain and P-domain are present in the ER lumen in TRAP class 1 would provide a clearer understanding.

      We agree that the Calnexin assignment is less confident than the other assignments in this manuscript, and that further support would be ideal. We have exhaustively searched our maps for any unexplained density connected with the putative Calnexin TMD, and have found none. This is consistent with Calnexin's lumenal domain being flexibly linked to its TMD, and thus would not be resolved in a ribosome-aligned reconstruction.

      Our assignment of this TMD to Calnexin was based on existing biochemical data (referenced in the paper) favouring this as the best working hypothesis by far: Calnexin is TRAP’s only abundant co-purifying factor, and their interaction is sensitive to point mutations in the Calnexin TMD. Recognising that this is not conclusive, we will ensure that the text and figures consistently describe this assignment as provisional or putative.

      (3) In the section titled 'TRAP competes and cooperates with different translocon subunits,' the authors present a compelling explanation for why TRAP delta defects can lead to congenital disorders of glycosylation. To enhance this explanation, it would be valuable if the authors could provide additional analyses based on mutations mentioned in the references. Specifically, examining whether these mutations align with the TRAP delta-OSTA structure models would strengthen the link between TRAP delta defects and the observed congenital disorders of glycosylation.

      We agree that mapping disease-causing point mutants to the TRAP delta structure could be potentially informative. Unfortunately, the referenced TRAP delta disease mutants act by simply impairing TRAP delta expression, and thus admit no such fine-grained analyses. However, sequence conservation is our next best guide to mutant function. We note in the text that the contact site charges on TRAP delta and RPN2 are conserved, and that the closest-juxtaposed interaction pair (K117 on TRAPδ and D386 on RPN2) is also the most conserved.

      Reviewer #2 (Public Review):

      Strengths:

      The manuscript contains numerous novel structural analyses and their potential functional implications. While all findings are exciting, the highlight is the discovery of RAMP4/SERP1 near the Sec61 lateral gate. Overall, the strength is the thorough and extensive structural analysis of the different high-resolution RTC classes as well as the expert bioinformatic evolutionary analysis.

      Weaknesses:

      A minor downside of the manuscript is the sheer volume of analyses and mechanistic hypotheses, which makes it sometimes difficult to follow. The authors might consider offloading some analyses based on weaker evidence to the supplement to maximize impact.

      We agree that the manuscript is long, and we will seek ways to streamline it in revision while avoiding the undesirable side effect of making important findings undiscoverable via literature searches (an unfortunate consequence of many supplemental data). Indeed, we chose eLife for its flexibility regarding article length and suitability for extended and detailed analyses.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      We are grateful for the overall positive feedback from the reviewer.

      We agree with the reviewer that our data showing cellular co-localization between PRC1 and BIN1 requires further investigation in future studies. However, we are confident that, in its current form, our manuscript already presents multiple lines of evidence for the role of BIN1 in mitotic processes. We would like to emphasize that PRC1 is not the sole BIN1 partner that connects it to mitotic processes; it is only one of more than a dozen that we identified in our study. Furthermore, the mitotic connection with BIN1 is not entirely novel, as BIN1 levels fluctuate mildly during the cell cycle, similar to other proteins involved in the regulation of the cell cycle (Santos et al., 2015), and because DNM2 is also a well-accepted actor during mitosis (Thompson et al., 2002).

      The less marked co-localization between BIN1 and PRC1 compared to the strong co-localization between BIN1 and DNM2 can be a consequence of their weaker affinity and their partial binding. Yet, this does not necessarily imply that stronger interactions have more biological significance. For example, weaker affinities can be compensated by local concentrations to achieve an even higher degree of cellular complexes than of strongly binding interactions that are separated within the cell. Furthermore, even the degree of complex formation cannot be used intuitively to estimate the biological significance of a complex because complexes can trigger very important biological processes even at very low abundances, e.g. by catalyzing enzymatic reactions. Deciding what is and what is not “biologically significant” among the identified interactions remains to be answered in the future, once we are able to overview complex biological processes in a holistic manner.

      In the revised version, we implemented minor changes to further clarify the raised points.

      Reviewer #2:

      We thank the reviewer for the careful assessment and we are pleased to see the positive enthusiasm regarding our affinity interactomic strategy.

      The reviewer points out that affinities were only measured with a single technique, which is relatively unproven. While it is true that our work uses two techniques building on the same holdup concept, we rather believe that this approach is well-proven. The original holdup method was described almost 20 years ago and since then, it has been used in more than 10 publications for quantitative interactomics. Over the years, at least five distinct generations of the assay were developed, all building on the expertise of the preceding one. In the past, we extensively proved that the resulting affinities show excellent agreement with affinities measured with other methods, such as fluorescence polarization, isothermal titration calorimetry, or surface plasmon resonance (for example in Vincentelli et al. Nat. Meth. 2015; Gogl et al. 2020 Structure; Gogl et al. 2022 Nat.Com.). However, it is true that the most recent variation of this method family, called native holdup, is a fairly new approach published just a bit more than a year ago and this is only the third work that utilizes this method. Yet, in our original work describing the method, we demonstrated good agreement with the results of previous holdup experiments, as well as with orthogonal affinity measurements (Zambo et al. 2022).

      Importantly, the reviewer raises concerns regarding the number of replicates used in our study, as well as the reliability of our methodology. We are glad for such a comment as it allows us to explain our motives behind experimental design which is most often left out from scientific works to save space and keep focus on results. The reason why we use technical replicates instead of the typical biological replicates lies in the nature of the holdup assay. In a typical interactomic assay, such as immunoprecipitation, a lot of variables can perturb the outcome of the measurement, such as bait immobilization, or captured prey leakage during washing steps. The output of such an experiment is a list of statistically significant partners and to minimize these variabilities, biological replicates are used. In the case of a native holdup approach, a panel of an equal amount of resins, all saturated with different baits or controls, is mixed with an equal amount of cell extract, taken from a single tube, and after a brief incubation, the supernatant of this mixture is analyzed. The output of such an experiment is a list of relative concentrations of prey and to maximize its accuracy, we use technical replicates. Using an ideal analytical method, such as fluorescence, it is not necessary to use technical replicates to reach accurate results. For example, the general accuracy of a holdup experiment coupled with a robust analytical approach can be seen clearly in our fragmentomic holdup data shown in Figure 7C where mutant domains that do not have any impact on the interactome show extreme agreement in affinities. Unfortunately, mass spectrometry is less accurate as an analytical method, hence we use technical triplicates to compensate for this. Finally, in the case of BIN1, an independent nHU measurement was also performed using a less capable mass spectrometer. 
Not counting the 117 partners of BIN1 that were detected in only one of these proteomic measurements, 29 partners were identified as common significant partners in both measurements, showing nearly identical affinities with a mean standard deviation between measured pKapp values of 0.18, meaning that the obtained dissociation constants are within a <2.5-fold range with >95% probability. There were also 61 BIN1 partners that were detected in both proteomic measurements but were identified as a significant interaction partner in only one of these experiments. Yet many of them show binding in both assays, although the binding was found not to be significant in one of them. For example, CDC20 shows 66% depletion in one assay (significant binding) and 54% depletion in the other (not significant binding), while CKAP2 shows 58% depletion in one assay (significant binding) and 41% depletion in the other (not significant binding). We hope that these examples show that statistical significance in nHU experiments signifies how certain we are of a particular affinity measurement rather than the accuracy of the affinity measurement itself. While there are true discrepancies between some of the affinity measurements in these experiments, which could be clarified with more experimental replicates, the raw data presented in our work clearly demonstrate the strength and robustness of a fully quantitative interactomic assay.
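
      As a rough check of the fold-range statement above, the following sketch converts a standard deviation expressed in pK units into a fold range of the dissociation constant. It assumes that deviations in pKapp are approximately normally distributed, so that roughly 95% of values fall within two standard deviations; this assumption and the helper name `fold_range` are illustrative and are not taken from the manuscript.

```python
def fold_range(sd_pk: float, n_sd: float = 2.0) -> float:
    """Fold change in Kd spanned by n_sd standard deviations in pK units.

    Because pK = -log10(Kd), a shift of x pK units corresponds to a
    10**x-fold change in the dissociation constant.
    """
    return 10 ** (n_sd * sd_pk)


# Mean standard deviation between pKapp values reported in the response.
sd_pkapp = 0.18

# Two standard deviations (about 95% coverage under the normal assumption).
print(f"{fold_range(sd_pkapp):.2f}-fold")  # prints "2.29-fold"
```

      Under these assumptions, two standard deviations of 0.18 pK units correspond to about a 2.3-fold change in the dissociation constant, which is compatible with the stated <2.5-fold range.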

      In the revised version, we clarified the number of replicates in the text, in the figure legends, and included some of this discussion in the method section.

      The reviewer had some very useful comments regarding affinity differences between short fragments and full-length proteins. In his comment, he possibly made a typo, as we find that full-length proteins typically interact with higher, not weaker, affinities compared to short PxxP motif fragments in isolation. The reviewer also comments that we explain this difference with cooperativity. In a previous preprint version, which the reviewer may have seen, this was indeed the case, but since we realized that we did not have sufficient evidence supporting this model, we did not discuss it in detail in the last version submitted to eLife. To clarify this, we included more discussion about the observed differences in the affinities between fragments and full-length proteins, but since we have limited data to make solid conclusions, we do not go into details about underlying models.

      Instead of cooperativity, the reviewer suggests that the observed differences may originate from additional residues that were not included in our peptides. Indeed, many similar experiments fail because of suboptimal peptide library design. Our peptide library was constructed as 15-mer, xxxxxxPxxPxxxxx motifs and we do not see a strong contribution of residues at the far end of these peptides. Specificity logo reconstructions are expected to identify all key residues that participate in SH3 domain binding, and based on this, all key residues of the identified motifs can be included in shorter 10-mer, xxxPxxPxxx motifs. Therefore, it is unlikely that residues outside our peptide regions will greatly contribute to the site-specific interactions of SH3 domains. It is however possible that other sites, that are sequentially far away from the studied PxxP motifs, are also capable of binding to SH3 through a different surface, but in light of the small size of an isolated SH3 domain, we believe it is very unlikely. It is also possible that BIN1 could also interact with other types of SH3 binding motifs that were not included in our peptide library. We think a more likely explanation is some sort of cooperativity. Cooperativity, or rather synergism between different sites can be easily explained in typical situations, such as in the case of a bimolecular interaction that is mediated by two independent sites. In such an event, once one site is bound, the second binding event will likely also occur because of the high effective local concentration of the binding sites. However, cooperativity can also form in atypical conditions and a molecular explanation for these events is rather elusive. As BIN1 contains a single SH3 domain, its binding to targets containing more binding sites can be challenging to interpret. 
If these sites are part of a greater Pro-rich region, such as in the case of DNM2, it is possible that the entire region adopts a fuzzy, malleable, yet PPII-like helical conformation. Once the SH3 domain is recruited to this helical region, it can freely trans-locate within this region via lateral diffusion and it will pause on optimal PxxP motifs. As an alternative to this sliding mechanism, a diffusion-limited cooperative binding can also occur. If the two motifs are not part of the same Pro-rich region, but are relatively close in space, such as in the case of ITCH or PRC1, once a BIN1 molecule dissociates from one site, it has a higher chance to rebind to the second site due to higher local concentrations. Such an event can more likely occur if a transient, but relatively stable encounter complex exists between the two molecules, from which complex formation can occur at both sites (A+B↔AB; AB↔ABsite1; AB*↔ABsite2). However, this large effective local concentration in this encounter complex is only temporary because diffusion rapidly diminishes it, although weak electrostatic interactions can increase the lifetime of such encounter complexes. In contrast, the large effective local concentration in conventional multivalent binding is time-independent and only determined by the geometry of the complex. Finally, it may also occur that our empirical bait concentration estimation for immobilized biotinylated proteins is less accurate than the concentration estimation of peptide baits because we approximate this value based on peptide baits. For this technical reason, which was discussed in detail in the original paper describing the nHU approach, we are carefully using apparent affinities for nHU experiments. Nevertheless, even without accurate bait concentrations, our nHU experiment provides precise relative affinities and, thus partner ranking. 
Either of the mechanisms underlying the interactions we study would be difficult to further explore experimentally, especially at the proteomic level.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations For The Authors):

      The data is poorly dealt with, and the figures are shown poorly. For example, Figure 2A is not even shown totally.

      We apologize for any difficulties that the reviewer encountered while attempting to view the figures. We have confirmed that all figures, including all panels of Figure 2, display correctly on the HTML and PDF versions of the article hosted at bioRxiv. The HTML and PDF versions generated by eLife also appear to contain all figures and panels in their entirety.

      Reviewer #2 (Recommendations For The Authors):

      Please refer to the public review for possible revisions.

      We thank Reviewer #2 for the summary and thoughtful comments provided in the Public Review. We note the point of possible revision from the Public Review: “It can be informative to directly demonstrate DPYD promoter-enhancer interactions. However, the genetic variants support the integration of regulatory activities.” In Figure 4, we provide evidence for direct promoter-enhancer interaction through the use of 3C. We furthermore demonstrate that these interactions are dependent upon genotype at rs4294451, as stated by the reviewer. We have highlighted the promoter-enhancer interaction in the revised manuscript, lines 323-325. The role of genotype in this interaction is also specifically discussed in lines 378-381.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Gap junction channels establish gated intercellular conduits that allow the diffusion of solutes between two cells. Hexameric connexin26 (Cx26) hemichannels are closed under basal conditions and open in response to CO2. In contrast, when forming a dodecameric gap junction, channels are open under basal conditions and close with increased CO2 levels. Previous experiments have implicated Cx26 residue K125 in the gating mechanism by CO2, which is thought to become carbamylated by CO2. Carbamylation is a labile post-translational modification that confers negative charge to the K125 side chain. How the introduction of a negative charge at K125 causes a change in gating is unclear, but it has been proposed that carbamylated K125 forms a salt bridge with the side chain at R104, causing a conformational change in the channel. It is also unclear how overall gating is controlled by changes in CO2, since there is significant variability between structures of gap-junction channels and the cytoplasmic domain is generally poorly resolved. Structures of WT Cx26 gap-junction channels determined in the presence of various concentrations of CO2 have suggested that the cytoplasmic N-terminus changes conformation depending on the concentration of the gas, occluding the pore when CO2 levels are high.

      In the present manuscript, Deborah H. Brotherton and collaborators use an intercellular dye-transfer assay to show that Cx26 gap-junction channels containing the K125E mutation, which mimics carbamylation caused by CO2, are constitutively closed even at CO2 concentrations where WT channels are open. Several cryo-EM structures of WT and mutant Cx26 gap junction channels were determined at various conditions and using classification procedures that extracted more than one structural class from some of the datasets. Together, the features on each of the different structures are generally consistent with previously obtained structures at different CO2 concentrations and support the mechanism that is proposed in the manuscript. The most populated class for K125E channels determined at high CO2 shows a pore that is constricted by the N-terminus, and a cytoplasmic region that was better resolved than in WT channels, suggesting increased stability. The K125E structure closely resembles one of the two major classes obtained for WT channels at high CO2. These findings support the hypothesis that the K125E mutation biases channels towards the closed state, while WT channels are in an equilibrium between open and closed states even in the presence of high CO2. Consistently, a structure of K125E obtained in the absence of CO2 appeared to also represent a closed state but at lower resolution, suggesting that CO2 has other effects on the channel beyond carbamylation of K125 that also contribute to stabilizing the closed state. Structures determined for K125R channels, which are constitutively open because arginine cannot be carbamylated, and would be predicted to represent open states, yielded apparently inconclusive results.

      A non-protein density was found to be trapped inside the pore in all structures obtained using both DDM and LMNG detergents, suggesting that the density represents a lipid rather than a detergent molecule. It is thought that the lipid could contribute to the process of gating, but this remains speculative. The cytoplasmic region in the tentatively closed structural class of the WT channel obtained using LMNG was better resolved. An additional portion of the cytoplasmic face could be resolved by focusing classification on a single subunit, which had a conformation that resembled the AlphaFold prediction. However, this single-subunit conformation was incompatible with a C6-symmetric arrangement. Together, the results suggest that the identified states of the channel represent open states and closed states resulting from interaction with CO2. Therefore, the observed conformational changes illuminate a possible structural mechanism for channel gating in response to CO2.

      Some of the discussion involving comparisons with structures of other gap junction channels are relatively hard to follow as currently written, especially for a general readership. Also, no additional functional experiments are carried out to test any of the hypotheses arising from the data. However, structures were determined in multiple conditions, with results that were consistent with the main hypothesis of the manuscript. No discussion is provided, even if speculative, to explain the difference in behavior between hemichannels and gap junction channels. Also, no attempt was made to measure the dimensions of the pore, which is relevant because of the importance of identifying if the structures indeed represent open or closed states of the channel.

      We have considerably revised the manuscript in an attempt to make it more tractable. We respond to the individual comments below.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Brotherton et al. describes a structural study of connexin-26 (Cx26) gap junction channel mutant K125E, which is designed to mimic the CO2-inhibited form of the channel. In the wild-type Cx26, exposure to CO2 is presumed to close the channel through carbamylation of the residue K125. The authors mutated K125 to a negatively charged residue to mimic this effect, and they observed by cryo-EM analysis of the mutated channel that the pore of the channel is constricted. The authors were able to observe conformations of the channel with resolved density for the cytoplasmic loop (in which K125 is located). Based on the observed conformations and on the position of the N-terminal helix, which is involved in channel gating and in controlling the size of the pore, the authors propose the mechanisms of Cx26 regulation.

      Strengths:

      This is a very interesting and timely study, and the observations provide a lot of new information on connexin channel regulation. The authors use state-of-the-art cryo-EM analysis and 3D classification approaches to tease out the conformations of the channel that can be interpreted as "inhibited", with important implications for our understanding of how the conformations of the connexin channels are controlled.

      Weaknesses:

      My fundamental question about the premise of this study is: to what extent can K125 carbamylation be recapitulated by a simple K125E mutation? Lysine has a large side chain, and its carbamylation would make it even slightly larger. While the authors make a compelling case for E125-induced conformational changes focusing primarily on the negative charge, I wonder whether they considered the extent to which their observation with this mutant may translate to the carbamylated lysine in the wild-type Cx26, considering not only the charge but also the size of the modified side-chain.

      This is an important point. We agree that the difference in size will have a different effect on the structure. For kinases, aspartate or glutamate are often used as mimics of phosphorylated serine or threonine and these will have the same issues. The fact that we cannot resolve the relevant side-chains in the density may be indicative that the mutation doesn’t give the whole story. It may be able to shift the equilibrium towards the closed conformation, but not stably trap the molecule in that conformation. We include a comment to this effect in the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The mechanism underlying the well-documented CO2-regulated activity of connexin 26 (Cx26) remains poorly understood. This is largely due to the labile nature of CO2-mediated carbamylation, making it challenging to visualize the effects of this reversible posttranslational modification. This paper by Brotherton et al. aims to address this gap by providing structural insights through cryo-EM structures of a carbamylation-mimetic mutant of the gap junction protein.

      Strengths:

      The combination of the mutation, elevated PCO2, and the use of LMNG detergent resulted in high-resolution maps that revealed, for the first time, the structure of the cytoplasmic loop between transmembrane helix (TM) 2 and 3.

      Weaknesses:

      The presented maps merely reinforce their previous findings, wherein wildtype Cx26 favored a closed conformation in the presence of high PCO2. While the structure of the TM2-TM3 loop may suggest a mechanism for stabilizing the closed conformation, no experimental data was provided to support this mechanism. Additionally, the cryo-EM maps were not effectively presented, making it difficult for readers to grasp the message.

      We have extensively revised the manuscript so that the novelty of this study is more apparent. There are three major points:

      (1) The carbamylation mimetic pushes the conformation towards the closed conformation. Previously we just showed that CO2 pushes the conformation towards this conformation. Though we could show this was not due to pH, and could speculate this was due to carbamylation as suggested by previous mutagenesis studies, our data did not provide any mechanism whereby Lys125 was involved.

      (2) In going from the open to closed conformations, not only is a conformational change in TM2 involved, as we saw previously, but also a conformational change in TM1, the linker to the N-terminus and the cytoplasmic loop. Thus there is a clear connection between Lys125 and the conformation of the pore-closing N-terminus.

      (3) We observe for the first time in any connexin structure, density for the cytoplasmic loop. Since this loop is important in regulation, knowing how it might influence the positions of the transmembrane helices is important information if we are to understand how connexins can be regulated.

      Reviewing Editor:

      The reviewers have agreed on a list of suggested revisions that would improve the eLife assessment if implemented, which are as follows:

      (1) For completeness, Figure 1 could be supplied with an example of what the experiment would look like in the presence of CO2 - for the wild-type and for the K125E mutant. Presumably, for the wild-type this has been done previously in exactly this assay format, but this control would be an important part of characterization for the mutant. Page 4, lines 105-106: "unsurprisingly, Cx26K125E gap junctions remain closed at a PCO2 of 55 mmHg." The data should be presented in the manuscript.

      We have now included the data with a PCO2 of 55 mmHg. This is now Figure 4 in our revised manuscript.

      (2) Would AlphaFold predictions show any interpretable differences in the E125 mutant, compared to the K125 (the wild-type)?

      We tried this in response to the reviewer’s suggestion. We did not see any interpretable differences. In general AlphaFold is not recognised as giving meaningful information around point mutations.

      (3) The K125R mutant appears to be a more effective control for extracting significant features from the K125E maps. Given that the use of a buffer containing high PCO2 is essential for obtaining high-resolution maps, wildtype Cx26 is unsuitable as an appropriate control. The K125R map, obtained at a high resolution (2.1Å), supports its suitability as a robust control.

      Though we are unsure what the referee is referring to here, we have rewritten this section and compare against the K125R map (figure 5a) as well as that derived from the wild-type protein. The important point is that the K125E mutant causes a structural change that is consistent with the closure of the gap junctions that we observe in the dye-transfer assays.

      (4) Likewise, the rationale for using wildtype Cx26 maps obtained in DDM is unclear. Wildtype Cx26 seems to yield much better cryo-EM maps in LMNG. We suggest focusing the manuscript on the higher-quality maps, and providing supporting information from the DDM maps to discuss consistency between observations and the likely possibility that the nonprotein density in the pore is lipid and not detergent.

      The rationale for comparing the mutants against the wt Cx26 maps obtained in DDM was because the mutants were also solubilised in DDM. However, taking the lead from the referees’ comments, we have now rewritten the manuscript so that we first focus on the data we obtain from protein solubilised in LMNG. We feel this makes our message much clearer.

      (5) In general, the rationale for utilizing cryo-EM maps with the entire selected particles is unclear. Although the overall resolutions may slightly improve in this approach, the regions of interest, such as the N-terminus and the cytoplasmic loop, appear to be better ordered after further classifications. The paper would be more comprehensible if it focuses solely on the classes representing the pore-constricting N-terminus (PCN) and the pore-open flexible N-terminus (POFN) conformations. Also, the nomenclatures used in the manuscript, such as "WT90-Class1", "K125E90-1", "LMNG90-class1", "LMNG90-mon-pcn" are confusing.

      LMNG90s are also wildtype; K125E-90-1 is in Class1 for this mutant and is similar to WT90-Class2, which represents the PCN conformation. More consistent and intuitive nomenclatures would be helpful.

      We agree with the referees’ comments. This should now be clearer with our rewritten manuscript where we have simplified this considerably. We now call the conformations NConst (N-terminus defined and constricting the pore) and NFlex (N-terminus not visible) and keep this consistent throughout.

      (6) A potential salt bridge between the carbamylated K125 and R104 is proposed to account for the prevalence of Class-1 (i.e., PCN) in the majority of cryo-EM particles. However, the side chain densities are not well-defined, suggesting that such an interaction may not be strong enough to trap Cx26 in a closed conformation. Furthermore, the absence of experimental data to support this mechanism makes it unclear how likely this mechanism may be. Combining simple mutagenesis, such as R104E, with a dye transfer assay could offer support for this mechanism. Are there any published experimental results that could help address this question without the need for additional experimental work? Alternatively, as acknowledged in the discussion, this mechanism may be deemed as an "over-simplification." What is an alternative mechanism?

      R104 has been mutated to alanine in gap junctions and tested in a dye transfer assay, as now mentioned in the text (Nijar et al, J Physiol 2021), supporting this role. In hemichannels R104 has been mutated to both alanine and glutamate and tested through dye loading assays (Meigh et al, eLife 2013). Also in hemichannels, R104 and K125 have been mutated to cysteines, allowing them to be cross-linked through a disulphide bond. This mutant responds to a change in redox potential in a similar way to which the wild type protein responds to CO2 (Meigh et al, Open Biol 2015). Therefore, there is no doubt that the residues are important for the mechanism, and the salt-bridge interaction seems a plausible mechanism to reconcile the mutagenesis data; however, we cannot be sure that there are not other interactions involved that are necessary for closure. This information has now been included in the text.

      (7) The cryo-EM maps presented in the manuscript propose that gap junctions are constitutively open under normal PCO2 as the flexible N-terminus clears the solute permeation pathway in the middle of the channel. However, hemichannels appear to be closed under normal PCO2. It is puzzling how gap junctions can open when hemichannels are closed under normal PCO2 conditions. If this question has been addressed in previous studies, the underlying mechanism should be explicitly described in the introduction. If it remains an open question, differences in the opening mechanisms between hemichannels and gap junctions should be investigated.

      We suspect this is due to the difference in flexibility of gap junctions relative to hemichannels. However, a discussion of this is beyond this paper and would be complete speculation based on hemichannel structures of other connexins, performed in different buffering systems. There are no high resolution structures of Cx26 hemichannels.

      (8) A mystery density likely representing a lipid is abruptly introduced, but the significance of this discovery is unclear. It is hard to place the lipid on Figure S6 in the wider context of everything else that is discussed in the text. It would be helpful for readers if a figure were provided to show where the density is located in relation to all the other regions that are extensively discussed in the text.

      In the revised text this section has been completely rewritten. We have now included a more informative view in a new figure (Figure 1 – figure supplement 3).

      (9) Including and displaying even tentative pore-diameter measurements for the different states - this would be helpful for readers and provide a more direct visual cue as to the difference between open and closed states.

      We have purposely avoided giving precise measurements of the pore diameter, since this depends on how we model the N-terminus. The first three residues are difficult to model into the density without causing steric clashes with the neighbouring subunits.

      (10) Given that no additional experiments for channel function were carried out, it would be useful to provide a more detailed discussion of additional mutagenesis results from the literature that are related to the experimental results presented.

      We have amplified this in the discussion (see answer to point 6).

      The reviewers also agreed that improvements in the presentation of the data would strengthen the manuscript. Here is a summary list of suggestions by reviewers aimed at helping improve how the data is presented:

      (1) Why is the pipette bright green in the top image, but rather weakly green in the bottom image in Figure 1 - is this the case for all images?

      (Now figure 4) This depends on whether the pipette was in the focal plane of view or not. The important point of these images is the difference in intensity of the donor vs the recipient cell. The graphs in figure 4c illustrate clearly the difference between the wild-type and the mutant gap junctions.

      (2) In figures 2-5, labels would help a lot in understanding what is shown - while the legends do provide the information on what is presented, it would help the reader to see the models/maps with labels directly in the panel. For example, Figure 2a/b - just indicating "WT90 Cx26" in pink and "K125E90" in blue directly in the panel would reduce the work for the reader.

      We have extensively modified the labels in the figures to address this issue.

      (3) Figure 4 - magenta and pink are fairly close, and to avoid confusion it might be useful to use a different color selection. This is especially true when structures are overlayed, as in this figure - the presentation becomes rather complicated, so the less confusion the color code can introduce, the better.

      (Now Figure 2) We have now changed pink to blue.

      (4) Figure 5 - a remarkably under-labelled figure.

      Now added labels.

      (5) Figure 6 - it would be interesting to add a comparison to Cx32 here as well for completeness, since the structure has been published in the meantime.

      Cx32 has now been included.

      (6) Figure 7 - please add equivalent labels on both sides of the model, left and right. Add the connecting lines for all of the tubes TM helices - this will help trace the structural elements shown. The legend does not quite explain the colors.

      We have modified the figure as suggested and explained the colours in the legend.

      (8) Fig.1 legend; Unclear what mCherry fluorescence represents. State that Cx26 was expressed as a translational fusion with mCherry.

      Now figure 4. We have now written “Montages each showing bright field DIC image of HeLa cells with mCherry fluorescence corresponding to the Cx26K125E-mCherry fusion superimposed (leftmost image) and the permeation of NBDG from the recorded cell to coupled cells.”

      (9) Fig. 3 b); Show R104 in the figure. Also E129-R98/R99 interaction is hard to acknowledge from the figure. It seems that the side chain density of E129 is not strong enough to support the modeled orientation.

      This is now Figure 1c. While the density in this region is sufficient to be confident of the main chain, we agree that the side chain density for the E129-R98/R99 interaction is not sufficiently clear to draw attention to and have removed the associated comment from the figure legend. The density is focussed on the linker between TM1 and the N-terminus and the KVRIEG motif. We prefer to omit R104, in order to keep the focus on this region. As described in the manuscript, the density for the R104 side chain is poor.

      (10) Fig. 3 c); Label the N-terminus and KVRIEG motif in the figure.

      Now Figure 1b. We have labelled the N-terminus. The KVRIEG motif is not visible in this map.

      (11) Page 9, lines 246-248; Restate, "We note, however, density near to Lys125, between Ser19 in the TM1-N-term linker, Tyr212 of TM4 and Tyr97 on TM3 of the neighbouring subunit, which we have been unable to explain with our modelling."

      We have reworded this.

      (12) Page 14, line 399; Patch clamp recording is not included in the manuscript.

      Patch clamp recordings were used to introduce dye into the donor cell.

      (13) On the same Figure 2, clashes are mentioned but these are hard to appreciate in any of the figures shown. Perhaps would be useful to include an inset showing this.

      We have modified Figure 2b slightly and added an explanation to highlight the clash. It is slightly confusing because the residues involved belong to neighbouring subunits.

      (14) The discussion related to Figure 6 is very hard to follow for readers who are not familiar with the context of abbreviations included on the figure labels. This figure could be improved to allow a general readership to identify more clearly each of the features and structural differences that are discussed in the text.

      We have extensively changed the text and updated the labels on the figure to make it much easier for the reader to follow.

      Below, you can find the individual reviews by each of the three reviewers.

      Reviewer #1 (Recommendations For The Authors):

      (1) In Figure 2d-e, the text discusses differences between K125E 90-1 and WT 90-class2 (7QEW), yet the figure compares K125E with 7QEQ. I suggest including a figure panel with a comparison between the two structures discussed in the manuscript text.

      This has been changed in the revised manuscript.

      Other comments have been addressed above.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      The reviewers’ thoughtful comments have helped us make the manuscript both more comprehensive and clearer. Thank you for your time and effort. We know that this is a long and technical paper. In our responses we refer to three documents:

      • Original: the first original submission

      • Revision: the revised document (02 MillardFranklinHerzog2023 v2.pdf)

      • Difference: a document that shows the changes made to text (but not figures or tables) from the original to revision (03 MillardFranklinHerzog2023 diff.pdf).

      Reviewer #1 (Recommendations For The Authors):

      (1) In general, the paper is well written and addresses important questions of muscle mechanics and muscle modeling. In the current version, the model limitations are briefly summarized in the abstract. However, the discussion needs a more complete description of limitations as well as a discussion of types of data (in vivo, ex vivo, single fiber, whole muscle, MTU, etc.) that can be modeled using this approach.

      Please see the response to comment 23 for more details of the limitations that have been added to the revised document.

      (2) The choice of a model with several tendon parameters for simulating single muscle fiber experiments is not well justified.

      A rigid-tendon model with a slack length of zero was, in fact, used for these simulations for both the VEXAT and Hill models. In case this is still not clear: a rigid-tendon model of zero length is equivalent to no tendon at all. The text that first mentions the tendon model has now been modified to make it clearer that the parameters of the model were set to be consistent with no tendon at all:

      Please see the following text:

      Original:

      • page 17, column 1, line 28 ”... rigid tendon of zero length,”

      • page 17, column 1, line 51 ”... rigid tendon of zero length.”

      Revision:

      • page 19, column 1, line 19 ”... we used a rigid-tendon of zero length (equivalent to ignoring the tendon)”

      • page 19, column 1, line 38 ”... coupled with a rigid-tendon of zero-length.”

      Difference:

      • page 21, column 1, line 19 ”... we used a rigid-tendon... ”

      • page 21, column 1, line 45 ”... rigid-tendon of zero length ...”

      (3) A table that clarifies how all model parameters were estimated needs to be included in the main part of the manuscript.

      Two tables have been added to the manuscript that detail the parameters of the elastic-tendon cat soleus model (in the main body of the text) and the rabbit psoas fibril model (in an appendix). Each table includes:

      • A plain language parameter name

      • The mathematical symbol for the parameter

      • The value and unit of the parameter

      • A coded reference to the data source that indicates both the experimental animal and how the data was used to evaluate the parameter.

      Please see the following text:

      Revision:

      • page 11

      • page 42

      Difference:

      • page 11

      • page 46

      (4) The supplemental information is not properly referenced in the main text. There are a number of smaller issues that also need to be addressed.

      Thank you for your attention to detail. The following problems related to Appendix referencing have been fixed:

      • Appendices are now parenthetically referenced at the end of a sentence. However, a few references to figures (that are contained within an Appendix) still appear in the body of the sentence since moving these figure references makes the text difficult to understand.

      • All Appendices are now referenced in the main body of the text.

      (5) Abstract, line 6: While it is commonly assumed that the short range stiffness of muscle is due to cross bridges, Rack & Westbury (1974) noted that it occurs over a distance of 25-35 nm, and that many cross-bridges must be stretched even farther than this distance (their p. 348 middle). It seems unlikely that cross-bridges alone can actually account for the short-range stiffness.

      There are three parts to our response to this comment:

      (a) Rack & Westbury’s definition of short-range-stiffness and unrealistic cross-bridge stretches

      (b) Rack & Westbury’s definition of short-range-stiffness vs. linear-time-invariant system theory

      (c) Updates to the paper

      a. Rack & Westbury’s definition of short-range-stiffness and unrealistic cross-bridge stretches.

      As you note, on page 348, Rack and Westbury write that ”If the short range stiffness is to be explained in terms of extension of cross-bridges, then many of them must be extended further than the 25-35 nm mentioned above.” Having re-read the paper, it is not clear how these three factors are being treated in the 25−35 nm estimate:

      • the elasticity of the tendon and aponeurosis,

      • the elasticity of actin and myosin filaments,

      • and the cycling rate of the cross-bridges.

      Obviously the elasticity of the tendon, aponeurosis, actin, and myosin filaments will reduce the estimated amount of crossbridge strain during Rack and Westbury’s experiments. A potentially larger factor is the cycling rate of each cross-bridge. If each crossbridge cycles faster than 11 Hz (the maximum frequency Rack and Westbury used), then no single crossbridge would stretch by 25-35 nm. So why didn’t Rack and Westbury consider the cycling rate of crossbridges?

      Rack and Westbury reasoned that a perfectly elastic work loop would necessarily mean that all crossbridges stayed attached: as soon as a crossbridge cycles it would release its stored elastic energy and the work loop would no longer be elastic. Since Rack and Westbury measured some nearly perfect elastic work loops (the smallest loops in Fig. 2, 3, and 4), I guess they assumed crossbridges remained attached during the 25-35 nm crossbridge stretch estimate. However, even Rack and Westbury note that none of the work loops they measured were perfectly elastic and so there is room to entertain the idea that crossbridges are cycling.

      Fortunately, for this discussion, crossbridge cycling rates have been measured.

      In-vitro measurements by Uyeda et al. show that crossbridges are cycling at 30 Hz when moving at 0.5-1.2 length/s. At this rate, there would be enough time for a single crossbridge to cycle nearly 2.72 times for every cycle of the 11 Hz sinusoidal perturbations, reducing its expected strain from 25-35 nm down to 9.2−12.9 nm. This effect becomes even more pronounced if crossbridge cycling rate is used to explain the difference in sliding velocity between Uyeda et al.’s in-vitro data (0.5-1.2 length/s) and the maximum contraction velocity of an in-situ cat soleus (4.65 lengths/s, Scott et al.).
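
      The arithmetic in the preceding paragraph can be reproduced with a short sketch. It assumes, as the paragraph does, that the stretch is shared evenly across successive crossbridge attachments; the function name is hypothetical. Note that the exact ratio 30/11 gives an upper value of about 12.8 nm, while the truncated ratio of 2.72 quoted above gives 12.9 nm.

```python
# Illustrative arithmetic only; the input values are those quoted in the response.
CYCLING_RATE_HZ = 30.0          # crossbridge cycling rate (Uyeda et al.)
PERTURBATION_HZ = 11.0          # maximum frequency used by Rack and Westbury
STRAIN_RANGE_NM = (25.0, 35.0)  # strain estimate if crossbridges never detach


def reduced_strain(strain_nm, cycles):
    """Strain per attachment when the stretch is shared across `cycles` attachments."""
    return strain_nm / cycles


# Number of crossbridge cycles that fit into one perturbation cycle.
cycles_per_perturbation = CYCLING_RATE_HZ / PERTURBATION_HZ

low, high = (reduced_strain(s, cycles_per_perturbation) for s in STRAIN_RANGE_NM)
print(f"cycles per perturbation: {cycles_per_perturbation:.3f}")
print(f"expected strain per attachment: {low:.1f}-{high:.1f} nm")
```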

      b. Rack & Westbury’s definition of short-range-stiffness vs. linear-time-invariant system theory

      Rack and Westbury defined short-range-stiffness to describe a specific kind of force response of the muscle to cyclical length changes:

      • muscle force is linear with length change,

      • and independent of velocity.

      Rack and Westbury’s definition therefore fails when viscous forces become noticeable, because viscous forces are velocity dependent.

      On line 6 of the abstract the term ‘short-range-stiffness’ is not used because Rack and Westbury’s definition is too narrow for our purposes. Instead we are using the more general approach of approximating muscle as a linear-time-invariant (LTI) system, where it is assumed that

      • the response of the system is linear

      • and time invariant.

      To unpack that a little, a muscle is considered in the ‘short-range’ in our work if it meets the criteria of a linear time-invariant (LTI) system:

      • the force response of muscle can be accurately described as a linear function of its length and velocity (its state)

      • and its response is not a function of time (which means constant stimulation, and no fatigue).

      In contrast to Rack and Westbury’s definition, the ‘short-range’ in linear systems theory is general enough to accommodate both elastic and viscous forces. In physical terms, small for an LTI approximation of muscle is larger than the short-range defined by Rack and Westbury: an LTI system can include velocity dependence, while short-range-stiffness ends when velocity dependence begins.
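
      The contrast between the two definitions can be illustrated with a minimal sketch, assuming the muscle is approximated by a spring and damper in parallel so that force is a linear function of length change and velocity. The function name and the stiffness and damping values are hypothetical and are not taken from the manuscript.

```python
import cmath
import math


def spring_damper_response(freq_hz, stiffness, damping):
    """Gain and phase (degrees) of force/length for f = K*dl + beta*dv.

    For a sinusoidal length change at freq_hz the complex stiffness is
    K + i*omega*beta, where omega = 2*pi*freq_hz.
    """
    omega = 2.0 * math.pi * freq_hz
    h = complex(stiffness, omega * damping)
    return abs(h), math.degrees(cmath.phase(h))


# Hypothetical parameters, for illustration only.
K = 4.0      # stiffness, N/mm
BETA = 0.02  # damping, N/(mm/s)

for f in (1.0, 10.0, 100.0):
    elastic = spring_damper_response(f, K, 0.0)  # purely elastic response
    visco = spring_damper_response(f, K, BETA)   # elastic plus viscous response
    print(f"{f:6.1f} Hz  elastic: {elastic}  viscoelastic: {visco}")
```

      With zero damping the gain is constant and the phase is zero at every frequency, which corresponds to the velocity-independent response that Rack and Westbury call short-range stiffness. With non-zero damping the gain and phase rise with frequency, yet the system is still linear and time-invariant.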

      c. Updates to the paper

      To make the differences between Rack and Westbury’s ‘short-range-stiffness’ and LTI system theory clearer:

      • We have removed all occurrences of ‘short-range’ that were associated with Kirsch et al. and have replaced this phrase with ‘small’.

      • On the first mention of Kirsch’s work we have made the wording more specific

      Revision:

      • page 1, column 1, lines 4,5

      • page 1, column 2, lines 14-21 ”Under constant activation ...”

      Difference: page 1, column 2, line 19-26

      • page 1, column 1, lines 4,5

      • page 1, column 2, lines 20-27 ”Under constant activation ...”

      • A footnote has been added to contrast the definition of ‘small’ in the context of a linear time-invariant system to ‘short-range’ in the context of Rack and Westbury’s definition of short-range-stiffness.

      Revision: page 1, column 2, bottom

      Difference: page 1, column 2, bottom

      • In addition, we have added a brief overview of LTI system theory to make the analysis and results more easily understood:

      Revision: Figure 4 paragraph beginning on page 10, column 2, line 15 ”As long as ...”

      Difference: Figure 4 paragraph beginning on page 12, column 1, line 46 ”As long as ...”

      (6) Page 3, lines 6-8: It also seems unlikely that 25% of cross-bridges are attached at one time (Howard, 1997) even for supramaximal isometric stimulation. The number should be less than 20%. What would the ratio of load path stiffness be for low force movements such as changing the direction of a frictionless manipulandum or slow walking? The range of relative stiffnesses is of more interest than the upper limit.

      We have made the following updates to address this comment:

      • A 20% duty cycle now defines the upper bound stiffness of the actin-myosin load path.

      • We have also evaluated the lower bound actin-myosin stiffness when a single crossbridge is attached.

      • The stiffness of titin from Kellermayer et al. has been digitized at a length of 2 µm and 4 µm to more accurately capture the length dependence of titin’s stiffness.

      • We have added a new figure (Figure 14) to make it easier to compare the range of actin-myosin stiffness to titin-actin stiffness.

      • The text in the main body of the paper and the Appendix has been updated.

      • The script ’main ActinMyosinAndTitinStiffness.m’ used to perform the calculations and generate the figure is now a part of the code repository.

      Please see the following text:

      Revision

      • The paragraph beginning at page 2, column 2, line 45 ”The addition of a titin element ...”

      • Appendix A

      • Figure 14 (in Appendix A)

      Difference

      • The paragraph beginning at page 3, column 1, line 6: ”The addition of a titin element ...”

      • Appendix A

      • Figure 14 (in Appendix A)

      (7) Page 5, line 12: A word seems to be missing here, ”...together to further...”.

      Thank you for your attention to detail. The sentence has been corrected.

      Please see the following text:

      • Revision: page 4, column 2, line 40 ”... into a single ...”

      • Difference: page 5, column 1, line 18

      (8) Page 5, line 24-27: These ”theories” are not mutually exclusive, and it is misleading to suggest they are. There is evidence for binding of titin to actin at multiple locations and there is no reason why evidence supporting one binding location must detract from the evidence supporting other binding locations.

      The text has been modified to make it clear to readers that the different titin-actin binding locations are not mutually exclusive. Please see the following text:

      • Revision: page 5, column 1, lines 17-19, the sentence beginning ”As previously mentioned, ...”

      • Difference: page 5, column 1, lines 41-44

      (9) Page 5, lines 48-51: Should cite Kellermayer and Granzier (1996) not Kellermayer et al. (1997).

      The reference to ‘Kellermayer et al.’ has been changed to ‘Kellermayer and Granzier’. The comment that the year of the reference should be changed from (1997) to (1996) is confusing: the 1996 paper is being referenced.

      For further details please see:

      • Revision: page 5, column 1, 39-40

      • Difference: page 5, column 2, line 19-22

      (10) Also, Dutta et al. (2018) should be cited as further showing that N2A titin by itself slows actin motility on myosin.

      Thank you for the suggestion. The sentence has been modified to include Dutta et al.:

      For further details please see:

      • Revision: page 5, column 1, 40

      • Difference: page 5, column 2, line 19-22

      (11) Figure 2 legend and elsewhere: it is odd to say that experiments used ”a cat soleus” when more than one cat soleus was used. Change to ”cat soleus”. See also page 15, line 15.

      Thank you for your attention to detail. All occurrences of ‘a cat soleus’ have been changed, with some sentence revision, to ‘cat soleus’.

      (12) Page 6, line 10: It is not clear why an MTU was used to simulate single muscle fiber experiments. What is the justification for choosing this particular model? Also, the choice of model might explain why the version with stiff tendon performs better than the version with an elastic tendon, but this is never mentioned. Why not use a muscle model with no tendon (e.g., Wakeling et al., 2021 J. Biomech.)?

      Please see the response to comment 2.

      (13) Millard et al.’s activation dynamics model also fails to capture the length-dependence of activation dynamics (Shue and Crago, 1998; Sandercock and Heckman, 1997), which should be noted in the discussion along with other limitations.

      An additional limitations paragraph is in the revised manuscript that addresses this comment specifically. However, we have used Stephenson and Wendt as a reference for the shift in peak isometric force that comes with submaximal activation. In addition, we also reference Chow and Darling for the property that the maximum shortening velocity is reduced with submaximal activations.

      • Revision: page 22, column 1, line 41 ”Finally, the VEXAT model ...”

      • Difference: page 24, column 2, line 12 ”Finally, the VEXAT model ...”

      In addition, please see the response to comment 23.

      (14) Page 6, line 22: ”An underbar...”.

      Thank you for your attention to detail, this correction has been made.

      (14) Page 7, lines 27-32: This and other issues should be described in the Discussion under a heading of model limitations.

      Please see the response to comment 23.

      (15) Page 7, lines 43-44: Numerous papers from the last author’s laboratory contradict the claim that there is no force enhancement on the ascending limb by demonstrating that force enhancement does occur on the ascending limb (see e.g., Leonard & Herzog 2002, Peterson et al., 2004 and several papers from the Rassier laboratory).

      Thank you for your attention to detail. This statement is in error and has been removed. To improve this section of the paper, a paragraph has been added to briefly mention the experimental observations of residual force enhancement before proceeding to explain how this phenomenon is represented by the model.

      Please see the following text:

      Revision:

      • the paragraph starting on page 7, column 2, line 43 ”When active muscle is lengthened, ...”

      • and the following paragraph starting on page 8, column 1, line 3 “To develop RFE, ”

      Difference:

      • the paragraph starting on page 8, column 2, line 15

      • and the following paragraph starting on page 9, column 1, line 6

      (17) Figure 3 legend and elsewhere: The authors use Prado et al. (2005) to determine several titin parameters, however the simulations seem to focus on cat soleus, but Prado et al.’s paper is on rabbits. More clarity is needed about which specific results from which species and muscles were used to parameterize the model.

      The new parameter table includes coded entries to indicate the literature source for experimental data, the animal it came from, and how the data was used. For example, the ‘ECM fraction’ has a source of ‘R[57]’ to show that the data came from rabbits from reference 57. For further details, please see the response to comment #3

      Please see the following text:

      • Revision: page 11, column 2, table section H: ‘ECM fraction’.

      • Difference: page 11, column 2, table section H: ‘ECM fraction’.

      To address this comment in a little more detail, we have had to use Prado et al. (2005) to give us estimates for only one parameter: P, the fraction of the passive force-length relation that is due to titin. Prado et al.’s measurements relating to P are unique to our knowledge: these are the only measurements we have to estimate P in any muscle, cat soleus or otherwise. Here we use the average of the values for P across the 5 muscles measured by Prado et al. as a plausible default value for all of our simulations.

      (18) Figure 4 seems unnecessary.

      Figure 4 has been removed.

      (19) Page 10, lines 17-18: provide the abbreviation (VAF) here with the definition (variance accounted for).

      Thank you for your attention to detail. The abbreviation has been added.

      Please see these parts of the manuscripts for details:

      • Revision: page 12, column 2, line 13

      • Difference: page 13, column 2, line 32
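
      For readers unfamiliar with the abbreviation, a minimal sketch of one common form of VAF follows. It assumes VAF is computed as one minus the ratio of the residual variance to the variance of the measured signal; the manuscript's exact formula may differ, and the function name is hypothetical.

```python
def variance_accounted_for(measured, predicted):
    """Return VAF = 1 - var(measured - predicted) / var(measured).

    This is one common definition; other authors use sums of squares
    without subtracting the mean.
    """
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    residual = [m - p for m, p in zip(measured, predicted)]
    return 1.0 - variance(residual) / variance(measured)


# Made-up force samples, for illustration only.
force_measured = [1.0, 2.0, 3.0, 4.0]
print(variance_accounted_for(force_measured, force_measured))
```

      A model that reproduces the measured signal exactly has a VAF of 1, while a model that predicts only the mean of the signal has a VAF of 0.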

      (20) Page 11, lines 2-3: Here and elsewhere, it is clear that some model parameters have been optimized to fit the model. The main paper should include a table that lists all model parameters and how they were chosen or optimized, including but not limited to the information in Table 1 of the supplemental information section.

      See response to comment 3.

      (20) Page 17, lines 45 -49: Again, a substantial number of ad hoc adjustments to the model appear to be required. These should be described in the Discussion under limitations, and accounted for in the parameters table. See also legends to Fig. 12 and 13, page 19, lines 23-26.

      Please see the response to comment #3: a coded entry now appears to indicate the data source, the animal used in the experiment, and the method used to process the data. This includes entries for parameters which were estimated ‘E’ so that the model produced acceptable results in the simulations presented. In addition, the new discussion paragraph includes a number of sentences that use the adjustment to the active-titin-damping coefficient as an opening to discuss the limitations of the VEXAT’s titin-actin bond model and the circumstances under which the model’s parameters would need to be adjusted.

      Please see responses to comments 3 and 23 for additional details. In addition, please see the specific discussion text mentioning the change to βoPEVK:

      • Revision: page 22, column 1, line 30 ”In Sec. 3.3 we had ...”

      • Difference: page 24, column 1, line 49

      (22) Page 20, lines 50-11: It should be noted here that Tahir et al.’s (2018) model has both series and parallel elastic elements, provided by superposition of rotation (series) and translation (parallel) of a pulley.

      While it is true that Tahir et al.’s (2018) model has series and parallel elements, as do the other models mentioned, these models do not have the correct structure to yield a gain and phase response that mimics biological muscle. The text that I originally wrote attempted to explain this without going into the details. As you note, this explanation leaves something to be desired. The original text commenting on the models of Forcinito et al., Tahir et al., Haeufle et al., and Günther et al. has been updated to be more specific. Please see the parts of the following manuscripts for details:

      • Revision: page 22, column 2, line 20, the paragraph beginning ”The models of Forcinito ...”

      • Difference: page 24, column 2, line 44

      (23) Discussion: This section should include a description of model limitations, including the relatively large number of ad hoc modifications and how many parameters must be found by optimization in practice. The authors should discuss what types of data are most compatible for use with the model (ex vivo, in vivo, single fiber, whole muscle, MTU), requirements for applying the model to different types of data, and impediments to using the model on different types of data.

      An additional limitations paragraph has been added to the discussion.

      Please see the following text:

      • Revision: the paragraph beginning on page 22, column 1, line 11 ”Both the viscoelastic ...”

      • Difference: the paragraph beginning on page 24, column 1, line 27.

      Reviewer #2 (Recommendations For The Authors):

      (1) If it is possible to compare the output of this model to other more contemporary models which incorporate titin but are also simple enough to implement in whole-body simulation (such as the winding filament model), this would seem to greatly strengthen the paper.

      That’s an excellent idea, though beyond the scope of this already lengthy paper. Even though the Hill model we evaluated is a bit old it is widely used, and so, many readers will be interested in seeing the benchmark results. As benchmarking work is both difficult to fund and undertake, we do hope that others will evaluate their own models using the code and data we have provided.

      (2) I’m a little unclear on the basis for the transition between short- and mid-range length changes, both in reality and in the model. And also about the range of strains that qualify as ”short”. It seems like there is potential for short range stiffness, although I would have thought more in the range of 1-2% strains than >3%, to be due to currently attached crossbridges. There is clear evidence that active titin is responsible for the low stiffness at very large strains that exceed actin-myosin overlap. But I am not clear on how a transitional stiffness on the descending limb of the force-length relationship is implemented in the model, and what aspect of physiology this is replicating. It may be helpful to clarify this further and indicate where in the model this stiffness arises.

      This question has several parts to it which I will paraphrase here:

      A Short-range stiffness acts over smaller strains than 3.8%. How is short-range defined?

      B Where is the transition made between short-range and mid-range force response, both in reality and in the model. Also how does this change on the descending limb?

      C What components in the model contribute to the stiffness of the CE?

      A. Short-range stiffness acts over smaller strains than 3.8%. How is short-range defined?

      The response to Reviewer 1’s comment # 5 directly addresses this question.

      B. Where is the transition made between short-range and mid-range force response, both in reality and in the model? Also, how does this change on the descending limb? We are going to rephrase the question because of changes in terminology that we have made in response to Reviewer 1’s comment #5.

      (i) What is the basis for the transition between the muscle behaving like an LTI system? Both in reality, and in the model. (ii) What happens outside the LTI range? (iii) Also how does this change on the descending limb?

      We will address this question one part at a time:

      (i) What is the basis for the transition between the muscle behaving like an LTI system? Both in reality, and in the model.

      A system’s response can be approximated as a linear time-invariant (LTI) system as long as it is time-invariant and its output can be expressed as a linear function of its input. In the context of Kirsch et al.’s experiment, the ‘system’ is the muscle, the ‘input’ is the time series of length data, and the ‘output’ is the time series of force data. Due to the requirement for time-invariance, two experimental conditions must be met to approximate muscle as an LTI system:

      • the nominal length of the muscle stays constant over long periods of time,

      • and the nominal activation of the muscle stays constant.

      These conditions were met by default in Kirsch et al.’s experiment, and also in our simulations of this experiment. The one remaining condition to assess is whether or not the muscle’s response is linear.

      To evaluate whether the muscle’s force is a linear function of the length change, Kirsch et al. evaluated $(C_{xy})^2$, the coherence squared between the length and force time-series data. Even though the mathematical underpinnings of $(C_{xy})^2$ are complicated, its interpretation is simple: muscle can be accurately approximated as a linear system if $(C_{xy})^2$ is close to 1, but the accuracy of this approximation becomes poor as $(C_{xy})^2$ approaches 0. Kirsch et al. used $(C_{xy})^2$ to identify a bandwidth in which the response of the muscle to the 1–3.8% $\ell_o^M$ length changes was sufficiently linear for analysis: a lower bound of 4 Hz was identified using $(C_{xy})^2$, and the bandwidth of the input signal (15 Hz, 35 Hz, or 90 Hz) set the upper bound. In Fig. 3 of Kirsch et al., $(C_{xy})^2$ at 4 Hz has a value of at least 0.67 for the 15 Hz and 90 Hz signals. To minimize error in our analysis and yet be consistent with Kirsch et al., we analyze the bandwidth common to both $(C_{xy})^2 \geq 0.67$ and Kirsch et al.’s defined range. Though the bandwidth defined by the criterion $(C_{xy})^2 \geq 0.67$ is usually larger than the one defined by Kirsch et al., there are some exceptions where the lower frequency bound of the models is higher than 4 Hz (now reported in Tables 4D and 5D).
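
      The interpretation of coherence squared given above can be illustrated with a minimal sketch. It is not the code from the example script mentioned in this response: the helper names `dft` and `coherence_squared`, the synthetic signals, and the choice of averaging over non-overlapping segments are all assumptions made for illustration, and only the Python standard library is used.

```python
import cmath
import math
import random


def dft(segment):
    """Naive discrete Fourier transform (non-negative frequencies only)."""
    n = len(segment)
    return [
        sum(segment[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def coherence_squared(x, y, seg_len):
    """Coherence squared between x and y, averaged over non-overlapping segments."""
    n_bins = seg_len // 2 + 1
    sxx = [0.0] * n_bins  # accumulated input power spectrum
    syy = [0.0] * n_bins  # accumulated output power spectrum
    sxy = [0j] * n_bins   # accumulated cross spectrum
    for start in range(0, len(x) - seg_len + 1, seg_len):
        fx = dft(x[start:start + seg_len])
        fy = dft(y[start:start + seg_len])
        for k in range(n_bins):
            sxx[k] += abs(fx[k]) ** 2
            syy[k] += abs(fy[k]) ** 2
            sxy[k] += fx[k].conjugate() * fy[k]
    return [abs(sxy[k]) ** 2 / (sxx[k] * syy[k]) for k in range(n_bins)]


# Synthetic data (hypothetical): a random "length" input and two "force" outputs.
random.seed(0)
seg_len, n_seg = 32, 32
length_change = [random.gauss(0.0, 1.0) for _ in range(seg_len * n_seg)]
linear_force = [3.0 * v for v in length_change]   # linear: a pure gain
nonlinear_force = [v * v for v in length_change]  # nonlinear: squaring

c_linear = coherence_squared(length_change, linear_force, seg_len)
c_nonlinear = coherence_squared(length_change, nonlinear_force, seg_len)

print("lowest coherence squared, linear output:", min(c_linear))
print("mean coherence squared, nonlinear output:", sum(c_nonlinear) / len(c_nonlinear))
```

      The linear output gives values at 1 across the band, while the squared output gives values near 0, which is the pattern that a threshold such as 0.67 is meant to separate.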

      (ii) What happens outside the LTI range?

      When a muscle’s output cannot be considered LTI, it means either that its length or activation is time-varying, or that the relationship between length and force is no longer linear. In short, the muscle is behaving as one would normally expect: time-varying and non-linear. The wonderful part of Kirsch et al.’s work is that they found a surprisingly large region in the frequency domain where muscle behaves linearly and can be analyzed using the powerful tools of linear systems and signals.

      (iii) Also how does this change on the descending limb?

      Since the nominal length in Kirsch et al.’s experiments is $\ell_o^M$, it is not clear how the results of the perturbation experiments would change if the nominal length were moved firmly onto the descending limb. However, we can see how the stiffness and damping values change by examining Figures 9C and 9D, which show the calculated stiffness and damping of the VEXAT and Hill models as $\ell^M$ is lengthened from $\ell_o^M$ down the descending limb: the stiffness and damping of the VEXAT model do not change much, while the Hill model’s stiffness changes sign and its damping coefficient changes substantially. What cannot be seen from Figures 9C and 9D is how the bandwidth over which the models are considered linear changes.
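
      The sign change in a Hill-type model’s stiffness can be sketched as follows. This is an illustration only: the bell-shaped curve `active_force_length`, its `width` parameter, and the finite-difference helper are hypothetical stand-ins and are not the curves or equations used in the paper. Only the active force-length term is considered here.

```python
import math


def active_force_length(norm_length, width=0.45):
    """Hypothetical bell-shaped active force-length curve with its peak at 1.0."""
    return math.exp(-((norm_length - 1.0) / width) ** 2)


def hill_active_stiffness(norm_length, activation=1.0, step=1e-6):
    """Central-difference slope of activation * f_L with respect to length."""
    upper = active_force_length(norm_length + step)
    lower = active_force_length(norm_length - step)
    return activation * (upper - lower) / (2.0 * step)


ascending = hill_active_stiffness(0.8)   # shorter than the optimal length
descending = hill_active_stiffness(1.2)  # on the descending limb

print("slope on ascending limb:", ascending)
print("slope on descending limb:", descending)
```

      Under these assumptions the slope is positive below the optimal length and negative above it, because the active stiffness of such a model is simply the slope of its force-length curve.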

      We have made a number of updates to the text to more clearly communicate these details of our response to part (i):

      • Text has been edited so that the terms ’short-range stiffness’ and ’small’ from Rack and Westbury’s work are not confused with ’stiffness’ and ’small’ from the LTI system’s analysis. Please see our response to comment # 5 for details.

      • We have added text to the main body of the paper to explain how the coherence squared metric was used to select a bandwidth in which the response of the system is approximately linear:

      • Revision: the paragraph that starts on page 11, column 1, line 3 “Kirsch et al. used system identification ...”

      – Difference: page 13, column 2, line 1

      – Coherence is defined in Appendix D

      – Coherence is now also included in the example script ‘main SystemIdentificationExample.m’

      • The bandwidth over which model output can be considered linear (coherence squared > 0.67) has been added to Tables 4 and 5

      – Revision: see Table 4D, and Table 5D in Appendix E

      – Difference: see Table 4D, and Table 5D in Appendix E

      • Figures 6 and 16 are now annotated wherever the plotted signal does not meet the linearity requirement of $(C_{xy})^2 > 0.67$.

      C. What components in the model contribute to the stiffness of the CE?

      There are three components that contribute to the stiffness of the CE which are pictured in Figure 1, appear in Eqn. 15, and are listed explicitly in Eqn. 76:

      (a) The XE, as represented by the $a\,f^{L}(\tilde{\ell}^{S}+\tilde{L}^{M})\,\tilde{k}_{o}^{X}$ term in Eqn. 15.

      (b) The elasticity of the distal segment of titin, $f^{2}(\tilde{\ell}^{2})$. Only $f^{2}(\tilde{\ell}^{2})$ appears in Eqn. 15 because $\tilde{\ell}^{1}$ is a model state.

      (c) The extracellular matrix, as represented by the $f^{ECM}(\tilde{\ell}^{ECM})$ term.

      There is also a compressive element $f^{KE}$, but it plays no role in the simulations presented in this work because it only begins to produce force at extremely short CE lengths ($\tilde{\ell}^{M} < 0.1\,\ell_{o}^{M}$).

      We have made the following changes to make these components clearer:

      Figure 1A has been updated:

      – The symbols for a spring and a damper are now defined in Figure 1A

      – The ECM now has a spring symbol. Now all springs and dampers have the correct symbol in Figure 1A.

      – The caption now explicitly lists the rigid, viscoelastic, and elastic elements in the model

      The equations for the VEXAT’s CE stiffness and damping are now compared and contrasted with the Hill model’s stiffness and damping in Sec. 3.1.

      – Revision: starting at page 14, column 2, line 1: Eqn. 28 and Eqn. 29 and surrounding text

      – Difference: page 17, column 1, line 22

      (3) This model appears to be an amalgamation of a phenomenological (force-length and force-velocity relationships) and a mechanistic (crossbridge and titin stiffness and damping) model. While this may improve predictions, and so potentially be useful, it also seems like it limits the interpretation of physiological underpinnings of any findings. It may be helpful to explore in greater detail the implications of this approach.

      We have added a limitations paragraph to the discussion which addresses this comment and can be found in:

      • Revision: the paragraph beginning on page 22, column 1, line 11 “Both the viscoelastic ...”

      • Difference: the paragraph beginning on page 24, column 1, line 27

      (4) As a biologist, I found the interpretation of phase and gain a little difficult and it may help the reader to show in greater detail the time series data and model predictions to highlight conditions under which the models do not accurately capture the magnitude and timing of force production.

      It is important that the ideas of phase and gain are understood, especially because little information can be gleaned from the time series data directly. There is some time series data in the paper already that compares each model’s response to its spring-damper of best fit: plots of the force response of each model and its spring damper of best fit can be found in Figures 6A, 6D, 6G, 6J, 16A, 16D, 16G, and 16J in the revised manuscript. While it is clear that models with a higher VAF more closely match the spring-damper of best fit, there is not much more that can be taken from time series data: the systematic differences, particularly in phase, are just not visually apparent in the time-domain but are clear in gain and phase plots in the frequency-domain.

      To make the meaning of phase and gain plots clearer, Figure 4 (Figure 5 in the first submission) has been completely re-made and includes plots that illustrate the entire process of going from two length and force time-domain signals to gain and phase plots in the frequency-domain. Included in this figure is a visual representation of transforming a signal from the time to the frequency domain (Fig. 4B and 4C), and also an illustration of the terms gain and phase (Fig. 4D). In addition, a small example file ’main SystemIdentificationExample.m’ has been added to the MATLAB code repository in the elife2023 branch to accompany Appendix D, which goes through the mathematics used to transform input and output time-domain signals into gain and phase plots of the input-output relation. Small updates have been made to Figures 6 and 16 in the revised paper (Figures 7 and 18 in the first submission) to make the time-domain signals from the spring-damper of best fit and the model output clearer. Finally, I have re-calculated the gain and phase profiles using a more advanced numerical method that trades off some resolution in frequency for more accuracy in the magnitude. This has allowed me to make Figures 6 and 16 easier to follow because the gain and phase responses are now lines rather than a scattering of points. We hope that these additions make the interpretation of gain and phase clearer.

      Please see

      Revision:

      – Figure 4 and caption on page 12

      – The opening 2 paragraphs of Sec 3.1 starting on page 10, column 2, line 4 “In Kirsch et al.’s ...”

      – Figure 6 & 16: spring damper and model annotation added, plotted the gain and phase as lines

      – Appendix D: Updated to include coherence and the more advanced method used to evaluate the system transfer function, gain, and phase.

      Difference:

      – Figure 4 and caption on page 12

      – The opening 2 paragraphs of Sec 3.1 starting on page 12, column 1, line 34 and ending on page 13, column 2, line 29

      – Figure 6 & 16: spring damper and model annotation added

      – Appendix D
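
      The terms gain and phase discussed in the response to comment (4) can be illustrated for the simplest case, a spring and damper in parallel, whose force response to a length change has the transfer function $H(j\omega) = K + j\omega\beta$. The sketch below is an illustration under that assumption; the function name and the stiffness and damping values are hypothetical and are not taken from the paper or its code repository.

```python
import cmath
import math


def spring_damper_response(stiffness, damping, freq_hz):
    """Return (gain, phase in degrees) of force/length for a parallel spring-damper."""
    omega = 2.0 * math.pi * freq_hz
    transfer = complex(stiffness, damping * omega)  # H(jw) = K + j*w*beta
    return abs(transfer), math.degrees(cmath.phase(transfer))


stiffness = 4.0  # hypothetical value, N/mm
damping = 0.02   # hypothetical value, N*s/mm

# Frequency at which the damper contributes as much force as the spring.
corner_hz = stiffness / (2.0 * math.pi * damping)

for freq_hz in (0.0, corner_hz, 90.0):
    gain, phase = spring_damper_response(stiffness, damping, freq_hz)
    print(f"{freq_hz:7.2f} Hz: gain = {gain:.3f} N/mm, phase = {phase:.1f} deg")
```

      Gain is the ratio of force amplitude to length amplitude at each frequency, and phase is how far the force sinusoid leads the length sinusoid. A pure spring has zero phase, and the damper pushes the phase toward 90 degrees as frequency rises.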

      (5) The actin-myosin and actin-titin load pathways are depicted as distinct in the model. However, given titin’s position in the center of myosin and the crossbridge connections between actin and myosin, this would seem to be an oversimplification. It seems worth considering whether the separation of these pathways is justified if it has any effect on the conclusions or interpretation.

      We have reworked one of the discussion paragraphs to focus on how our simulations would be affected by two mechanisms (Nishikawa et al.’s winding filament theory and DuVall et al.’s titin entanglement hypothesis) that make it possible for crossbridges to do mechanical work on titin.

      • Revision: the paragraph beginning on page 21, column 2, line 42 “The active titin model ...”

      • Difference: the paragraph beginning on page 23, column 2, line 48

      References

      Nishikawa KC, Monroy JA, Uyeno TE, Yeo SH, Pai DK, Lindstedt SL. Is titin a ‘winding filament’? A new twist on muscle contraction. Proceedings of the Royal Society B: Biological Sciences. 2012 Mar 7;279(1730):981-90.

      DuVall M, Jinha A, Schappacher-Tilp G, Leonard T, Herzog W. I-Band Titin Interaction with Myosin in the Muscle Sarcomere during Eccentric Contraction: The Titin Entanglement Hypothesis. Biophysical Journal. 2016 Feb 16;110(3):302a.

    1. Author response:

      Reviewer #1 (Public Review):

      In this manuscript, Naseri et al. present a new strategy for identifying human genetic variants with recessive effects on disease risk by the genome-wide association of phenotype with long runs-of-homozygosity (ROH). The key step of this approach is the identification of long ROH segments shared by many individuals (termed "shared ROH diplotype clusters" by the authors), which is computationally intensive for large-scale genomic data. The authors circumvented this challenge by converting the original diploid genotype data to (pseudo-)haplotype data and modifying the existing positional Burrows-Wheeler transform (PBWT) algorithms to enable an efficient search for haplotype blocks shared by many individuals. With this method, the authors identified over 1.8 million ROH diplotype clusters (each shared by at least 100 individuals) and 61 significant associations with various non-cancer diseases in the UK Biobank dataset.

      Overall, the study is well-motivated, highly innovative, and potentially impactful. Previous biobank-based studies of recessive genetic effects primarily focused on genome-wide aggregated ROH content, but this metric is a poor proxy for homozygosity of the recessive alleles at causal loci. Therefore, searching for the association between phenotype and specific variants in the homozygous state is a key next step towards discovering and understanding disease genes/alleles with recessive effects. That said, I have some concerns regarding the power and error rate of the methods, for both identification of ROH diplotype clusters and subsequent association mapping. In addition, some of the newly identified associations need further validation and careful consideration of potential artifacts (such as cryptic relatedness and environment sharing).

      1) Identification of ROH diplotype clusters.

      The practice of randomly assigning heterozygous sites to a homozygous state is expected to introduce errors, leading to both false positives and false negatives. An advantage that the authors claim for this practice is to reduce false negatives due to occasional mismatch (possibly due to genotyping error, or mutation), but it's unclear how much the false positive rate is reduced compared to traditional ROH detection algorithm. The authors also justified the "random allele drawing" practice by arguing that "the rate of false positives should be low" for long ROH segments, which is likely true but is not backed up with quantitative analysis. As a result, it is unclear whether the trade-off between reducing FNs and introducing FPs makes the practice worthwhile (compared to calling ROHs in each individual with a standard approach first followed by scanning for shared diplotypes across individuals using BWT). I would like to see a combination of back-of-envelope calculation, simulation (with genotyping errors), and analysis of empirical data that characterize the performance of the proposed method.

      In particular, I find the high number of ROH clusters in MHC alarming, and I am not convinced that this can be fully explained by a high density of SNPs and low recombination rate in this region. The authors may provide further support for their hypothesis by examining the genome-wide relationship between ROH cluster abundance and local recombination rate (or mutation rate).

      Thanks for this insightful comment. Through additional experiments, we confirmed that the excessive number of ROH clusters in the MHC region is due to the higher density of markers per centimorgan. As discussed above at Essential Revision 2, we took this opportunity to modify our code to search for clusters with the minimum length in terms of cM instead of sites. We have also provided the genetic distance for reported clusters in the MHC region with significant association (genetic length (cM) column in Tables 1 and 2). We include the following in the main text:

      “We searched for ROH clusters using a minimum target length of 0.1 cM (Figure 3–figure supplement 1). As shown in the figure, there is no excessive number of ROH clusters in chromosome 6 as was spotted using a minimum number of variant sites.”

      Methods section, ROH algorithm subsection:

      “We implemented ROH-DICE to allow direct use of genetic distances in addition to variant sites for L. The program can take minimum target length L directly in cM and detect all ROH clusters greater than or equal to the target length in cM. The program holds a genetic mapping table for all the available sites, and cPBWT was modified to work directly with the genetic length instead of the number of sites.”
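
      The idea of filtering clusters by genetic length rather than by number of sites can be sketched as follows. This is not the ROH-DICE implementation: the function names, the linear interpolation between map points, and the toy genetic map are assumptions made for illustration.

```python
from bisect import bisect_left


def genetic_position(position_bp, map_bp, map_cm):
    """Linearly interpolate the genetic position (cM) of a base-pair position.

    Assumes map_bp is strictly increasing and map_cm is non-decreasing.
    """
    if position_bp <= map_bp[0]:
        return map_cm[0]
    if position_bp >= map_bp[-1]:
        return map_cm[-1]
    i = bisect_left(map_bp, position_bp)  # map_bp[i-1] < position_bp <= map_bp[i]
    fraction = (position_bp - map_bp[i - 1]) / (map_bp[i] - map_bp[i - 1])
    return map_cm[i - 1] + fraction * (map_cm[i] - map_cm[i - 1])


def clusters_at_least(clusters, map_bp, map_cm, min_cm):
    """Keep (start_bp, end_bp) clusters whose genetic length is >= min_cm."""
    kept = []
    for start_bp, end_bp in clusters:
        length_cm = (genetic_position(end_bp, map_bp, map_cm)
                     - genetic_position(start_bp, map_bp, map_cm))
        if length_cm >= min_cm:
            kept.append((start_bp, end_bp, length_cm))
    return kept


# Toy map (hypothetical): little recombination in the first Mb, much more in the second.
map_bp = [0, 1_000_000, 2_000_000]
map_cm = [0.0, 0.05, 1.05]
clusters = [(0, 1_000_000), (1_000_000, 1_500_000)]

kept = clusters_at_least(clusters, map_bp, map_cm, min_cm=0.1)
print(kept)
```

      In this toy map the physically longer cluster spans only 0.05 cM and is dropped, while the shorter one spans about 0.5 cM and is kept. This is how a cM threshold avoids over-counting clusters in regions with many markers per centimorgan.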

      2) Power of ROH association. Given that the authors focused on long segments only (which is a limitation of the current method), I am concerned about the power of the association mapping strategy, because only a small fraction of causal alleles are expected to be present in long, homozygous haplotypes shared by many individuals. It would be useful to perform a power analysis to estimate what fraction of true causal variants with a given effect size can be detected with the current method. To demonstrate the general utility of this method, the authors also need to characterize the condition(s) under which this method could pick up association signals missed by standard GWAS with recessive effects considered. I suspect some variants with truly additive effects can also be picked up by the ROH association, which should be discussed in the manuscript to guide the interpretation of results.

      We added a new experiment in the Results section “Evaluation of ROH clusters in simulated data” under Power of ROH-DICE in association studies. We compared the power of the ROH cluster with additive, recessive, and dominant models. Our simulation shows that using ROH clusters outperforms standard GWAS when a phenotype is associated with a set of consecutive homozygous sites. We added the following text:

      “...We calculated the p-values for both ROH clusters and all variant sites. We used a p-value cut-off of 0.05 divided by the number of tests for each phenotype to determine whether the calculated p-value was smaller than the threshold, indicating an association. For GWAS, only one variant site within the ROH cluster, contributing to the phenotype, was required. We tested for all additive, dominant, and recessive effects (Figure 1–figure supplement 3). The figure demonstrates that ROH-DICE outperforms GWAS when a phenotype is associated with a set of consecutive homozygous sites. The maximum effect size of 0.3 resulted in ROH clusters achieving a power of 100%, whereas the additive model only achieved 11%, and the dominant and recessive models achieved 52% and 70%, respectively. The GWAS with recessive effect yields the best results among other GWAS tests, however, its power is still lower than using ROH clusters.”
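
      The thresholding rule in the quoted text (0.05 divided by the number of tests) and the resulting power estimate can be sketched as below. The function names and the p-values are hypothetical; the simulation itself is not reproduced here.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance cut-off: alpha divided by the number of tests."""
    return alpha / n_tests


def empirical_power(p_values, n_tests, alpha=0.05):
    """Fraction of simulated replicates whose p-value falls below the cut-off."""
    threshold = bonferroni_threshold(n_tests, alpha)
    hits = sum(1 for p in p_values if p < threshold)
    return hits / len(p_values)


# Hypothetical p-values, one per simulated replicate of a truly associated locus.
replicate_p_values = [1e-9, 3e-4, 2e-7, 0.02]
power = empirical_power(replicate_p_values, n_tests=1000)
print("threshold:", bonferroni_threshold(1000))
print("power:", power)
```

      Because the cut-off depends on the number of tests, a method that tests fewer units (clusters rather than every variant site) faces a less stringent threshold for the same family-wise error rate.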

      3) False positives of ROH association. GWAS is notoriously prone to confounding by population and environmental stratification. Including leading principal components in association testing alleviates this issue but is not sufficient to remove the effects of recent demographic structure and local environment (Zaidi and Mathieson 2020 eLife). Similar confounding likely applies to homozygosity mapping and should be carefully considered. For example, it is possible that individuals who share a lot of ROH diplotypes tend to be remotely related and live near each other, thus sharing similar environments. Such scenarios need to be excluded to further support the association signals.

      We acknowledge that there could be confounding factors that may affect the association results. To address this, we utilized principal component (PC) values and additional covariates while using PHESANT after our initial Chi-square tests. We also included your comments in our Discussion section:

      “We used age, gender, and genetic principal components as confounding variables in the association analysis. Genetic principal components can reduce the confounding effect brought on by population structure but it may be insufficient to completely eliminate the effects of recent demographic structure and the local environment [45]. For example, individuals sharing excessive ROH diplotypes may share similar environments since they are closely related and reside close to one another. Since we did not rule out related individuals, some of the reported GWAS signals may not be attributable to ROH.”

      4) Validation of significant associations. It is reassuring that some of the top associations are indirectly corroborated by significant GWAS associations between the same disease and individual SNPs present in the ROH region (Tables 1 and 2). However, more sanity checks should be done to confirm consistency in direction of effect size (e.g., risk alleles at individual SNPs should be commonly present in risk-increasing ROH segment, and vice versa) and the presence of dominance effect.

      The beta values for effect size are now included in all reported tables. All beta values for ROH-DICE are positive, indicating that carriers of these ROH diplotypes may be at increased risk of certain non-cancerous diseases. Moreover, we conducted the suggested sanity check to confirm the consistency of the direction of risk-inducing ROH diplotypes and risk alleles.

      We also computed D’ as a measure of linkage between the reported GWAS results and ROH clusters. We found that most of the GWAS results and ROH clusters are strongly correlated. However, in a few cases, D' is small or close to zero. In such cases, the reported p-value from GWAS was also insignificant, while the ROH cluster indicated a significant association. We included these points in the Results section.
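
      For readers unfamiliar with D’, the standard normalized linkage disequilibrium coefficient can be sketched as below. How the frequencies were obtained for ROH clusters and GWAS variants is not specified in this response, so the inputs here (the frequency of the cluster, the frequency of the risk allele, and their joint frequency) are hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Normalized linkage disequilibrium D' from joint and marginal frequencies."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else d / d_max


# Hypothetical frequencies: A = ROH cluster present, B = GWAS risk allele present.
always_together = d_prime(p_ab=0.1, p_a=0.1, p_b=0.2)  # A never occurs without B
independent = d_prime(p_ab=0.02, p_a=0.1, p_b=0.2)     # joint = product of marginals

print("D' when the cluster always carries the risk allele:", always_together)
print("D' when the two are independent:", independent)
```

      A value near 1 means the cluster and the variant are almost always inherited together, while a value near 0 means the two signals are essentially independent, which matches the cases described above where GWAS was not significant but the ROH cluster was.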

      Reviewer #3 (Public Review):

      A classic method to detect recessive disease variants is homozygosity mapping, where affected individuals in a pedigree are scanned for the presence of runs of homozygosity (ROH) intersecting in a given region. The method could in theory be extended to biobanks with large samples of unrelated individuals; however, no efficient method was available (to the best of my knowledge) for detecting overlapping clusters of ROH in such large samples. In this paper, the authors developed such a method based on the PBWT data structure. They applied the method to the UK biobank, finding a number of associations, some of them not discovered in single SNP associations.

      Major strengths:

      • The method is innovative and algorithmically elegant and interesting. It achieves its purpose of efficiently and accurately detecting ROH clusters overlapping in a given region. It is therefore a major methodological advance.

      • The method could be very useful for many other researchers interested in detecting recessive variants associated with any phenotype.

      • The statistical analysis of the UK biobank data is solid and the results that were highlighted are interesting and supported by the data.

      Major weaknesses:

      • The positions and IDs of the ROH clusters in the UK biobank are not available for other researchers. This means that other researchers will not be able to follow up on the results of the present paper.

      We included the SNP IDs, positions, and consensus alleles for all reported loci in the main tables. Moreover, additional information including beta and D’ values were added. The current information should allow researchers to follow up on the results. Supplementary File 2 contains beta, D’ values for all reported clusters.

      Supplementary File 3 contains the SNP IDs and consensus alleles for all reported clusters in Tables 1 and 2. The consensus allele denotes the allele with the highest occurrence in the reported clusters.

      • The vast majority of the discoveries were in regions already known to be associated with their respective phenotypes based on standard GWAS.

      We agree that a majority of the ROH regions are indeed consistent with GWAS. However, some regions were missed by standard GWAS (e.g. chr6:25969631-26108168, hemochromatosis). Our message is that our method is a complementary approach to standard GWAS and will not replace standard GWAS analysis. See our response to Reviewer #2 Point Six.

      • The running time seems rather long (at least for the UK biobank), and therefore it will be difficult for other researchers to extensively experiment with the method in very large datasets. That being said, the method has a linear running time, so it is already faster than a naïve algorithm.

      Thank you for your input. The algorithm used to locate matching blocks is efficient, and the run time we originally reported was the total CPU hours consumed. Since it consumes very little memory and resources, it can be executed simultaneously for all chromosomes. We also noticed that significant time was being spent parsing the input file and slightly modified our script to improve the parsing. We then re-ran it for all chromosomes in parallel and now report the elapsed time, which was only 18 hours and 54 minutes.

      “This was achieved by running the ROH-DICE program, with a wall clock time of 18 hours and 54 minutes where the program was executed for all chromosomes in parallel (total CPU hours of ~ 242.5 hours). The maximum residence size for each chromosome was approximately 180 MB.”

    1. Author response:

      Reviewer #1 (Public Review):

      The authors investigated the role of OBOX4 in zygotic genome activation (ZGA) in mice. Obox4 genes form an array of duplicated genes; they were identified as a candidate ZGA factor based on expression patterns during early development. The role of OBOX4 was subsequently studied in embryonic stem cells and early embryos. It was found that transcriptional activation mediated by OBOX4 has similar features to that of DUX, which was previously identified as a zygotic transcription factor involved in ZGA and a major activator of the zygotic expression program. It was, however, unexpected that Dux knock-out did not impair embryonic development. The work by Guo et al. provides several lines of evidence that OBOX4-mediated activation of gene expression considerably overlaps with that of DUX and this redundancy might explain the loss of early developmental phenotype in Dux mutants. Consistent with this model, double mutants of Obox4 and Dux show impaired development. Given the difficulties with investigating details of the genetic model in double mutants at the preimplantation embryo stage, the authors not only crossed genetic mutants, but also used (1) nuclear transfer of mutated nuclei of ESCs, which could be characterized on their own in separate experiments, and (2) antisense oligonucleotide (ASO) microinjection, which included a rescue control demonstrating that reintroducing OBOX4 is sufficient to rescue the phenotype caused by blocking both Dux and Obox4.

      This work is important for the field because it reveals functional redundancy and plasticity of the zygotic genome activation in mammals, where the mouse model stands as a remarkable example of genome activation, which massively integrated long terminal repeat (LTR)-derived enhancers from retrotransposons and now two of the key activating zygotic factors appear to be encoded by tandemly duplicated clusters of different phylogenetic age. Identification of OBOX4 as a second factor partially redundant with DUX now allows us to decipher what constitutes the essential part of the ZGA program.

      We are grateful for the reviewer’s appreciation of our work, particularly the technical difficulty of knocking out two multicopy genes and the value of the rescue experiment.

      Reviewer #2 (Public Review):

      In this study, Guo et al. screened a few homeobox transcription factors and identified that Obox4 can induce the 2-cell-like state in mouse embryonic stem cells (mESCs) (Fig. 1 and 2). The authors also compared in detail how Obox4 and Dux activate 2C repeats and genes in mESCs (Fig. 3). Compared to Dux, Obox4 activates fewer 2C genes (Fig. 2). In addition, although both Obox4 and Dux bind to MERVL elements, Obox4 additionally binds to ERVK (Fig. 3). The authors then used three different approaches (i.e., SCNT-mediated KO, ASO-mediated KD, and genetic KO) to study how Obox4 and Dux regulate zygotic genome activation in embryos. Although there are some inconsistencies among different approaches, the authors were able to show that loss of both Obox4 and Dux causes more severe consequences than loss of either protein alone in embryonic development and zygotic genome activation (Fig. 4 and 5).

      Overall, this is a comprehensive study that addresses an important question that puzzles the community. However, some comparisons to the recent work by Ji et al. (PMID: 37459895) are highly recommended. Ji et al. knocked out the entire Obox cluster (including Obox4) in mice and found that Obox cluster KO causes 2-4 cell arrest without affecting Dux. That said, Obox proteins seem more critical than Dux in regulating ZGA, and Obox cluster KO cannot be compensated by Dux. Ji et al. also reported that maternal (Obox1, 2, 5, 7) and zygotic (Obox3, 4) Obox proteins redundantly regulate embryogenesis because loss of either is compatible with development. Consistent with Ji's work, Obox4 KO embryos generated in this study can develop to adulthood and are fertile. Since these two studies are highly relevant, some comparisons of Obox4 KO and Obox4/Dux DKO with the previous Obox cluster KO will greatly benefit the community.

      We thank the reviewer for appreciating the value of our study. We are aware of the work done to a high standard by Ji et al. and have included a comparison between our data and the data by Ji et al. in the revised manuscript. Despite repeated attempts, various crossing strategies failed to produce Obox4KO/DuxKO mating pairs that could be used to produce the large number of Obox4KO/DuxKO embryos required for in-depth transcriptome analysis. Based on the quality of the RNA-seq, we decided to perform comparative analysis using our ASO KD data and showed that Obox4 has distinct regulatory targets from those of other Obox family members, which is consistent with the phylogenetic distance within the family.

    1. Author response:

      A general comment was that this study left several key questions unanswered, in particular the causal mechanism for the reported ribosomal distributions. We have been interested in the evolution of asymmetric bacterial growth and aging for many years. However, a motivational difference is that we are more interested in the evolutionary process, and evolution by natural selection works on the phenotype. Thus, we wanted to start with the phenotype closest to fitness, appropriately defined for the conditions, and work downwards. We examined first the asymmetry of elongation rates in single cells, then gene products, and now ribosomes. As we have pointed out, our demonstration of ribosomal asymmetry shows that the phenomenon was not peculiar and unique to the gene products we examined. Rather, the asymmetry is acting higher up in the metabolic network and likely affecting all genes. We find such conceptual guidance to be important. In the ideal world, of course we would have liked to have worked out the causal mechanisms in one fell swoop. In a less than ideal situation, it is a subjective decision as to where to stop. We believe that the publication of this manuscript is more than appropriate at this juncture. We work at the interface of evolutionary theory and microbiology. Our results could appeal to both fields. If we attract new researchers, progress could be accelerated. Could the delay caused by publishing only completed stories slow the rate of discovery? These questions are likely as old as science (e.g., https://telliamedrevisited.wordpress.com/2021/01/28/how-not-to-write-a-response-to-reviewers/).

      We present below our response to specific comments by reviewers. We have not added a new discussion of papers suggested by Reviewer #1 because we feel that the speculations would have been too unfocused. We were already criticized for speculation in the Discussion about a link between aggregate size and ribosomal density.

      Response to major comments by Reviewer #1.

      (a) Fig. 1 only shows 2 divisions (rather than 3 as per Rev1) to avoid an overly elaborate figure. We have added text to the figure legend that the old and new poles and daughters in the subsequent 3, 4, 5, 6, and 7 generations can be determined by following the same notations and tracking we presented for generations 1 and 2 in Fig. 1. For example, if we know the old and new poles of any of the four daughters after 2 divisions (as in Fig. 1), and allow that daughter to elongate, become a mother, and divide to produce 2 “grand-daughters”, the polarity of the grand-daughters can also be determined.

      (b) Because division times were normalized and analyzed as quartiles, the raw values were never used. Rather than annotating unused values, we have provided the mean division times in the Material and Methods section on normalization to provide representative values.

      (c) We did not quantify in our study the changes over generations for three reasons. First, the sample sizes for the first generations (cohorts of 1, 2, 4, and 8 cells) are statistically small. Second, and most importantly, cells on an agar pad on a microscope slide, despite being inoculated as fresh exponentially growing cells, experience a growth lag, as do all cells transferred to a new physiological condition. Thus, to be safe, we do not collect data from cohorts 1, 2, 4, and 8 to ensure that our cells are as physiologically uniform as possible. Lastly, as we noted in the Material and Methods, cells also slow down after 7 generations (128 cells). Thus, we have collected ribosome and length measurements primarily from cohorts 16, 32, 64, and 128. Measurable cells from the 128 cohort are actually rare because a colony with that many cells often starts to form double layers, which are not measurable. Most of our measurements came from the 16, 32, and 64 cohorts, in which case a time series would not be meaningful. Some of these details were not included in our manuscript but have been added to the Material and Methods (Microscopy and time-lapse movies). For these reasons we have not added a time series as requested by the reviewer.

      (d) We have added the additional figure as requested, but as a supplement rather than in the main article (Supplemental Materials Fig. S1). This figure shows the normalized density of ribosomes along the normalized length of old and new daughters. The density is plotted as a continuous variable rather than by quartiles. This figure was included in the original manuscript, but readers recommended that it be removed because all the data analysis had been done with quartiles. Readers felt misled and confused.
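
      The pole-tracking rule described in response (a) can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the study. The `Cell` record, the `divide` and `grow` functions, the "O"/"N" lineage labels, the founder's pole ages, and the convention that a newly formed pole has age 0 are all assumptions introduced here. The one rule taken from the text is that each daughter keeps one pole of its mother and gains a new pole at the division plane, so polarity in any later generation follows from the polarity of the mother.

```python
from collections import namedtuple

# Hypothetical record: lineage is a string of "O" (old daughter) and
# "N" (new daughter) labels; pole ages count the divisions a pole has
# existed through (assumed convention: a newly formed pole has age 0).
Cell = namedtuple("Cell", ["lineage", "old_pole_age", "new_pole_age"])


def divide(mother):
    """Split a mother into an old daughter and a new daughter.

    Each daughter inherits one maternal pole (now one division older)
    and receives a new pole formed at the division plane.
    """
    old_daughter = Cell(mother.lineage + "O", mother.old_pole_age + 1, 0)
    new_daughter = Cell(mother.lineage + "N", mother.new_pole_age + 1, 0)
    return [old_daughter, new_daughter]


def grow(founder, generations):
    """Apply the division rule repeatedly and return the final cohort."""
    cells = [founder]
    for _ in range(generations):
        cells = [daughter for cell in cells for daughter in divide(cell)]
    return cells


# Assumed founder: one pole already one division old, one newly formed.
founder = Cell("", 1, 0)
cohort = grow(founder, 3)
for cell in cohort:
    print(cell.lineage, cell.old_pole_age, cell.new_pole_age)
```

      Because the rule is applied one division at a time, the same function covers generations 3 through 7 without any extra bookkeeping, which is the point made about extending the notation of Fig. 1.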

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We greatly appreciate the comments from the editor and the reviewers, based on which we have made the revisions. We have responded to all the questions and summarized the revisions below. The changes are also highlighted in the manuscript.

      Additionally, we’ve noticed a few typos in the manuscript presented on the eLife website, which were not there in our originally submitted file.

      (1) In both the “Full text” presented on the eLife website and the pdf file generated after clicking “Download”: the last FC1000 in the second paragraph of the “Extensive induction curves fitting of TetR mutants” section should be FC1000WT .

      (2) In the pdf file generated after clicking “Download”: the brackets are all incorrectly formatted in the captions of Figure 4 and Figure 3—figure supplement 6.

      eLife assessment

      The fundamental study presents a two-domain thermodynamic model for TetR which accurately predicts in vivo phenotype changes brought about as a result of various mutations. The evidence provided is solid and features the first innovative observations with a computational model that captures the structural behavior, much more than the current single-domain models.

      We appreciate the supportive comments by the editor and reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors’ earlier deep mutational scanning work observed that allosteric mutations in TetR (the tetracycline repressor) and its homologous transcriptional factors are distributed across the structure instead of along the presumed allosteric pathways as commonly expected. Especially, in addition, the loss of the allosteric communications promoted by those mutations, was rescued by additional distributed mutations. Now the authors develop a two-domain thermodynamic model for TetR that explains these compelling data. The model is consistent with the in vivo phenotypes of the mutants with changes in parameters, which permits quantification. Taken together their work connects intra- and inter-domain allosteric regulation that correlate with structural features. This leads the authors to suggest broader applicability to other multidomain allosteric proteins. Here the authors follow their first innovative observations with a computational model that captures the structural behavior, aiming to make it broadly applicable to multidomain proteins. Altogether, an innovative and potentially useful contribution.

      We thank the reviewer for the supportive comments.

      Weaknesses:

      None that I see, except that I hope that in the future, if possible, the authors would follow with additional proteins to further substantiate the model and show its broad applicability. I realize however the extensive work that this would entail.

      We thank the reviewer for the supportive comments and the suggestion to extend the model to other proteins, which we indeed plan to pursue in future studies.

      Reviewer #2 (Public Review):

      Summary:

      This combined experimental-theoretical paper introduces a novel two-domain statistical thermodynamic model (primarily Equation 1) to study allostery in generic systems but focusing here on the tetracycline repressor (TetR) family of transcription factors. This model, building on a function-centric approach, accurately captures induction data, maps mutants with precision, and reveals insights into epistasis between mutations.

      Strengths:

      The study contributes innovative modeling, successful data fitting, and valuable insights into the interconnectivity of allosteric networks, establishing a flexible and detailed framework for investigating TetR allostery. The manuscript is generally well-structured and communicates key findings effectively.

      We thank the reviewer for the supportive comments.

      Weaknesses:

      The only minor weakness I found was that I still don’t have a better sense into (a) intuition and (b) mathematical derivation of Equation 1, which is so central to the work. I would recommend that the authors provide this early on in the main text.

      We thank the reviewer for the suggestion. The full mathematical derivation of Equation 1 is given in the first section of the supplementary file. Given the length of the derivation, we think it’s better to keep it in the supplementary file rather than the main text. In the main text, the first subsection (overview of the two-domain thermodynamic model of allostery) of the Results section and the paragraph right before Equation 1 are meant for providing intuitive understandings of the two-domain model and the derivation of Equation 1, respectively.

      We would also like to point the reviewer to Figure 2-figure supplement 2 and Equations (12) to (18) in the supplementary file for an alternative derivation. They show that the equilibria among all molecular species containing the operator are dictated by the binding free energies, the ligand concentration, and the allosteric parameters. The probability of an unbound operator (proportional to the probability that the promoter is bound by an RNA polymerase, or the gene expression level) can thus be calculated using Equation (12), which then leads to main text Equation 1 following the derivation given there.

      Additionally, we’ve added a paragraph to the main text (line 248-260) to aid an intuitive understanding of Equation 1.

      “The distinctive roles of the three biophysical parameters on the induction curve as stipulated in Equation 1 could be understood in an intuitive manner as well. First, the value of εD controls the intrinsic strength of binding of TetR to the operator, or the intrinsic difficulty for ligand to induce their separation. Therefore, it controls how tightly the downstream gene is regulated by TetR without ligands (reflected in leakiness) and affects the performance limit of ligands (reflected in saturation). Second, the value of εL controls how favorable ligand binding is in free energy. When εL increases, the binding of ligand at low concentrations becomes unfavorable, where the ligands cannot effectively bind to TetR to induce its separation from the operator. Therefore, the fold-change as a function of ligand concentration only starts to noticeably increase at higher ligand concentrations, resulting in larger EC50. Third, as discussed above, γ controls the level of anti-cooperativity between the ligand and operator binding of TetR, which is the basis of its allosteric regulation. In other words, γ controls how strongly ligand binding is incompatible with operator binding for TetR, hence it controls the performance limit of ligand (reflected in saturation).”

      We hope that the reviewer will find this explanation helpful.

      Reviewer #3 (Public Review):

      Summary:

      Allosteric regulations are complicated in multi-domain proteins and many large-scale mutational data cannot be explained by current theoretical models, especially for those that are neither in the functional/allosteric sites nor on the allosteric pathways. This work provides a statistical thermodynamic model for a two-domain protein, in which one domain contains an effector binding site and the other domain contains a functional site. The authors build the model to explain the mutational experimental data of TetR, a transcriptional repressor protein that contains a ligand and a DNA-binding domain. They incorporate three basic parameters, the energy change of the ligand and DNA binding domains before and after binding, and the coupling between the two domains to explain the free energy landscape of TetR’s conformational and binding states. They go further to quantitatively explain the in vivo expression level of the TetR-regulated gene by fitting into the induction curves of TetR mutants. The effects of most of the mutants studied could be well explained by the model. This approach can be extended to understand the allosteric regulation of other two-domain proteins, especially to explain the effects of widespread mutants not on the allosteric pathways.

      Strengths:

      The effects of mutations that are neither in the functional or allosteric sites nor in the allosteric pathways are difficult to explain and quantify. This work develops a statistical thermodynamic model to explain these complicated effects. For simple two-domain proteins, the model is quite clean and theoretically solid. For the real TetR protein that forms a dimeric structure containing two chains with each of them composed of two domains, the model can explain many of the experimental observations. The model separates intra and inter-domain influences that provide a novel angle to analyse allosteric effects in multi-domain proteins.

      We thank the reviewer for the supportive comments.

      Weaknesses:

      As mentioned above, the TetR protein is not a simple two-domain protein, but forms a dimeric structure in which the DNA binding domain in each chain forms contacts with the ligand-binding domain in the other chain. In addition, the two ligand-binding domains have strong interactions. Without considering these interactions, especially those mutants that are on these interfaces, the model may be oversimplified for TetR.

      We thank the reviewer for this valid concern and acknowledge that TetR is a homodimer. However, we’ve deliberately chosen to simplify this complexity in our model for the following reasons.

      (1) In this work, we aim to build a minimalist model for two-domain allostery with only the most essential parameters for capturing experimental data. The simplicity of the model helps promote its mechanistic clarity and potential transferability to other allosteric systems.

      (2) Fewer parameters are needed in a simpler model. Our two-domain model currently uses only three biophysical parameters, which are all demonstrated to have distinct influences on the induction curve (see the main text section “System-level ramifications of the two-domain model”). This enables the inference of parameters with high precision for the mutants, and the quantification of the most essential mechanistic effects of their mutations, provided that the model is shown to accurately recapitulate the comprehensive dataset. Thus, we found it was unnecessary to add another parameter for explicitly describing inter-chain coupling, which would likely incur uncertainty in the inference of parameters due to the redundancy of their effects on induction data, and prevent the model from making faithful predictions.

      (3) From a more biological point of view, TetR is an obligate dimer, meaning that the two chains must synchronize for function, supporting the two-domain simplification of TetR for binding concerns.

      Additionally, as shown in the subsection “Inclusion of single-ligand-bound state of repressor” of section 1 of the supplementary file, incorporating the dimeric nature of TetR in our model by allowing partial ligand binding does not change the functional form of main text equation 1 in any practical sense. Therefore, considering all the factors stated above, we think that increasing the complexity of the two-domain model will only be necessary if additional data emerge to suggest the limitation of our model.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is an excellent work. I have only one suggestion for the authors. Interestingly, the authors also note that the epistatic interactions that they obtain are consistent with the structural features of the protein, which is not surprising. Within this framework, have the authors considered rescue mutations? Please see for example PMID: 18195360 and PMID: 15683227. If I understand right, this might further extend the applicability of their model. If so, the authors may want to add a comment to that effect.

      We thank the reviewer for the supportive comments and for pointing us to the useful references. We have added some comments to the main text regarding this point in line 332-336: “The diverse mechanistic origins of the rescuing mutations revealed here provide a rational basis for the broad distributions of such mutations. Integrating such thermodynamic analysis with structural and dynamic assessment of allosteric proteins for efficient and quantitative rescuing mutation design could present an interesting avenue for future research, particularly in the context of biomedical applications (PMID: 18195360, PMID: 15683227).”

      Reviewer #3 (Recommendations For The Authors):

      The authors should try to build a more realistic dimeric model for TetR to see if it could better explain experimental data. If it were too complicated for a revision, more discussions on the weakness of the current model should be given.

      We thank the reviewer for this valid concern and for the suggestion. The reasons for refraining from increasing the complexity of the model are fully discussed in our response to the reviewer’s public review given above. Primarily, we think that the value of a simple physical model is two-fold (e.g., the paradigm Ising model in statistical physics and the classic MWC model), first, its mechanistic clarity and potential transferability makes it a useful conceptual framework for understanding complex systems and establishing universal rules by comparing seemingly unrelated phenomena; second, it provides useful insights and design principles of specific systems if it can quantitatively capture the corresponding experimental data. Thus, given the current experimental data set, we believe it is justified to keep the two-domain model in its current form, while additional experimental data could necessitate a more complex model for TetR allostery in the future. Relevant discussions are added to the main text (line 443-446) and section 8 of the supplementary file.

      “It’s noted that the homodimeric nature of TetR is ignored in the current two-domain model to minimize the number of parameters, and additional experimental data could necessitate a more complex model for TetR allostery in the future (see supplementary file section 8 for more discussions).”

      Minor issues:

      (1) There is an error in Figure 3A, the 13th and 14th subgraphs are the same and should be corrected.

      We thank the reviewer for capturing this error, which has been corrected in the revised manuscript.

      (2) The criteria for the selection of mutants for analysis should be clearly given. Apart from deleting mutants that are in direct contact with the ligand or DNA, how many mutants are left, and how far are they from the two sites? In line 257, what are the criteria for selecting these 15 mutants? Similarly, in line 332, what are the criteria for selecting these 8 mutants?

      We thank the reviewer for this comment. The data selection criteria are now added in section 7 of the supplementary file. The distances to the DNA operator and ligand of the 21 residues under mutational study are now added in Table 1 (Figure 3-figure supplement 9). The added materials are referenced in the main text where relevant.

      “7. Mutation selection for two-domain model analysis

      In this work, there are 24 mutants studied in total including the WT, and they contain mutations at 21 WT residues. We did not perform model parameter inference for the mutant G102D because of its flat induction curve (see the second subsection of section 2 and main text Figure 2—figure Supplement 3). Therefore, there are 23 mutants analyzed in main text Figure 5.

      Measuring the induction curve of a mutant involves a significant amount of experimental effort, which therefore is hard to be extended to a large number of mutants. Nonetheless, we aim to compose a set of comprehensive induction data here for validating our two-domain model for TetR allostery. To this end, we picked 15 individual mutants in the first round of induction curve measurements, which contains mutations spanning different regions in the sequence and structure of TetR (main text Figure 3—figure Supplement 1). Such broad distribution of mutations across LBD, DBD and the domain interface could potentially lead to diverse induction curve shapes and mutant phenotypes for validating the two-domain model. Indeed, as discussed in the main text section "Extensive induction curves fitting of TetR mutants", the diverse effects on induction curve from mutations perturbing different allosteric parameters predicted by the model, are successfully observed in these 15 experimental induction curves. Additionally, 5 of the 15 mutants contain a dead-rescue mutation pair, which helps us validate the model prediction that a dead mutation could be rescued by rescuing mutations that perturb the allosteric parameters in various ways.

      Eight mutation combinations were chosen for the second round of induction curve measurement for studying epistasis, where we paired up C203V and Y132A with mutations from different regions of the TetR structure. Such choice is largely based on two considerations. 1. As both C203V and Y132A greatly enhance the allosteric response of TetR, we want to probe why they cannot rescue a range of dead mutations as observed previously (PMID: 32999067). 2. C203V and Y132A are the only two mutants that show enhanced allosteric response in the first round of analysis. Combining detrimental mutations of allostery in a combined mutant could potentially lead to near flat induction curve, which is less useful for inference (see the second subsection of section 2).”

      Since the number of hotspots identified by DMS is not very large, why not analyze them all?

      We thank the reviewer for this comment. There are 41 hotspot residues in TetR (PMID: 36226916), which have 41*19=779 possible single mutations. It’s unfeasible to perform induction curve measurements for all of these 779 mutants in our current experiment. However, we agree that it would be helpful if we can obtain such a dataset in an efficient way.
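
      The count quoted above follows from simple arithmetic, sketched here for clarity. The function name `single_mutant_count` is hypothetical, and the sketch assumes only the 20 standard amino acids, so that each residue can be replaced by 19 alternatives.

```python
STANDARD_AMINO_ACIDS = 20  # assumption: only the 20 standard amino acids


def single_mutant_count(n_residues):
    """Number of distinct single substitutions across n_residues positions."""
    alternatives_per_residue = STANDARD_AMINO_ACIDS - 1  # exclude the WT residue
    return n_residues * alternatives_per_residue


print(single_mutant_count(41))  # 41 hotspot residues x 19 substitutions each
```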

      In line 257, there are 15 mutants mentioned, while in Figure 5, there are 23 mutants mentioned, in Figure 3-figure supplement 1, there are 21 mutants mentioned, and in line 226 of the supplementary file, there are 24 mutants mentioned, which is very confusing. Therefore, the data selection criteria used in this article should be given.

      We thank the reviewer for this comment. The data selection criteria are now given in section 7 of the supplementary file, which should clarify this confusion.

      (3) In Figure 4 of the Exploring epistasis between mutations section, the 6 weights of the additive models corresponding to each mutation combination are different. On one hand, it seems that there are no universal laws in these experimental data. On the other hand, unique parameters of a single mutation combination were not validated in other mutation combinations, which somewhat weakened the conclusions about the potential physical significance of these additive weights.

      We thank the reviewer for this comment. We admit that a quantitative universal law for tuning the 6 weights of the additive model does not manifest in our data, which indicates the mutation-specific nature of epistatic interactions in TetR as hinted in the different rescuing mutation distributions of different dead mutations (PMCID: PMC7568325). However, clear common trends in the weight tuning of combined mutants that contain common mutations do emerge, which comply with the structural features of the protein and provide explanations as to why C203V and Y132A don’t rescue a range of dead mutations (main text section “Exploring epistasis between mutations”). Additionally, the lack of a quantitative universal rule for tuning the 6 weights in our simple model doesn’t exclude the possibility of the existence of universal law for epistasis in TetR in another functional form, a point that could be explored in the future with more extensive joint experimental and computational investigations.

      In Eq. (27) of the supplementary file, the prior distribution of inter-domain coupling γ is given as a Gaussian distribution centered at 5 kBT. Since the absolute value of γ is important, can the authors explain why the prior distribution of γ is set to this value and what happens if other values are used?

      We thank the reviewer for the question. As explained in the corresponding discussions of Eq. (27) in the supplementary file, the prior of γ is chosen to serve as a soft constraint on its possible values based on the consideration that 1. inter-domain energetics for a TetR-like protein should be on the order of a few kBT; and 2. the prior distribution should reflect the experimental observation in the literature that γ has a small probability of adopting negative values upon mutations. Given our thorough validation of the statistical model and computational algorithm (see section 3 of the supplementary file), and the high precision in the parameter fitting results using experimental data (Figure 3 and Figure 4-figure supplement 2), we conclude that 1. the physical range of parameters encoded in their chosen prior distributions agrees well with the value reflected in the experimental data; 2. the inference results are predominantly informed by the data. Thus, changing the mean of the prior distribution of γ should not affect the inference results significantly given that it remains in the physical range.

      This point is explicitly shown in the added Table 2 (Figure 3-figure supplement 10), where we compare the current Bayesian inference results with those obtained after increasing the standard deviation of the Gaussian prior of γ from 2.5 to 5 kBT. As shown in the table, most inference results stay virtually unchanged at the use of this less informative prior, which confirms that they are predominantly informed by the data. The only exceptions are the slight increase of the inferred γ values for C203V, C203V-Y132A and C203V-G102D-L146A, reflecting the intrinsic difficulty of precise inference of large γ values with our model, as is already discussed in the second subsection of section 3 of the supplementary file. However, such observations comply with the common trend of epistatic interactions involving C203V presented in the main text and don’t compromise the ability of our model to accurately capture the induction curves of mutants. Relevant discussions are now added to the second subsection of section 3 of the supplementary file (line 368-385).

      “In our experimental dataset, such inference difficulty is only observed in the case of C203V, Y132A-C203V and C203V-G102D-L146A due to their large γ and γ + εL values (see main text Figure 3, Figure 3—figure Supplement 10 and Figure 4). As shown in main text Figure 3—figure Supplement 10, the inference results for the other 20 mutants stay highly precise and virtually unchanged after increasing the standard deviation of the Gaussian prior of γ (gstdγ ) from 2.5 to 5 kBT. This demonstrates that the inference results for these mutants are strongly informed by the induction data and there is no difficulty in the precise inference of the parameter values. On the other hand, the inferred γ values (especially the upper bound of the 95% credible region) for C203V, Y132A-C203V and C203V-G102D-L146A increased with gstdγ . This is because the induction curves in these cases are not sensitive to the value of γ given that it’s large enough as discussed above. Hence, when unphysically large γ values are permitted by the prior distribution, they could enter the posterior distribution as well. Such difficulty in the precise inference of γ values for these three mutants however, doesn’t compromise the ability of our model in accurately capturing the comprehensive set of induction data (see part iv below). Additionally, the increase of the inferred γ value of C203V at the use of larger gstdγ complies with the results presented in main text Figure 4, which show that the effect of C203V on γ tends to be compromised when combined with mutations closer to the domain interface."
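
      The logic of this prior-sensitivity check can be illustrated with a toy calculation. The sketch below is not the authors' inference code: it replaces the real likelihood with a Gaussian summary so that the posterior has a closed form, and the data estimates and standard errors (3.0 ± 0.2 and 9.0 ± 4.0 kBT) are invented for illustration. Only the prior mean (5 kBT) and the two prior standard deviations (2.5 and 5 kBT) come from the text. The function names are hypothetical.

```python
import math


def normal_posterior(prior_mean, prior_std, estimate, std_error):
    """Posterior mean and std for a Gaussian prior and Gaussian likelihood."""
    prior_precision = 1.0 / prior_std ** 2
    data_precision = 1.0 / std_error ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_mean * prior_precision
                 + estimate * data_precision) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)


def prior_shift(estimate, std_error, prior_mean=5.0,
                narrow_std=2.5, wide_std=5.0):
    """Change in posterior mean when the prior std is widened."""
    narrow_mean, _ = normal_posterior(prior_mean, narrow_std, estimate, std_error)
    wide_mean, _ = normal_posterior(prior_mean, wide_std, estimate, std_error)
    return wide_mean - narrow_mean


# Hypothetical mutant whose induction data constrain gamma tightly.
shift_informative = prior_shift(estimate=3.0, std_error=0.2)

# Hypothetical mutant with large gamma, where the data constrain it weakly.
shift_weak = prior_shift(estimate=9.0, std_error=4.0)

print(f"shift with informative data: {shift_informative:+.3f} kBT")
print(f"shift with weak data:        {shift_weak:+.3f} kBT")
```

      In this toy setting the posterior barely moves when the data are informative, and it drifts toward larger values when they are not, which mirrors the qualitative pattern described for the 20 well-constrained mutants versus the three mutants with large γ.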

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      For many years, there has been extensive electrophysiological research investigating the relationship between local field potential patterns and individual cell spike patterns in the hippocampus. In this study, using state-of-the-art imaging techniques, they examined spike synchrony of hippocampal cells during locomotion and immobility states. In contrast to conventional understanding of the hippocampus, the authors demonstrated that hippocampal place cells exhibit prominent synchronous spikes locked to theta oscillations.

      Strengths:

      The voltage imaging used in this study is a highly novel method that allows recording not only suprathreshold-level spikes but also subthreshold-level activity. With its high frame rate, it offers time resolution comparable to electrophysiological recordings. Moreover, it enables the visualization of actual cell locations, allowing for the examination of spatial properties (e.g., Figure 4G).

      We thank the reviewer for pointing out the technical novelty of this work.

      Weaknesses:

      There is a notable deviation from several observations obtained through conventional electrophysiological recordings. Particularly, as mentioned below in detail, the considerable differences in baseline firing rates and no observations of ripple-triggered firing patterns raise some concerns about potential artifacts from imaging and analysis, such as cell toxicity, abnormal excitability, and false detection of spikes. While these findings are intriguing if the validity of these methods is properly proven, accepting the current results as new insights is challenging.

      We appreciate the reviewer’s insightful comments regarding the intriguing aspect of our findings. Indeed, the emergence of a novel form of CA1 population synchrony presents exciting implications for hippocampal memory research and beyond.

      While we acknowledge the deviations from conventional electrophysiological recordings, we respectfully contend that these differences do not necessarily imply methodological flaws. All experiments and analyses were conducted with meticulous adherence to established standards in the field.

      Regarding the observed variations in average firing rates, it is important to note the well-documented heterogeneity in CA1 pyramidal neuron firing rates, spanning from 0.01 to 10 Hz, with a skewed distribution toward lower frequencies (Mizuseki et al., 2013). Our exclusion criteria for neurons with low estimated firing rates may have inadvertently biased the selection towards more active neurons. Moreover, prior research has indicated that average firing rates tend to increase during exposure to novel environments (Karlsson et al., 2008), and among deep-layer CA1 pyramidal neurons (Mizuseki et al., 2011). Given our recording setup in a highly novel environment and the predominance of deep CA1 pyramidal neurons in our sample, the observed higher average firing rates could be influenced by these factors. Considering these points, our mean firing rates (3.2 Hz) are reasonable estimations compared to previously reported values obtained from electrophysiological recordings (2.1 Hz in McHugh et al., 1996 and 2.4-2.6 Hz in Buzsaki et al., 2003).

      Regarding concerns about potential cell toxicity, previous studies have shown that Voltron expression and illumination do not significantly alter membrane resistance, membrane capacitance, resting membrane potentials, spike amplitudes, and spike width (see Abdelfattah 2019, Science, Supplementary Figure 11 and 12). In our recordings, imaged neurons exhibit preserved membrane and dendritic morphology during and after experiments (Author response image 1), supporting the absence of significant toxicity.

      Author response image 1.

      Voltron-expressing neurons exhibit preserved membrane and dendritic morphology. (A) Images of two-photon z-stack maximum intensity projection showing Voltron-expressing neurons taken after voltage imaging experiments in vivo. (B) Post-hoc histological images of neurons being voltage-imaged.

      Regarding spike detection, we use validated algorithms (Abdelfattah et al., 2019 and 2023) to ensure robust and reliable detection of spikes. Spiking activity was first separated from slower subthreshold potentials using high-pass filtering. This way, a slow fluorescence increase will not be detected as a spike, even if its amplitude is large. We benchmarked the detection algorithm in computer simulation. The sensitivity and specificity of the algorithm exceed 98% at the level of signal-to-noise ratio of our recordings. While we acknowledge that a small number of spikes, particularly those occurring later in a burst, might be missed due to their smaller amplitudes (as illustrated in Figure 1 and 2 of the manuscript), we anticipate that any missed spikes would lead to a decrease rather than an increase in synchrony between neurons. Overall, we are confident that spike detection is performed in a rigorous and robust manner.
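
      The principle of separating spikes from slow subthreshold changes before thresholding can be sketched as follows. This is a minimal pure-Python illustration and not the validated algorithm of Abdelfattah et al.: the high-pass step is approximated by subtracting a centered moving average, and the window length, threshold, and synthetic trace are all assumptions chosen for the example.

```python
def moving_average(trace, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    n = len(trace)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        smoothed.append(sum(trace[lo:hi]) / (hi - lo))
    return smoothed


def high_pass(trace, window):
    """Crude high-pass filter: remove the slow baseline from the trace."""
    baseline = moving_average(trace, window)
    return [x - b for x, b in zip(trace, baseline)]


def detect_spikes(trace, window=21, threshold=0.5):
    """Return indices of local maxima of the filtered trace above threshold."""
    filtered = high_pass(trace, window)
    spikes = []
    for i in range(1, len(filtered) - 1):
        is_peak = filtered[i] >= filtered[i - 1] and filtered[i] > filtered[i + 1]
        if filtered[i] > threshold and is_peak:
            spikes.append(i)
    return spikes


# Synthetic trace: a slow ramp of total amplitude 3 (larger than any spike)
# plus two brief spikes of amplitude 1 at samples 50 and 120.
n_samples = 200
trace = [3.0 * i / (n_samples - 1) for i in range(n_samples)]
for spike_index in (50, 120):
    trace[spike_index] += 1.0

detected = detect_spikes(trace)
print(detected)  # -> [50, 120]
```

      The slow ramp is three times larger than the spikes yet produces no detections, because it is removed by the baseline subtraction before the threshold is applied.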

      To further strengthen these points, we will include the following in the revision:

      (1) Histological images of recorded neurons during and after experiments.

      (2) Further details regarding the validation of spike detection algorithms.

      (3) Analysis of publicly available electrophysiological datasets.

      (4) Discussion regarding the reasons behind the novelty of some of our findings compared to previous observations.

      In conclusion, we assert that our experimental and analysis approach upholds rigorous standards. We remain committed to reconciling our findings with previous observations and welcome further scrutiny and engagement from the scientific community to explore the intriguing implications of our findings.

      Reviewer #2 (Public Review):

      Summary:

      This study employed voltage imaging in the CA1 region of the mouse hippocampus during the exploration of a novel environment. The authors report synchronous activity, involving almost half of the imaged neurons, occurred during periods of immobility. These events did not correlate with SWRs, but instead, occurred during theta oscillations and were phase-locked to the trough of theta. Moreover, pairs of neurons with high synchronization tended to display non-overlapping place fields, leading the authors to suggest these events may play a role in binding a distributed representation of the context.

      We thank the reviewer for a thorough and thoughtful review of our paper.

      Strengths:

      Technically this is an impressive study, using an emerging approach that allows single-cell resolution voltage imaging in animals, that while head-fixed, can move through a real environment. The paper is written clearly and suggests novel observations about population-level activity in CA1.

      We thank the reviewer for pointing out the technical strength and the novelty of our observations.

      Weaknesses:

      The evidence provided is weak, with the authors making surprising population-level claims based on a very sparse data set (5 data sets, each with less than 20 neurons simultaneously recorded) acquired with exciting, but less tested technology. Further, while the authors link these observations to the novelty of the context, both in the title and text, they do not include data from subsequent visits to support this. Detailed comments are below:

      We understand the reviewer’s concerns regarding the size of the dataset. Despite this limitation, it is important to note that synchronous ensembles beyond what could be expected from chance (jittering) were detected in all examined data. In the revision, we plan to add more data, including data from subsequent visits, to further strengthen our findings.

      (1) My first question for the authors, which is not addressed in the discussion, is why these events have not been observed in the countless extracellular recording experiments conducted in rodent CA1 during the exploration of novel environments. Those data sets often have 10x the neurons simultaneously recorded compared to the present data, thus the highly synchronous firing should be very hard to miss. Ideally, the authors could confirm their claims via the analysis of publicly available electrophysiology data sets. Further, the claim of high extra-SWR synchrony is complicated by the observation that their recorded neurons fail to spike during the limited number of SWRs recorded during behavior, again not agreeing with much of the previous electrophysiological recordings.

      We understand the reviewer’s concern. We will examine publicly available electrophysiology datasets to gain further insights into any similarities and differences to our findings. Based on these results, we will discuss why these events have not been previously observed/reported.

      (2) The authors posit that these events are linked to the novelty of the context, both in the text, as well as in the title and abstract. However, they do not include any imaging data from subsequent days to demonstrate the failure to see this synchrony in a familiar environment. If these data are available it would strengthen the proposed link to novelty if they were included.

      We thank the reviewer for the constructive suggestion. We will acquire more datasets from subsequent visits to gain further insights into these synchronous events.

      (3) In the discussion the authors begin by speculating the theta present during these synchronous events may be slower type II or attentional theta. This can be supported by demonstrating a frequency shift in the theta recording during these events/immobility versus the theta recording during movement.

      We thank the reviewer for the constructive suggestion. We did demonstrate a frequency shift to a lower frequency in the synchrony-associated theta during immobility than during locomotion (see Fig. 4B, the red vs. blue curves). We will enlarge this panel and specifically refer to it in the corresponding discussion paragraph.

      (4) The authors mention in the discussion that they image deep-layer PCs in CA1, however, this is not mentioned in the text or methods. They should include data, such as imaging of a slice of a brain post-recording with immunohistochemistry for a layer-specific gene to support this.

      We thank the reviewer for the constructive suggestion. We do have images of brain slices post-recordings (Author response image 2). Imaged neurons are clearly located in the deep CA1 pyramidal layer. We will add these images and quantification in the revised manuscript.

      Author response image 2.

      Imaged neurons are located in the deep pyramidal layer of the dorsal hippocampal CA1 region.

      Reviewer #3 (Public Review):

      Summary:

      In the present manuscript, the authors use a few minutes of voltage imaging of CA1 pyramidal cells in head-fixed mice running on a track while local field potentials (LFPs) are recorded. The authors suggest that synchronous ensembles of neurons are differentially associated with different types of LFP patterns, theta and ripples. The experiments are flawed in that the LFP is not "local" but rather collected in the other side of the brain, and the investigation is flawed due to multiple problems with the point process analyses. The synchrony terminology refers to dozens of milliseconds as opposed to the millisecond timescale referred to in prior work, and the interpretations do not take into account theta phase locking as a simple alternative explanation.

      We genuinely appreciate the reviewer’s feedback and acknowledge the concerns raised. However, we believe these concerns can be effectively addressed without undermining the validity of our conclusions. With this in mind, we respectfully disagree with the assessment that our experiments and investigation are flawed. Please allow us to address these concerns and offer additional context to support the validity of our study.

      Weaknesses:

      The two main messages of the manuscript indicated in the title are not supported by the data. The title gives two messages that relate to CA1 pyramidal neurons in behaving head-fixed mice: (1) synchronous ensembles are associated with theta (2) synchronous ensembles are not associated with ripples.

      There are two main methodological problems with the work:

      (1) Experimentally, the theta and ripple signals were recorded using electrophysiology from the opposite hemisphere to the one in which the spiking was monitored. However, both signals exhibit profound differences as a function of location: theta phase changes with the precise location along the proximo-distal and dorso-ventral axes, and importantly, even reverses with depth. And ripples are often a local phenomenon - independent ripples occur within a fraction of a millimeter within the same hemisphere, let alone different hemispheres. Ripples are very sensitive to the precise depth - 100 micrometers up or down, and only a positive deflection/sharp wave is evident.

      We appreciate the reviewer’s consideration regarding the collection of LFP from the contralateral hemisphere. While we acknowledge the limitation of this design, we believe that our findings still offer valuable insights into the dynamics of synchronous ensembles. Despite potential variations in theta phases with recording locations and depth, we find that the occurrence and amplitudes of theta oscillations are generally coordinated across hemispheres (Buzsaki et al., Neurosci., 2003). Therefore, the presence of prominent contralateral LFP theta around the times of synchronous ensembles in our study (see Figure 4A of the manuscript) strongly supports our conclusion regarding their association with theta oscillations, despite the collection of LFP from the opposite hemisphere.

      In addition, in our manuscript, we specifically mentioned that the “preferred phases” varied from session to session, likely due to the variability of recording locations (see Lines 254-256). Therefore, we think that the reviewer’s concern regarding theta phase variability has already been addressed in the present manuscript.

      Regarding ripple oscillations, while we recognize that they can sometimes occur locally, the majority of ripples occur synchronously in both hemispheres (up to 70%, see Szabo et al., Neuron, 2022; Buzsaki et al., Neurosci., 2003). Therefore, using contralateral LFP to infer ripple occurrence on the ipsilateral side has been a common practice in the field, employed by many studies published in respectable journals (Szabo et al., Neuron, 2022; Terada et al., Nature, 2021; Dudok et al., Neuron, 2021; Geiller et al., Neuron, 2020). Furthermore, we observed that 446 synchronous ensembles during immobility do not co-occur with contralateral ripples, and the remaining 313 ensembles during locomotion are not associated with ripples, as ripples rarely occur during locomotion. Therefore, our conclusion that synchronous ensembles are not associated with ripple oscillations is supported by the data.

      (2) The analysis of the point process data (spike trains) is entirely flawed. There are many technical issues: complex spikes ("bursts") are not accounted for; differences in spike counts between the various conditions ("locomotion" and "immobility") are not accounted for; the pooling of multiple CCGs assumes independence, whereas even conditional independence cannot be assumed; etc.

      We acknowledge the reviewer’s concern regarding spike train analysis. Indeed, complex bursts or different behavioral conditions can lead to differences in spike counts that could potentially affect the detection of synchronous ensembles. However, our jittering procedure (see Lines 121-132) is designed to control for the variation of spike counts. Importantly, while the jittered spike trains contain the same spike count variations, we found 7.8-fold more synchronous events in our data compared to jitter controls (see Figure 1G of the manuscript), indicating that these factors cannot account for the observed synchrony.
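      The logic of a jitter control can be sketched as follows. This is a generic illustration, not the procedure described in the manuscript: the bin width, the minimum number of co-active cells, the jitter range, and all function names are assumptions. Each spike is shifted by a small random offset, which preserves every cell's spike count while destroying fine-timescale coincidences, and the number of synchronous events in the original data is then compared with the surrogate distribution.

```python
import random


def count_sync_events(spike_trains, bin_ms=25.0, min_cells=3):
    """Count time bins in which at least min_cells distinct cells fire."""
    cells_per_bin = {}
    for cell, train in enumerate(spike_trains):
        for t in train:
            cells_per_bin.setdefault(int(t // bin_ms), set()).add(cell)
    return sum(1 for cells in cells_per_bin.values() if len(cells) >= min_cells)


def jitter(spike_trains, max_shift_ms, rng):
    """Shift every spike independently by a uniform offset in
    [-max_shift_ms, +max_shift_ms]; spike counts per cell are unchanged."""
    return [[t + rng.uniform(-max_shift_ms, max_shift_ms) for t in train]
            for train in spike_trains]


rng = random.Random(0)
# Toy data (hypothetical): 5 cells co-fire within a few ms at 20 shared
# event times and additionally emit 30 independent background spikes each.
event_times = [500.0 * k + 262.5 for k in range(20)]
trains = []
for _ in range(5):
    shared = [t + rng.uniform(-3.0, 3.0) for t in event_times]
    background = [rng.uniform(0.0, 10000.0) for _ in range(30)]
    trains.append(sorted(shared + background))

observed = count_sync_events(trains)
surrogate_counts = [count_sync_events(jitter(trains, 75.0, rng))
                    for _ in range(200)]
mean_surrogate = sum(surrogate_counts) / len(surrogate_counts)
fold_over_jitter = observed / mean_surrogate if mean_surrogate else float("inf")
```

      Because the surrogates carry the same spike counts as the data, an excess of events in the original trains over the surrogate mean cannot be attributed to firing-rate differences alone.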

      To explicitly demonstrate that complex bursts cannot account for the observed synchrony, we have performed additional analysis to remove all later spikes in bursts and only count the single and the first spikes of bursts. Importantly, we found that this procedure did not change the rate and size of synchronous ensembles, nor did it significantly alter the grand-average CCG (see Author response image 3). The results of this analysis explicitly rule out a significant effect of complex spikes on the analysis of synchronous ensembles.

      Author response image 3.

      Population synchrony remains after the removal of spikes in bursts. (A) The grand-average cross correlogram (CCG) was calculated using spike trains without later spikes in bursts. The gray line represents the mean grand average CCG between reference cells and randomly selected cells from different sessions. (B) Pairwise comparison of the event rates of population synchrony between spike trains containing all spikes and spike trains without later spikes in bursts. Bar heights indicate group means (n=10 segments, p=0.036, Wilcoxon signed-rank test). (C) Histogram of the ensemble sizes as percentages of cells participating in the synchronous ensembles.
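      One simple way to keep only single spikes and the first spike of each burst is to discard any spike that follows its predecessor within a short inter-spike interval. The sketch below assumes a 10 ms criterion and a hypothetical function name; the burst definition actually used for the analysis is not stated here.

```python
def drop_later_burst_spikes(spike_times_ms, max_isi_ms=10.0):
    """Keep single spikes and the first spike of each burst.

    A spike counts as a later burst spike when it follows the preceding
    spike by at most max_isi_ms (an illustrative criterion).
    """
    kept = []
    previous = None
    for t in sorted(spike_times_ms):
        if previous is None or t - previous > max_isi_ms:
            kept.append(t)
        previous = t  # compare against the preceding spike, kept or not
    return kept


train = [100.0, 104.0, 109.0, 300.0, 650.0, 655.0, 900.0]
first_spikes = drop_later_burst_spikes(train)
# first_spikes == [100.0, 300.0, 650.0, 900.0]
```

      Comparing each spike with the immediately preceding one, rather than with the last kept spike, lets a long burst be collapsed to its first spike even when the burst lasts longer than the interval criterion.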

      Beyond those methodological issues, there are two main interpretational problems: (1) the "synchronous ensembles" may be completely consistent with phase locking to the intracellular theta (as even shown by the authors themselves in some of the supplementary figures).

      We agree with the reviewer that the synchronous ensembles are indeed consistent with theta phase locking. However, it is important to note that theta phase locking alone does not necessarily imply population synchrony. In fact, theta phase locking has been shown to “reduce” population synchrony in a previous study (Mizuseki et al., 2014, Phil. Trans. R. Soc. B.). Thus, the presence of theta phase locking cannot be taken as a simple alternative explanation of the synchronous ensembles.

      To directly assess the contribution of theta phase locking to synchronous ensembles, we have performed a new analysis to randomize the specific theta cycles in which neurons spike, while keeping the spike phases constant. This manipulation disrupts spike co-occurrence while preserving theta phase locking, allowing us to test whether theta phase locking alone can explain the population synchrony, or whether spike co-occurrence in specific cycles is required. The grand-average CCG shows a much smaller peak compared to the original peak (Author response image 4A). Moreover, synchronous event rates show a 4.5-fold decrease in the randomized data compared to the original event rates (Author response image 4B). Thus, the new analysis reveals theta phase locking alone cannot account for the population synchrony.

      Author response image 4.

      Drastic reduction of population synchrony by randomizing spikes to other theta cycles while preserving the phases. (A) The grand-average cross correlogram (CCG) was calculated using original spike trains (black) and randomized spike trains where theta phases of the spikes are kept the same but spike timings were randomly moved to other theta cycles (red). (B) Pairwise comparison of the event rates of population synchrony between the original spike trains and randomized spike trains (n=10 segments, p=0.002, Wilcoxon signed-rank test). Bar heights indicate group means. ** p<0.01
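      The cycle-randomization control can be sketched as below, under the assumption that theta cycle boundaries are available as a sorted list of times. Each spike is reassigned to a randomly chosen cycle while its phase within the cycle is kept, so phase locking is preserved and cycle-specific co-occurrence is broken. The function names and the fixed 125 ms cycles in the example are illustrative only.

```python
import math
import random


def to_cycle_phase(spike_times, cycle_starts):
    """Map each spike to (cycle index, phase in radians) given cycle boundaries."""
    pairs = []
    for t in spike_times:
        for c in range(len(cycle_starts) - 1):
            start, end = cycle_starts[c], cycle_starts[c + 1]
            if start <= t < end:
                pairs.append((c, 2.0 * math.pi * (t - start) / (end - start)))
                break
    return pairs


def shuffle_cycles(spike_times, cycle_starts, rng):
    """Move each spike to a randomly chosen theta cycle, keeping its phase."""
    n_cycles = len(cycle_starts) - 1
    shuffled = []
    for _, phase in to_cycle_phase(spike_times, cycle_starts):
        c = rng.randrange(n_cycles)
        start, end = cycle_starts[c], cycle_starts[c + 1]
        shuffled.append(start + phase / (2.0 * math.pi) * (end - start))
    return sorted(shuffled)


rng = random.Random(1)
cycle_starts = [125.0 * k for k in range(41)]          # 40 cycles at 8 Hz
spikes = [125.0 * c + 31.25 for c in (3, 10, 17, 30)]  # all at phase pi/2
shuffled = shuffle_cycles(spikes, cycle_starts, rng)
shuffled_phases = [p for _, p in to_cycle_phase(shuffled, cycle_starts)]
```

      Applying the same shuffle independently to every cell and recomputing the CCG or event rate then tests whether phase locking alone reproduces the observed synchrony.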

      (2) The definition of "synchrony" in the present work is very loose and refers to timescales of 20-30 ms. In previous literature that relates to synchrony of point processes, the timescales discussed are 1-2 ms, and longer timescales are referred to as the "baseline" which is actually removed (using smoothing, jittering, etc.).

      Regarding the timescale of synchronous ensembles, we acknowledge that it varies considerably across studies and cell types. However, it is important to note that a timescale of dozens, or even hundreds of milliseconds is common for synchrony terminology in CA1 pyramidal neurons (see Csicsvari et al., Neuron, 2000; Harris et al., Science, 2003; Malvache et al., Science, 2016; Yagi et al., Cell Reports, 2023). In fact, a timescale of 20-30 ms is considered particularly important for information transmission and storage in CA1, as it matches the membrane time constant of pyramidal neurons, the period of hippocampal gamma oscillations, and the time window for synaptic plasticity. Therefore, we believe that this timescale is relevant and in line with established practices in the field.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study provides potentially fundamental insight into the function and evolution of daily rhythms. The authors investigate the function of the putative core circadian clock gene Clock in the cnidarian Nematostella vectensis. While in parts still incomplete, the evidence suggests that, in contrast to mice and fruit flies, Clock in this species is important for daily rhythms under constant conditions, but not under a rhythmic light/dark cycle, suggesting that the major role of the circadian oscillator in this species could be a stabilizing function under non-rhythmic environmental conditions.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this nice study, the authors set out to investigate the role of the canonical circadian gene Clock in the rhythmic biology of the basal metazoan Nematostella vectensis, a sea anemone, which might illuminate the evolution of the Clock gene functionality. To achieve their aims the team generated a Clock knockout mutant line (Clock-/- ) by CRISPR/Cas9 gene deletion and subsequent crossing. They then compared wild-type (WT) with Clock-/- animals for locomotor activity and transcriptomic changes over time in constant darkness (DD) and under light/dark cycles to establish these phenotypes under circadian control and those driven by light cycles. In addition, they used Hybridization Chain Reaction-In situ Hybridization (HCR-ISH) to demonstrate the spatial expression of Clock and a putative circadian clock-controlled gene Myh7 in whole-mounted juvenile anemones.

      The authors demonstrate that under LD both WT and Clock-/- animals were behaviourally rhythmic but under DD the mutants lost this rhythmicity, indicating that Clock is necessary for endogenous rhythms in activity. With altered LD regimes (LD6:6) they show also that Clock is light-dependent. RNAseq comparisons of rhythmic gene expression in WT and Clock-/- animals suggest that Clock KO has a profound effect on the rhythmic genome, with very little overlap in rhythmic transcripts between the two phenotypes; of the rhythmic genes in both LD and DD in WT animals (220, termed clock-controlled genes, CCGs) 85% were not rhythmic in Clock-/- animals in either light condition. In silico gene ontology (GO) analysis of CCGs reflected processes associated with circadian control. Correspondingly, those genes rhythmic in KO animals under DD (here termed neoCCGs) were not rhythmic in WT, lacked upstream E-box motifs associated with circadian regulation, and did not display any GO enrichment terms. 'Core' circadian genes (as identified in previous literature) in WT and Clock-/- animals were only rhythmic under entrainment (LD) conditions whilst Clock-/- displayed altered expression profiles under LD compared to WT. Comparing CCGs with previous studies of cycling genes in Nematostella, the authors selected a gene from 16 rhythmic transcripts. One of these, Myh7, was detectable by both RNAseq and HCR-ISH and considered a marker of the circadian clock by the authors.

      The authors claim that the study reveals insights into the evolutionary origin of circadian timing; Clock is conserved across distant groups of organisms, having a function as a positive regulator of the transcriptional translational feedback loop at the heart of daily timing, but is not a central element of the core feedback loop circadian system in this basal species. Their behavioural and transcriptomic data largely support the claims that Clock is necessary for endogenous daily activity but that the putative molecular circadian system is not self-sustained under constant darkness (this was known already for WT animals)- rather it is responsive to light cycles with altered dynamics in Clock-/- specimens in some core genes under LD. In the main, I think the authors achieved their aims and the manuscript is a solid piece of important work. The Clock-/- animal is a useful resource for examining time-keeping in a basal metazoan.

      The work described builds on other transcriptomic-based works on cnidaria, including Nematostella, and does probe into the molecular underpinnings with a loss-of-function in a gene known to be core in other circadian systems. The field of chronobiology will benefit from the evolutionary aspect of this work and the fact that it highlights the necessity to study a range of non-model species to get a fuller picture of timing systems to better appreciate the development and diversity of clocks.

      Strengths:

      The generation of a line of Clock mutant Nematostella is a very useful tool for the chronobiological community and coupled with a growing suite of tools in this species will be an asset. The experiments seem mostly well conceived and executed (NB see 'weaknesses'). The problem tackled is an interesting one and should be an important contribution to the field.

      Weaknesses:

      I think the claims about shedding light on the evolutionary origin of circadian time maintenance are a little bold. I agree that the data do point to an alternative role for Clock in this animal in light responsiveness, but this doesn't illuminate the evolution of time-keeping more broadly in my view. In addition, these are transcriptomic data and so should be caveated: they only demonstrate the expression of genes and not physiology beyond that. The time-course analysis is weakened by its low resolution, particularly for the RAIN algorithm when 4-hour intervals constrain the analysis. I accept that only 24h rhythms were selected in the analysis from this, but it might be that detail was lost. I think a preferred option would be 2 or 3-hour resolution or 2 full 24h cycles of analysis.

      The authors discount the possibility of the observed 12h rhythmicity in Clock-/- animals by exposing them to LD6:6 cycles before free-running them in DD. I suggest that LD cycles are not a particularly robust way to entrain tidal animals as far as we know. Recent papers show inundation/mechanical agitation are more reliable cues (Kwiatkowski ER, et al. Curr Biol. 2023, 2;33(10):1867-1882.e5. doi: 10.1016/j.cub.2023.03.015; Zhang L., et al Curr Biol. 2013, 23;19, 1863-1873 doi.org/10.1016/j.cub.2013.08.038.) and might be more effective in revealing endogenous 12h rhythms in the absence of 24h cues.

      Response: We removed the suggestion that we used 6:6h LD to perform tidal entrainment. We generated this ultradian light condition to address the 24h rhythmicity observed in the NvClk1-/- in 12:12h LD.

      Reviewer #2 (Public Review):

      This manuscript addresses an important question: what is the role of the gene Clock in the control of circadian rhythms in a very primitive group of animals: Cnidaria. Clock has been found to be essential for circadian rhythms in several animals, but its function outside of Bilaterian animals is unknown. The authors successfully generated a severe loss-of-function mutant in Nematostella. This is an important achievement that should help in understanding the early evolution of circadian clocks. Unfortunately, this study currently suffers from several important weaknesses. In particular, the authors do not present their work in a clear fashion, neither for a general audience nor for more expert readers, and there is a lack of attention to detail. There are also important methodological issues that weaken the study, and I have questions about the robustness of the data and their analysis. I am hoping that the authors will be able to address my concerns, as this work should prove important for the chronobiology field and beyond. I have highlighted below the most important issues, but the manuscript needs editing throughout to be accessible to a broad audience, and referencing could be improved.

      Major issues:

      (1) Why do the authors make the claim in the abstract that CLOCK function is conserved with other animals when their data suggest that it is not essential for circadian rhythms? dCLK is strictly required in Drosophila for circadian rhythms. In mammals, there are two paralogs, CLOCK and NPAS2, but without them, there are no circadian rhythms either. Note also that the recent claim of BMAL1-independent rhythms in mammals by Ray et al., quoted in the discussion to support the idea that rhythms can be observed in the absence of the positive elements of the circadian core clock, had to be corrected substantially, and its main conclusions have been disputed by both Abruzzi et al. and Ness-Cohn et al. This should be mentioned.

      Response: According to our behavioral and transcriptomic data, CLOCK function is conserved under constant light conditions. In the LD context, rhythmicity is probably maintained by the light-response pathway in Nematostella. We modified our rhythmic transcriptomic analysis, considered the contested results of Ray et al., and discussed them in the revised manuscript.

      (2) The discussion of CIPC on line 222 is hard to follow as well. How does mRNA rhythm inform the function of CIPC, and why would it function as a "dampening factor"? Given that it is "the only core clock member included in the Clock-dependent CCGs," (220) more discussion seems warranted. Discussing work done on this protein in mammals and flies might provide more insight.

      Response: The initial sentence was unclear. Furthermore, since we restricted our rhythmic analysis to genes found rhythmic at p<0.01 with RAIN combined with JTK, NvCipc was no longer classified as rhythmic under free-running conditions.

      (3) The behavioral arrhythmicity seen with their Clock mutation is really interesting. However, what is shown is only an averaged behavior trace and a single periodogram for the entire population. This leaves open the possibility that individual animals are poorly synchronized with each other, rather than arrhythmic. I also note that in DD there seem to be some residual rhythms, though they do not reach significance. Thus, it is also possible that at least some individual animals retain weak rhythms. The authors should analyze behavioral rhythms in individual animals to determine whether behavioral rhythmicity is really lost. This is important for the solidity of their main conclusions.

      Response: Fig. 1 has been modified. We have separated the data for WT and NvClk1-/- animals to provide clarity on the average behavior pattern for each genotype. While the LSP analysis on the population average informs us about the synchronization of the population, it is true that it does not provide insight into individual rhythmicity. To address this, we analyzed individuals in all conditions using the Discorhythm website (Carlucci et al., 2019).

      In the revised figure, we have included a comparison plot of the acrophase of 24-hour rhythmic animals between genotypes using Cosinor analysis, which is most suitable for acrophase detection. This plot indicates the number of animals detected as significantly rhythmic, providing direct visual input to the reader regarding individual rhythmicity. Additionally, we have added Table 1, which contains the Cosinor period analysis (24 and 12 hours) of individuals for all genotypes and conditions, further enhancing the clarity of our findings.
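      For readers unfamiliar with Cosinor analysis, the sketch below fits a single-component cosine model with a fixed period by ordinary least squares and reports the mesor, amplitude, and acrophase. It is a minimal stdlib illustration with hypothetical function names and synthetic data; it is not the DiscoRhythm implementation and omits the significance test that decides whether an animal is called rhythmic.

```python
import math


def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def cosinor_fit(times_h, values, period_h=24.0):
    """Fit mesor + amplitude * cos(2*pi*(t - acrophase)/period) by least squares."""
    omega = 2.0 * math.pi / period_h
    rows = [(1.0, math.cos(omega * t), math.sin(omega * t)) for t in times_h]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    mesor, beta, gamma = solve_3x3(xtx, xty)
    amplitude = math.hypot(beta, gamma)
    acrophase_h = (math.atan2(gamma, beta) / omega) % period_h
    return mesor, amplitude, acrophase_h


# Synthetic, noise-free activity sampled hourly for 72 h, peaking at hour 18.
times = [float(h) for h in range(72)]
activity = [5.0 + 2.0 * math.cos(2.0 * math.pi * (t - 18.0) / 24.0) for t in times]
mesor, amplitude, acrophase_h = cosinor_fit(times, activity)
```

      Setting `period_h=12.0` gives the corresponding 12-hour fit, which mirrors the two periods reported in Table 1.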

      (4) There is no mention in the results section of the behavior of heterozygotes. Based on supplement figure 2A, there is a clear reduction in amplitude in the heterozygous animals. Perhaps this might be because there is only half a dose of Clock, but perhaps this could be because of a dominant-negative activity of the truncated protein. There is no direct functional evidence to support the claim that the mutant allele is nonfunctional, so it is important to discuss carefully studies in other species that would support this claim, and the heterozygous behavior since it raises the possibility that the mutant allele acts as a dominant negative.

      Response: Extended Data Fig. 1 has been modified. We show the normalized locomotion of the NvClk1+/- population over time in DD, a comparison of individual normalized behavior amplitudes, the LSP of the population average, and the individual acrophases of the 24h-rhythmic individuals only. Indeed, we cannot discriminate a dominant-negative from a non-functional allele.

      (5) I do not understand what the bar graphs in Figure 2E and 3B represent - what does the y-axis label refer to?

      Response: Not relevant to the revised manuscript.

      (6a) I note that RAIN was used, with a p<0.05 cut-off. I believe RAIN is quite generous in calling genes rhythmic, and the p-value cut-off is also quite high. What happens if the stringency is increased, for example with p<0.01?

      Response: We acknowledge your concern regarding the stringency of our statistical analysis. To address this, we opted to combine both RAIN and JTK methods and applied a more stringent p-value cut-off of p<0.01.
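      One reading of "combining" the two methods is to call a gene rhythmic only when both RAIN and JTK return p<0.01. The sketch below illustrates that intersection rule; the rule itself, the function name, and the gene identifiers and p-values are assumptions made for illustration, not values from the study.

```python
def rhythmic_genes(rain_p, jtk_p, alpha=0.01):
    """Return genes with p < alpha in both tables (dicts of gene -> p-value)."""
    return sorted(
        gene for gene in rain_p
        if gene in jtk_p and rain_p[gene] < alpha and jtk_p[gene] < alpha
    )


# Hypothetical p-values for four made-up gene identifiers.
rain_p = {"geneA": 0.002, "geneB": 0.013, "geneC": 0.0004, "geneD": 0.008}
jtk_p = {"geneA": 0.006, "geneB": 0.004, "geneC": 0.0300, "geneD": 0.009}
selected = rhythmic_genes(rain_p, jtk_p)
# selected == ["geneA", "geneD"]
```

      Under this rule a gene such as the hypothetical geneB (p=0.013 in one method) is excluded even though it would pass a p<0.05 cut-off, which is the kind of stringency the reviewer asked about.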

      (6b) It would be worth choosing a few genes called rhythmic in different conditions (mutant or wild-type, LD or DD), and using qPCR to validate the RNAseq results. For example, in Figure 3D, Myh7 RNAseq data are shown, and they do not look convincing. I am surprised this would be called a circadian rhythm. In wild-type, the curve seems arrhythmic to me, with three peaks, and a rather large difference between the first and second ZT0 time point. In the Clock mutants, rhythms seem to have a 12hr period, so they should not be called rhythmic according to the material and methods, which says that only ca 24hr period mRNA rhythms were considered rhythmic. Also, the result section does not say anything about Myh7 rhythms. What do they tell us? Why were they presented at all?

      Response: Regarding the suggestion for independent verification of our RNAseq results, we agree that such validation would enhance the robustness of our findings. To address this, we chose to overlap our identified rhythmic genes under WT LD conditions with those from another transcriptomic study that shared similarities in experimental design. Notably, the majority of overlapping rhythmic genes between the studies are candidate pacemaker genes. We believe that this replication of biologically significant rhythmic genes strengthens the validity and reliability of our results (see Extended Data Fig. 2).

      Furthermore, we have decided to remove the NvMhc-st (mistakenly named Myh7, only rhythmic in WT DD in the new analysis) as it does not contribute substantively to the revised version of the manuscript.

      (7) The authors should explain better why only the genes that are both rhythmic in LD and DD are considered to be clock-controlled genes (CCGs). In theory, any gene rhythmic in DD could be a CCG. However, Leach and Reitzel actually found that most genes in DD1 do not cycle the next day (DD2). This suggests that most "rhythmic" genes might show a transient change in expression due to prolonged darkness and/or the stress induced by the absence of a light-dark cycle, rather than being clock controlled. Is this why the authors saw genes rhythmic under both LD and DD as actual CCGs? I would suggest verifying that in DD the phase of the oscillation for each CCG is similar to that in LD. If a gene is just responding to darkness, it might show an elevated expression at the end of the dark period of LD, and then a high level in the first hours of DD. Such an expression pattern would be very unlikely to be controlled by the circadian clock.

      Response: As we modified our transcriptomic analysis, we no longer analyze LD+DD rhythmic genes, but rather any genes rhythmic (RAIN and JTK p<0.01) in each condition. As such, we end up with four lists of genes, one for each experimental condition.

      (8) Since there are still rhythms in LD in Clock mutants, I wonder whether there is a paralog that could be taking Clock's place, similar to NPAS2 in mammals.

      Response: See response to (1). The only NPAS2 ortholog identified in Nematostella, NPAS3, showed marginal significance (p=0.013) with RAIN in LD WT, suggesting a regulation similar to that of the candidate pacemaker genes. As such, we included it in our list of candidate pacemaker genes.

      (9) I do not follow the point the authors try to make in lines 268-272. The absence of anticipatory behavior in Drosophila Clk mutants results from disruption of the circadian molecular clock, due to the loss of Clk's circadian function. Which light-dependent function of Clock are the authors referring to, then? Also, following this, it should be kept in mind that clock mutant mice have a weakened oscillator. The effect on entrainment is secondary to the weakening of the oscillator, rather than a direct effect on the light input pathway (weaker oscillators have increased response to environmental inputs). The authors thus need to more clearly explain why they think there is a conservation of circadian and photic clock function.

      Response: Following the changes in our statistical analysis, we reframed the discussion and now directly address the circadian and photic functions of the clock (the latter is called the light-response pathway in the manuscript).

      Recommendations for the authors:

      We suggest the following improvements:

      (1) Please undertake a serious effort to make this work more accessible to non-marine chronobiologists. This includes better explanations, and schemes of the animal when images of staining are shown (e.g. Fig.1b) which include the labeling of relevant morphological structures mentioned in the text (like "tentacle endodermis and mesenteries" (line 132)). Similar issues for mentioned life cycle stages like "late planula stage" (line 133), "bisected physa" (line 149).

      Response: In Fig. 1b, we outlined the shape of the animal and added two arrows to locate the tentacle endodermis and mesenteries. We replaced the term “late planula stage” with “larvae” and rephrased “bisected physa” as “tissue sampling”.

      Please attend to details. This includes:

      • Wrong referrals to figures (currently line 151 refers to EDF2- but should be EDF 1 instead, there is a Fig.3f mentioned in the text, but there is no such Fig.).

      Response: Fixed

      • Mentioning of ZTs when the HCR stainings were performed.

      Response: Fixed

      • Fig.1 a shows a rather incomplete and thus potentially confusing phylogenetic tree. Vertebrates have at least two Clk orthologs (NPAS2 and CLK), please include both, use an outgroup, and root the tree.

      Response: Identifying NPAS2 and CLK orthologs in all species made the conclusion more confusing. However, we followed the suggestion to add an outgroup, using a CLK ortholog sequence identified in the sponge Amphimedon queenslandica, and rooted the tree. Thank you for the suggestion.

      • What do the y-axis labels in Figure 2E and 3B refer to exactly? Y-axis label annotations in Fig.3a,d are entirely missing- what do the numbers refer to?

      Response: not relevant in the revised manuscript

      • Fig.2D- is the Go term enrichment referring to LD or DD?

      Response: To DD. We made this clear in Figure 5.

      • Wording: "Clock regulates genetic pathways." What is meant by "genetic pathways"? There are no "non-genetic pathways". Could one simply say: "Clock regulates a variety of transcripts".

      Response: We modified our threshold to use only p.adj<0.01, which reduced the number of GO terms. We removed “genetic pathways” and now address the specific pathways: cell-cycle and neuronal.

      The use of the term "epistatic" is confusing (line 219), i.e. that light is epistatic to Clock. In genetics, epistasis is defined as the effect of gene interactions on phenotypes. To a geneticist, this implies that there is a second gene impacting on the phenotype of the Clock mutants. Please re-word.

      Response: “light is epistatic on Clock” has been re-phrased.

      The provided Supplementary tables are not well annotated. Several of them need guess-work about what is shown. For instance, for Supplementary Table 1, the Ns are unclear, which in total can go up to almost 200 per condition-genotype, but only about 30 animals for each were tested. Thus, where do the high totals in the LSP table come from? What do the numbers of each periodicity mean? Initially one might assume it was the number of animals that showed a periodogram peak at a given periodicity, but it seems that cannot be. Maybe it counted any period bin over statistical significance? Please clarify with better descriptions and labels.

      Response: Supplementary tables are now clearly annotated on their first tabs. Regarding Fig. 1, we already addressed this point in the public review.

      Albeit not essential, it would be more reader-friendly to also add a summary table with average period and SD, power and SD, and percentage rhythmicity to the main figure.

      Response: Table 1 has been added; it contains the individual counts of rhythmic animals (24h and 12h) determined with Cosinor. However, Discorhythm required us to specify a period. We can therefore only provide the number of animals that are significantly rhythmic at a given period value, not an estimate of each animal's own period.
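
      To illustrate why a fixed-period cosinor cannot return a period estimate, the sketch below fits a single-component cosinor at a user-specified period. It is not DiscoRhythm's implementation: the function name, the closed-form estimates (which assume evenly spaced samples spanning a whole number of periods), and the synthetic trace are all assumptions made for illustration.

```python
import math


def cosinor_fixed_period(times_h, values, period_h=24.0):
    """Fit y = mesor + amplitude * cos(2*pi*(t - peak)/period) at a FIXED period.

    Assumption: samples are evenly spaced and span a whole number of periods,
    so the least-squares estimates reduce to the simple sums used below.
    """
    n = len(values)
    omega = 2.0 * math.pi / period_h
    mesor = sum(values) / n
    # Cosine and sine coefficients of the fitted wave.
    beta = 2.0 / n * sum(y * math.cos(omega * t) for t, y in zip(times_h, values))
    gamma = 2.0 / n * sum(y * math.sin(omega * t) for t, y in zip(times_h, values))
    amplitude = math.hypot(beta, gamma)
    # Time of peak (in hours), wrapped into [0, period).
    peak_time_h = (math.atan2(gamma, beta) / omega) % period_h
    return mesor, amplitude, peak_time_h


# Hypothetical hourly trace over 48 h: mesor 10, amplitude 3, peak at hour 6.
times = list(range(48))
trace = [10.0 + 3.0 * math.cos(2.0 * math.pi * (t - 6) / 24.0) for t in times]
mesor, amplitude, peak_time_h = cosinor_fixed_period(times, trace, period_h=24.0)
```

      Because the period is an input, the fit returns only the mesor, amplitude, and peak time at that period; estimating each animal's own period would require a different method, such as a periodogram.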

      (2) Some of the terminology is quite confusing, in particular the double meaning of the word "clock" (i.e the pacemaker and the transcription factor). This is not a specific problem to this manuscript, but it would be helpful for the readability to try to improve this.

      Could the gene/transcript/protein be spelled: clk and Clk?

      Alternatively, for clarity- how about talking about "core pacemaker genes," "CLOCK-dependent rhythmic genes" and "CLOCK-independent rhythmic genes"?

      Response:

      Clock/CLOCK > NvClk / NvCLK and the mutant is NvClk1-/-

      Core clock genes > candidate pacemaker genes.

      CLOCK-dependent CCG > this notion no longer exists in the revised manuscript.

      CLOCK-independent CCG > this notion no longer exists in the revised manuscript.

      (3) The dismissal of the 12h rhythmicity in Clock-/- animals is not really convincing and should be reconsidered. LD6:6 cycles (before free-running animals in DD) is likely a not particularly robust way to entrain tidal animals. Recent papers show inundation/mechanical agitation are more reliable cues (Kwiatkowski ER, et al. Curr Biol. 2023, 2;33(10):1867-1882.e5. doi: 10.1016/j.cub.2023.03.015; Zhang L., et al Curr Biol. 2013, 23;19, 1863-1873 doi.org/10.1016/j.cub.2013.08.038.) and might be more effective in revealing endogenous 12h rhythms in the absence of 24h cues.

      Response: We removed the proposal that the LD 6:6 cycle acts as tidal entrainment. Instead, the LD 6:6 experiment reveals the direct light dependency of the NvClk1-/- mutant.

      (4) There are significant questions raised on the validity of BMAL1-independent rhythms in mammals as suggested by the Ray et al study. See DOI: 10.1126/science.abe9230 and DOI: 10.1126/science.abf0922

      These technical comments should also be taken into account and the discussion adjusted accordingly to better reflect the ongoing discussions in the chronobiology field.

      Response: We modified our rhythmicity analysis. Because using the BHQ or adjusted p-value left very few genes, we defined genes as 24h-rhythmic if p<0.01 with two different algorithms (RAIN and JTK). We propose this compromise to reduce the risk of false positives. Furthermore, we discussed our methodology in light of the significant questions raised by the papers you cited. We thank the reviewer for this important point.
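
      The two-algorithm criterion described in this response can be sketched as follows. This is an illustration only, not the authors' pipeline: the function name, the dictionary inputs (gene identifier mapped to p-value), and the example values are hypothetical.

```python
def rhythmic_in_both(rain_p, jtk_p, alpha=0.01):
    """Return the genes whose p-value is below alpha in BOTH result sets.

    rain_p and jtk_p map a gene identifier to the raw p-value reported by
    each algorithm. A gene missing from either result set is not called
    rhythmic.
    """
    return sorted(
        gene
        for gene, p in rain_p.items()
        if p < alpha and gene in jtk_p and jtk_p[gene] < alpha
    )


# Hypothetical p-values for three genes in one condition.
rain_results = {"geneA": 0.001, "geneB": 0.020, "geneC": 0.005}
jtk_results = {"geneA": 0.004, "geneB": 0.001, "geneC": 0.500}
called = rhythmic_in_both(rain_results, jtk_results)
```

      Requiring agreement between two methods is more conservative than either method alone, but it is not a formal false discovery rate control.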

      (5) The HCR stainings for clk are not very convincing. Normally, HCR should have more dots. In principle, the logic of HCR is such that it detects individual mRNA molecules in the cell. Thus, having only one strong dot/cell like in Fig.1b doesn't make much sense.

      Response: We were the first to be surprised by this single-dot signal. We are experienced users of HCRv.3 across different species. We decided to remove the close-up (pending further investigation) but to keep the whole-animal signal. By our approach, it is a convincing signal. However, because of its dotted nature, the signal is not easy to make highly visible in a picture at whole-animal scale. We did our best to make the mRNA signal visible without altering the pattern.

      Furthermore, the controls for the HCR in situ hybridization are unclear. In the methods, there are two Clock probes described (B3 & B5) and two control probes (B1 & B3), however, in the negative control image, a combination of one Clock (B1) and one control (B3) probes is used and is unclear what "redundant detection" means in the legend of figure S2.

      Response: Considering the nature of the signal (single or few dots), we decided to use two probes with two different fluorophores. Noise is by nature random, so our hypothesis was that only overlapping fluorescent dots are true NvClk mRNA signal.

      For control probes, we used two zebrafish probes labelling hypothalamic peptides.

      Based on the experience with non-Drosophila, non-mouse animal model systems the reviewers assume that non-sense mediated mRNA decay (NMD) is not strongly initiated upon Crispr-induced premature STOP-codons. If this assumption is correct it would be worth to mention it. Alternatively, it would be worth testing if Nematostella induces NMD, as this would be a great control for the HCR and the mutation itself. At which ZT was the HCR done?

      Response: We performed the HCR at ZT10, when NvClk expression is reported to peak. This is now indicated in Fig. 1b. The RNAseq detected a higher quantity of NvClk1 mRNA in NvClk1-/- animals (see Fig. 4a). The regulation of mRNA quantity involves transcription, stabilization, and degradation. At this stage, we cannot identify which specific step is affected.

      For Fig.1c- please provide the binding site and sequence in the figure, simply include EDF 1 in the main figure.

      Response: The new Fig. 1c and EDF 1b now clearly indicate the protein domains, the CRISPR binding site, and the consequences for the DNA and amino acid sequences.

      (6) Please provide the individual trace data for the behavioral analyses either as supplementary files or as a link to an openly accessible database like DRYAD (see also comment 7 in the public review of reviewer 2). Maybe this is what is shown in Supplementary Table 1, but it is really not clear what is actually shown.

      Response: Fig.1 is updated. Table 1 is added. Supplementary Table 1 contains the individual normalized locomotor data of each polyp for each genotype and light condition. Supplementary Table 2 contains the individual cosinor analysis of rhythmic behavior based on Supplementary Table 1.

      (7) It is not really clear if the mutation is a true loss-of-function or could also be dominant negative. While this is raised in the discussion, it should be more carefully considered. The reason why a dominant negative would be unlikely is unclear. More specifically also see comment 8) in the public review of reviewer 2.

      Response: Indeed, the results cannot tell us if it is a true loss of function, a dominant negative or non-functional allele. We addressed it in the first part of the discussion.

      (8) The pretty small overlap of rhythmic transcripts in LD and DD could reflect the true biology of a more core clock driven-process under constant conditions and a more light-driven process under LD. But still- wouldn't one expect that similar processes should be rhythmic? If not, why not?

      It would certainly add strength to the data if for one or two transcripts these results were independently verified by qPCR from an independent sampling. This could even be done for just two time points with the most extreme differences.

      Response: We appreciate the reviewer's comments and concerns regarding the overlap of rhythmic transcripts in different conditions. In response to the reviewer's query, we revised our interpretation of the transcriptomic data, acknowledging the limited overlap between light and genotype conditions in our study. This prompted us to reconsider the underlying biological processes driving rhythmic gene expression under constant conditions versus light-dark cycles.

      Regarding the suggestion for independent verification of our RNAseq results, we agree that such validation would enhance the robustness of our findings. To address this, we chose to overlap our identified rhythmic genes under WT LD conditions with those from another transcriptomic study that shared similarities in experimental design. Notably, the majority of overlapping rhythmic genes between the studies are candidate pacemaker genes. We believe that this replication of biologically significant rhythmic genes strengthens the validity and reliability of our results (see Extended Data Fig. 2).

      (9) Expression of myh7 : Checking for co-expression should be pretty straightforward by HCR. This is what this type of staining technique is really good for. Please do clk and myh7 co-staining if you want to claim co-expression. Otherwise don't make such a claim.

      Response: We agree that checking for co-expression should be straightforward by HCR. However, due to time constraints during the revision period, we are unable to conduct the double in-situ experiment. Additionally, upon careful consideration, we recognize that including myhc-st (mistakenly named myh7) staining and co-expression analysis would not significantly contribute to the main conclusions of our study. Therefore, we have decided to remove this analysis from the revised manuscript.

      (10) Missing methodological details:

      • The false discovery rate for each analysis should be included (see Hughes et al.,: "Guidelines for Genome-Scale Analysis of Biological Rhythms," 2017).

      Response: The FDR is indicated for each gene in Supplementary Table 3.

      • Fig.1f- continuous light- please provide a spectrum (If there is no good spectrophotometer available, please provide at least manufacturer information).

      Response: Unfortunately, we did not have a good spectrophotometer available during the revision. We added the reference of the lamp to the Methods. We found the light spectrum provided by the supplier; however, we did not add it to the revised manuscript.

      Author response image 1.

      Spectrum of the Aquastar t8

      Also, it would be easier for the reader, if the measurements of light intensity are provided in photons, because this is what the light receptors ultimately measure.

      Response: Modified.

      • Fig.2E- please add the consensus sequence used for circadian E-box vs. E-box to the figure.

      Response: In Fig. 4c of the revised manuscript, we show which E-box motifs we extracted for our promoter analysis. We also changed our analysis and no longer use HOMER; instead, we directly extracted promoter sequences, searched for the canonical E-box (CANNTG) and the circadian E-box (CACGTG), and generated a circadian E-box enrichment output per gene promoter.
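
      A minimal sketch of such a motif scan is shown below. It is not the authors' code: the function name, the example sequence, and in particular the "circadian fraction" summary are assumptions, since the response does not define how the enrichment output is computed.

```python
import re

# Lookahead patterns so that overlapping motif occurrences are all counted.
CANONICAL_EBOX = re.compile(r"(?=CA[ACGT]{2}TG)")  # CANNTG
CIRCADIAN_EBOX = re.compile(r"(?=CACGTG)")         # CACGTG, a subset of CANNTG


def ebox_counts(promoter_seq):
    """Count canonical and circadian E-boxes in one promoter sequence.

    Both motifs equal their own reverse complement, so scanning a single
    strand finds every site. The returned fraction is a hypothetical
    per-promoter summary, not necessarily the metric used in the study.
    """
    seq = promoter_seq.upper()
    canonical = len(CANONICAL_EBOX.findall(seq))
    circadian = len(CIRCADIAN_EBOX.findall(seq))
    fraction = circadian / canonical if canonical else 0.0
    return {
        "canonical": canonical,
        "circadian": circadian,
        "circadian_fraction": fraction,
    }


# Hypothetical promoter fragment with one CACGTG and one CAGCTG site.
example = ebox_counts("ttCACGTGaaCAGCTGcc")
```

      Every circadian E-box is also counted as a canonical E-box, so the fraction ranges from 0 to 1 for any promoter that contains at least one CANNTG site.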

      (11) There has been some discussion about the evolutionary statement as stated by the authors. It appears that depending on the background of the reader, this can be misunderstood. We thus suggest to more clearly point out where the author thinks there is evolutionary conservation (a function for clk in the circadian oscillator under constant light or dark conditions) versus where there is no apparent evolutionary conservation (the situation under light-dark conditions).

      Response: In the revised manuscript, we propose a conserved function of NvCLK in constant darkness and a light-response pathway that compensates under LD conditions in the mutant.

      Please also consider the major comments 8 and 9 of the common review from reviewer 2.

      Reviewer #1 (Recommendations For The Authors):

      The hybridization chain-reaction ISH is OK but, I'm not sure I understand the control condition-this should be clarified. I would also welcome the use of Clock-/- animals in HCR as another, more direct level of control. In addition, the authors state that the Myh7 probes hybridise in anatomical regions resembling those for Clock (Fig 3e). It would be better to duplex these two probe sets with different fluors for a better representation of the relative spatial distributions of each transcript.

      Response: We agree that checking for co-expression should be straightforward by HCR. However, due to time constraints during the revision period, we are unable to conduct the double in-situ experiment. Additionally, upon careful consideration, we recognize that including myhc-st (mistakenly named myh7) staining and co-expression analysis would not significantly contribute to the main conclusions of our study. Therefore, we have decided to remove this analysis from the revised manuscript.

      We clarified the design of the control probes in the Methods.

      Minor points:

      Figure legends do not all convey sufficient detail. For instance, Figure 1c needs a better explanation. Figure 3e- are these images both WT? Fig 3f doesn't exist and other figure text references do not align with figures and need an overhaul.

      Response: All errors have been fixed.

      Reviewer #2 (Recommendations For The Authors):

      Major issues:

      (1) The authors need to introduce their model system better for a broad audience. What are the tissues/cells that express Clock at a higher level? What is their function, does this provide a potential explanation for their specific Clock expression, and how CLOCK might regulate behavior? Terms such as "tentacle endodermis and mesenteries" (line 132), "late planula stage" (line 133), "bisected physa" (line 149) would need some explanation.

      Response: We changed terms such as “planula” to “larvae” and “bisected physa” to “tissue samples”.

      (2) Some of the terminology used is quite confusing, because of the double-meaning of the word "clock" (i.e the pacemaker and the transcription factor). The authors use terms such as "clock-controlled genes", "core clock genes", "CLOCK-dependent clock-controlled genes", "neo-clock-controlled genes". Is there any way to help the reader? Here are several suggestions: "core pacemaker genes," "CLOCK-dependent rhythmic genes" and "CLOCK-independent rhythmic genes".

      Response: All the terminology has been clarified; see previous comments.

      (3) Also in the abstract, there is mention of "hierarchal light- and Clock-signaling" (52-3) - is this related to the statement on line 219 that light is epistatic to Clock? I do not quite understand what epistatic would mean here. Who is upstream of whom? LD modifies rhythmicity in Clock mutant animals, but Clock mutations also impact rhythmicity in LD. Also, as epistasis is defined as the effect of gene interactions on phenotypes - what is the secondary gene impacting the phenotype of the Clock mutants? I am not sure the term epistatic is appropriate in the present context.

      Response: Indeed, “epistatic” is a genetic term that might be unclear in this context. We removed it.

      (4) The control for the in situ hybridization is unclear. In the methods, there are two Clock probes described (B3 & B5) and two control probes (B1 & B3), however, in the negative control image, a combination of one Clock (B1) and one control (B3) probe is used, I am not sure what "redundant detection" means in the legend of figure S2. Also, the sequences of each Clock probe should be provided. It might be worth testing the Clock mutant the authors generated. Clock mRNA could be reduced due to nonsense-mediated RNA decay, since the mutation causes a premature stop codon. This would be a great additional control for the in situ hybridization. Even better would be if, by chance, the probes target the mutated sequence. The signal should then be completely lost.

      Response: HCR uses tiling probes, meaning that the target transcript is covered by dozens of successive “primer-like” DNA sequences, which is what enables the HCRv.3 technology. We therefore cannot design a mutant-specific probe with this technology.

      (5) I have concerns with rhythmic-expression calls, particularly as there is so little overlap between LD and DD, and that a completely different set of rhythmic genes is observed in Clock mutant and wild-type animals. I am not an expert in whole-genome expression studies, so I hope one of my colleague reviewers can weigh in.

      When describing rhythmicity analysis in the Methods, it states that Benjamini-Hochberg corrections were applied to account for multiple comparisons. However, the false discovery rate for each analysis should be included (see Hughes et al.,: "Guidelines for Genome-Scale Analysis of Biological Rhythms," 2017).

      Response: As explained before, we cannot use Benjamini-Hochberg corrections, as only a few genes (mostly oscillator genes) pass the threshold. As such, we combined two different algorithms (RAIN and JTK) with p<0.01 to confidently detect rhythmic genes while reducing the risk of false positives.
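
      For context, the Benjamini-Hochberg adjustment mentioned in this exchange can be sketched as below. This is a generic textbook implementation with hypothetical p-values, not the code used in the study. It shows that each raw p-value is scaled by the number of tests divided by its rank, which may explain why few genes pass an adjusted threshold when thousands of transcripts are tested.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

      In this toy example, no gene reaches an adjusted p-value below 0.01 even though two raw p-values are at or below that level.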

      Minor issues:

      (1) Environmental inputs are not "circadian", as written in the title.

      Response: Title modified

      (2) In the abstract, the description of the Clock mutant behavioral phenotypes is hard to follow, with no mention of whether or not Clock mutant animals are behaviorally rhythmic or arrhythmic in constant conditions.

      Response: corrected

      (3) Abstract: A 6/6 h LD cycle is not a compressed tidal cycle as written in the abstract. Light is not an input to tidal rhythms.

      Response: corrected

      (4) Line 101: timeout is not a core clock gene in animals.

      Response: we removed it from the candidate pacemaker genes.

      (5) What is the evidence for the role of PAR-Zip proteins in the Nematostella clock? The reference provided does not mention those.

      Response: There are no functional data in Nematostella yet to support their role within the pacemaker. However, based on their rhythmicity in LD and protein conservation, we included them in the list of candidate pacemaker genes. The references have been corrected.

      (6) Line 125. should refer to Fig 1C when describing the Clock protein.

      Response: corrected

      (7) Line 143-4. based on the figure, the region targeted by gRNA was not "close to the 5' end" as stated, it is closer to the middle of the gene sequence as shown in Figure 1C. A more accurate description would be a region in between the PAS domains.

      Response: Indeed we modified the figure and the text.

      (8) Line 150. The mutant allele is described as Clock1 initially, then for the rest of the paper as Clock-. Since it is not clear that the allele is a null (see major comment #8), Clock1 should be used throughout the manuscript.

      Response: the allele is named NvClk1 in the revised manuscript

      (9) Figure 2A, the second CT/ZT0 is misplaced.

      Response: Fig. 2 modified in the revised manuscript

      (10) Figure legend for 2E and 3B. "The 1000bp upstream ATG" is unclear. I guess it means that 1000bp upstream of the putative initiation codon was used.

      Response: Right, and in the revised version we analyzed 5kb upstream of the putative ATG.

      (11) Line 164. The authors write "We discovered..." , but wasn't it already known that these animals are behaviorally rhythmic?

      Response: Fixed

      (12) It would be worth mentioning in the results section the reduced amplitude of rhythms in LL compared to DD (in WT and seemingly also in Clock mutants).

      Response: Indeed, we observed a significant reduction in the mean amplitude in the NvClk1-/- in DD and LL compared with WT and NvClk1-/- in LD, DD and LL. However, as virtually all mutants lose rhythmicity in LL and DD, we do not think these results add to the current interpretation of the gene function.

      (13) Please correct the figure numbers in the main text, there are several mistakes.

      Response: Done

      (14) Line 196, most genes in the quoted study did not cycle on day 2, so whether they are truly clock controlled is questionable.

      Response: We agree; identifying free-running cycling genes in cnidarians remains a challenge. One of the limitations of this study was detecting genes rhythmic in LD that conserved their rhythmicity in DD. However, considering different transcriptomic studies (cited in the discussion), it seems that in the phylum Cnidaria the genes rhythmic in LD are not necessarily those identified as rhythmic in DD.

      (15) Line 204-206 needs to be rephrased. It is confusing.

      Response: rephrased

      (16) Line 216. Rephrase to something like: "A similar finding was made for."

      Response: rephrased

      (17) "Clock regulates genetic pathways" sounds quite odd. Do you mean it regulates preferentially specific genetic (or maybe better, molecular) pathways?

      Response: rephrased

      (18) Figure 4 and legend: Dashed lines indicating threshold are missing. Do the black and red dots represent WT and Clock-/-, as indicated in the legend, or up/down, as indicated in the figures?

      Response: Fig.5 modified accordingly. Colors in the volcano plot indicate up- (black) versus down- (red) regulated genes. This is now consistent within the figure.

      (19) Legend for Extended figure 1. "Immature peptide sequence" is incorrect.

      Response: rephrased

      (20) Extended data Figure 4. What the asterisks labels is unclear.

      Response: EDF4 was modified and became EDF2, with different content. The * indicates NvClk mRNA.

      (21) Line 228. Gene "isoforms". I guess the authors mean "paralogs".

      Response: corrected.

      (22) Line 232-3/Figure 3e. Please include a comparable image of the Clk ISH to facilitate the comparison of the spatial expression pattern. In addition, where and what is the "analysis" referred to - "the spatial expression pattern of Myh7 closely resembled that of Clock, as evidenced by our analysis"?

      Response: The analysis has been removed from the revised manuscript because we currently cannot perform the double ISH.

      (23) Line 282-3. As mentioned above, it is difficult to be sure that circadian behavior is lost, if only looking at a population of animals.

      Response: Fig.1 corrected

      (24) Line 301-5. Rephrase.

      Response: Rephrased

      (25) Line 325. I am not convinced that the author can say that their mutant is amorphic. See Major comment 8.

      Response: corrected.

      (26) Line 351 "simplifying interactions with the environment". Please explain what is meant here.

      Response: this confusing sentence has been removed from the revised manuscript

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) Figures 1B, S4, and S5, Tibia sections would be more informative and promising as the growth plate is flat. Otherwise, histology of the knee would be preferred.

      We have added the tibia section images in Figures 1B, S4, and S5 (New Figure 1B, Figure 2-figure supplement 3A, and Figure 3-figure supplement 1A).

      (2) Figure 1C, The authors performed immunostaining for vimentin, alpha-SMA, Col1a1 and Col1a2. The authors should use adjusted sections for the immunostaining for different antibodies. It would avoid region-specific variations in the size and shape of sections and the data would be more reliable. Please correct and revise.

      We have provided immunostaining results using consecutive sections at similar locations of the external ear (Figure 1C).

      (3) Figure 2A and throughout the manuscript where authors performed p-smad1/5/9 fluorescent immunostaining, the authors should also show non-phospho levels of p-smad1/5/9. Please correct and revise.

      We tried different anti-Smad1/5/9 antibodies, but the signals had very high background and were not presentable. We instead performed a western blot on auricle samples; the results, shown in Figure 2-figure supplement 1A, suggest that ablation of Bmpr1a led to loss of activation of Smad1/5/9 without affecting their expression. For different segments of the external ear, we also provide WB results in Figure 2-figure supplement 4B. In addition, we added RNA-seq data on Smad1/5/9 mRNA levels, which were not affected by Bmpr1a ablation (Figure 4-figure supplement 1B). Overall, these results suggest that Bmpr1a ablation does not affect the expression of Smad1/5/9.

      (4) Result 2, lines 131-134, the authors mentioned in the text that they observed no ear phenotype of Prrx1CreERT or Bmpr1af/f mice compared with wild-type mice (Figures S2A and S2B). However, the figures did not show histology pictures of wild-type mice. Please correct and revise.

      We have provided histological pictures of wild type mice (Figure 2-figure supplement 2C).

      (5) Result 5, lines 173-174 "We generated....Bmpr1a floxed mice". How did authors generate Col1a2-CreERT; Bmpr1af/f mice by crossing Prrx1Cre-ERT and Bmpr1af/f mice? Please correct and revise.

      It is a typo and has been corrected.

      (6) In the previous study by Soma Biswas et al., (Scientific Reports 2018, PMID 29855498) the authors mentioned in the result section that the mice with deletion of Bmpr1a using Prx1Cre looked morphologically normal. They did not mention the ear phenotype/microtia. Please explain how this study differs from current work and what are the limitations in the discussion.

      We did not observe an obvious ear phenotype in the adult transgenic Prrx1-CreERT; Bmpr1af/f mice. The reason could be that the transgene labels too few auricle chondrocytes, as has been reported for endosteal and periosteal bones in adult mice (Liu et al. Nat Genet 2022; Wilk, K. et al. Stem Cell Rep 2017; Julien A et al. J Bone Miner Res 2022). The difference is likely caused by the fact that the transgenic CreERT line was driven by a 2.3 kilobase promoter of Prrx1 that was inserted at an unknown location in the genome. Since we no longer carry the transgenic line, we cannot directly test its labelling efficiency in the auricle. We have discussed this point in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Chondrocytes are present in many parts of the body; some components are replaced by osteoblast cells, but others stay with their morphology. These cells are in different morphological and cellular conditions throughout the body. Is there any human variant study of Prrx1 and their association with auricle chondrocytes is present?

      We searched the literature and found no study on Prrx1 in auricle chondrocytes in humans.

      Do auricle chondrocytes have Prrx1+ through their developmental stage, and what's the expression situation of Prrx1+ at articular cartilage and growth plates throughout development? Only a small population is positive throughout the development, or they lose as they develop.

      We traced Prrx1 lineage cells in Prrx1-CreERT; R26tdTomato mice that received TAM at E8.5, E13.5, or p21. We found that auricle chondrocytes were Tomato+ under these conditions, even when only one dose of TAM (1/10 of the dose for adult mice) was given to the pregnant mice at E8.5 or E13.5 (Figure 1-figure supplement 1). However, while E8.5 mice showed Tomato+ chondrocytes in both articular cartilage and growth plate, E13.5 or p21 mice showed far fewer Tomato+ chondrocytes in articular cartilage and growth plate (Figure 1-figure supplement 1). These results indicate that Prrx1 expression differs among cartilages during development, growth, and maintenance.

      What's your rationale for studying Bmpr1a ablation at the adult stage?

      Organ development and maintenance are different processes, especially for slow-turnover tissues. Organ maintenance is also important since it accounts for 90% of the lifetime of mice. While previous studies have uncovered essential roles for BMP signaling in chondrogenic differentiation during development, it remains unclear whether BMP signaling plays a role in cartilage maintenance in adult mice.

      Line no 128: Chondrocytes are shirked but still have normal proliferation; what's the author's thought about it?

      We apologize for not making this clear enough. In fact, very few cells were proliferating in auricle cartilage, and Bmpr1a ablation did not alter that. We have rephrased these sentences.

      Do chondrocytes have protein trafficking defects or ER/Golgi stress?

      We checked the expression of proteins involved in protein trafficking and found that some were up-regulated and some were down-regulated (Figure 4-figure supplement 1D), which may reflect the shift from chondrocytes to osteoblasts and warrants further investigation. However, the expression of ER or Golgi stress-related genes, which play critical roles in chondrocyte differentiation and survival (Wang et al. 2018; Horigome et al. 2020), was not altered by Bmpr1a ablation (Figure 4-figure supplement 1E and 1F).

      How many Prrx paralogs are there in the system? Are all associated with auricle chondrocytes and similar mechanisms?

      There is one Prrx1 paralog, Prrx2. While Prrx1-/- mice lived for up to 24 hours after birth with low-set ears (Martin JF. et al. Genes Dev. 1995), Prrx2-/- mice are perfectly normal. Prrx1-/-; Prrx2-/- double mutant mice died within an hour after birth and the pups showed no external ears (ten Berge D. et al. Development. 1998). We have added this information to the revised manuscript.

      Extracellular matrix (ECM) provides cell-to-cell interaction and environment for cell growth. Does Bmpr1a ablation lead to any changes in ECM at the auricle or growth plate chondrocytes?

      Our analysis showed that the expression of many ECM proteins was down-regulated in auricle cartilage of Prrx1-CreERT; Bmpr1af/f mice (Figure 4-figure supplement 1A). This may reflect the shift from chondrocytes to osteoblasts and warrants further investigation. However, immunostaining revealed that the expression of Aggrecan and Col10 in the growth plates was unaltered in adult Prrx1-CreERT; Bmpr1af/f mice compared to control mice (Figure 4-figure supplement 1C), likely due to the lack of marking of chondrocytes in growth plates.

      Microtia usually develops during the first trimester of pregnancy in humans. What's your view about studying at the adult stage compared to intrauterine development?

      Congenital microtia is a problem with the formation of external ear whereas microtia development in adult mice is a problem with the maintenance of the auricle chondrocytes. Organ maintenance is also an important process as it starts from 3 months of age and lasts for 90% of the lifetime of mice.

      In RNA sequencing protocol, Wikipedia pages keep updating, so it is very strange to cite the Wikipedia pages. Cite a research article for it.

      We have replaced this reference.

      Why do the authors have a very low FDR value for this study? How does this value strengthen the study?

      It was a typo that has been corrected.

      It needs further validation to show that Prrx1 marked cells are a good model for auricular chondrocyte-related studies.

      We show that Prrx1 marks auricle chondrocytes but few growth plate or articular chondrocytes in adult mice, suggesting its specificity. However, the use of the Prrx1-CreERT line in auricle cartilage studies is complicated by the labelling of dermal cells in the external ear by Prrx1. We have discussed this point in the revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, the authors address a fundamental unresolved question in cerebellar physiology: do synapses between granule cells (GCs) and Purkinje cells (PCs) made by the ascending part of the axon (AA) have different synaptic properties from those made by parallel fibers? This is an important question, as GCs integrate sensorimotor information from numerous brain areas with a precise and complex topography.

      Summary:

      The authors argue that GCs located close to PCs essentially contact PC dendrites via the ascending part of their axons. They demonstrate that joint high-frequency (100 Hz) stimulation of distant parallel fibers and local GCs potentiates AA-PC synapses, while parallel fiber-PC synapses are depressed. On the basis of paired-pulse ratio analysis, they concluded that evoked plasticity was postsynaptic. When individual pathways were stimulated alone, no LTP was observed. This associative plasticity appears to be sensitive to timing, as stimulation of parallel fibers first results in depression, while stimulation of the AA pathway has no effect. NMDA, mGluR1 and GABAA receptors are involved in this plasticity.

      Strengths:

      Overall, the associative modulation of synaptic transmission is convincing, and the experiments carried out support this conclusion. However, weaknesses limit the scope of the results.

      Weaknesses:

      One of the main weaknesses of this study is the suggestion that high-frequency parallel-fiber stimulation cannot induce long term potentiation unless combined with AA stimulation. Although we acknowledge that the stimulation and recording conditions were different from those of other studies, according to the literature (e.g. Bouvier et al 2016, Piochon et al 2016, Binda et al, 2016, Schonewille et al 2021 and others), high-frequency stimulation of parallel fibers leads to long-term postsynaptic potentiation under many different experimental conditions (blocked or unblocked inhibition, stimulation protocols, internal solution composition). Furthermore, in vivo experiments have confirmed that high-frequency parallel fibers are likely to induce long-term potentiation (Jorntell and Ekerot, 2002; Wang et al, 2009). This article provides further evidence that long-term plasticity (LTP and LTD) at this connection is a complex and subtle mechanism underpinned by many different transduction pathways. It would therefore have been interesting to test different protocols or conditions to explain the discrepancies observed in this dataset.

      Even though this is not the main result of this study, we acknowledge that the control experiments done on PF stimulation add a puzzling result to an already contradictory literature. High frequency parallel fibre stimulation (in isolation) has been shown to induce long term potentiation in vitro, but not always, and most importantly, this has been shown in vivo. This was in fact the reason for choosing that particular stimulation protocol. Examination of in vitro studies, however, shows that the results are variable and even contradictory. Most were done in the presence of GABAA receptor antagonists, including the SK channel blocker Bicuculline, whereas in the study by Binda (2016), LTP was blocked by GABAA receptor inhibition. In some studies also, LTP was under the control of NMDAR activation only, whereas in Binda (2016), it was under the control of mGluR activation. Moreover, most experiments were done in mice, whereas our study was done in rats. Our results reveal intricate mechanisms working together to produce plasticity, which are highly sensitive to in vitro conditions. We designed our experiments to be close to physiological conditions, with inhibition preserved and a physiological chloride gradient. It is likely that experimental differences have given rise to the variability of the results and our inability to reproduce PF-LTP, but it was not the aim of this study to dissect the subtleties of the different experimental protocols and models. We will modify the Discussion to describe that point fully, including differences in experimental conditions.

      Another important weakness is the lack of evidence that the AAs were stimulated. Indeed, without filling the PC with fluorescent dye or biocytin during the experiment, and without reconstructing the anatomical organization, it is difficult to assess whether the stimulating pipette is positioned in the GC cluster that is potentially in contact with the PC with the AAs. According to EM microscopy, AAs account for 3% of the total number of synapses in a PC, which could represent a significant number of synapses. Although the idea that AAs repeatedly contact the same Purkinje cell has been propagated, to the best of the review author's knowledge, no direct demonstration of this hypothesis has yet been published. In fact, what has been demonstrated (Walter et al 2009; Spaeth et al 2022) is that GCs have a higher probability of being connected to nearby PCs, but are not necessarily associated with AAs.

      We fully agree with the reviewer that we have not identified morphologically ascending axon synapses, and we stress this fact both in the first paragraph of the Results section, and again at the beginning of the Discussion. Our point is mainly topographical, given the well documented geometrical organisation of the cerebellar cortex, and strictly speaking, inputs are local (including ascending axon) or distal (parallel fibre). Similarly, the studies by Isope and Barbour (2002) and Walter et al. (2009), just like Sims and Hartell (2005 and 2006), have used the term ‘ascending axon’ when drawing conclusions about locally stimulated inputs. Moreover, our results do not rely on or assume multiple contacts, stronger connections, or higher probability of connections between ascending axons and Purkinje cells. Our results only demonstrate a different plasticity outcome for the two types of inputs. Therefore, our manuscript could be rephrased with the terms ‘local’ and ‘distal’ granule cell inputs, but this would have no more implication for the results or the computation performed in Purkinje cells. In our experience, this is more confusing to the reader, and as we already stress this point in the manuscript, we do not wish to make this modification. We will, however, modify the abstract of the manuscript to clarify that point.

      Reviewer #2 (Public Review):

      Summary:

      The authors describe a form of synaptic plasticity at synapses from granule cells onto Purkinje cells in the mouse cerebellum, which is specific to synapses proximal to the cell body but not to distal ones. This plasticity is induced by the paired or associative stimulation of the two types of synapses because it is not observed with stimulation of one type of synapse alone. In addition, this form of plasticity is dependent on the order in which the stimuli are presented, and is dependent on NMDA receptors, metabotropic glutamate receptors and to some degree on GABAA receptors. However, under all experimental conditions described, there is a progressive weakening or run-down of synaptic strength. Therefore, plasticity is not relative to a stable baseline, but relative to a process of continuous decline that occurs whether or not there is any plasticity-inducing stimulus.

      As highlighted by the reviewer, we observed a postsynaptic rundown of the EPSC amplitude for both input pathways. Rundown could be mistaken for a depression of synaptic currents, not for a potentiation, and the progressive decrease of the EPSC amplitude during the course of an experiment leads to an underestimate of the absolute potentiation. We have taken the view to provide a strong set of control data rather than selecting experiments based on subjective criteria or applying a cosmetic compensation procedure. We have conducted control experiments with no induction (n = 17), which give a good indication of the speed and amplitude of the rundown. Comparison shows a highly significant potentiation of the ascending axon EPSC. Depression of the parallel fibre EPSC, on the other hand, was not significantly different from rundown, and we have not spoken of parallel fibre long term depression. The data thus show very clearly that ascending axon and parallel fibre synapses behave differently following the costimulation protocol.

      Strengths:

      The focus of the authors on the properties of two different synapse-types on cerebellar Purkinje cells is interesting and relevant, given previous results that ascending and parallel fiber synapses might be functionally different and undergo different forms of plasticity. In addition, the interaction between these two synapse types during plasticity is important for understanding cerebellar function. The demonstration of timing and order-dependent potentiation of only one pathway, and not another, after associative stimulation of both pathways, changes our understanding of potential plasticity mechanisms. In addition, this observation opens up many new questions on underlying intracellular mechanisms as well as on its relevance for cerebellar learning and adaptation.

      Weaknesses and suggested improvements:

      A concern with this study is that all recordings demonstrate "rundown", a progressive decrease in the amplitude of the EPSC, starting during the baseline period and continuing after the plasticity-induction stimulus. In the absence of a stable baseline, it is hard to know what changes in strength actually occur at any set of synapses. Moreover, the issues that are causing rundown are not known and may or may not be related to the cellular processes involved in synaptic plasticity. This concern applies in particular to all the experiments where there is a decrease in synaptic strength.

      We have provided an answer to that point directly below the summary paragraph. Moreover, if the phenomenon causing rundown was involved in plasticity, it should affect plasticity of both inputs, which was not the case, clearly distinguishing the ascending axon and parallel fibre inputs.

      The authors should consider changes in the shape of the EPSC after plasticity induction, as in Fig 1 (orange trace) as this could change the interpretation.

      Figure 1 shows an average response composed of evoked excitatory and inhibitory synaptic currents. The third section of Supplementary material (supplementary figure 3) shows that this complex shape is given by an EPSC followed by a delayed disynaptic IPSC. We would like to point out that while separating EPSC from IPSC might appear difficult from average traces due to the averaged jitter in the onset of the synaptic currents, boundaries are much clearer when analysing individual traces. In the same section we discuss the results of experiments in which transient applications of SR 95531 before and after the induction protocol allowed us to measure the EPSC, while maintaining the experimental conditions during induction. Analysis of the kinetics of the EPSCs during gabazine application at the beginning and end of experiments showed that there is no change in the time to peak of either the AA or the PF response. The decay times of the AA and PF EPSCs are slightly longer at the end of the experiment, even if the difference is not significant for AA inputs (we will add this analysis to the revised version of the paper). Our analysis, which uses as a template the EPSC kinetics measured at the beginning and at the end of the experiments, takes these changes directly into account. The results show clearly that the presence of disynaptic inhibition does not significantly affect the measure of the peak EPSC after the induction protocol nor the estimate of plasticity.

      In addition, the inconsistency with previous results is surprising and is not explained; specifically, that no PF-LTP was induced by PF-alone repeated stimulation.

      In our experimental conditions, PF-LTP was not induced when stimulating PF only, the only condition that reproduces experiments in the literature. As discussed in our response to reviewer 1, a close look at the literature, however, reveals variabilities and contradictions behind seemingly similar results. They reveal intricate mechanisms working together to produce plasticity, which are sensitive to in vitro conditions. We designed our experiments to be close to physiological conditions, with inhibition preserved and a physiological chloride gradient. It is likely that experimental differences have given rise to the variability of the results and our inability to observe PF-LTP. We will modify the discussion section to discuss that point fully in the context of past results.

      The authors test the role of NMDARs, GABAARs and mGluRs in the phenotype they describe. The data suggest that the form of plasticity described here is dependent on any one of the three receptors. However, the location of these receptors varies between the Purkinje cells, granule cells and interneurons. The authors do not describe a convincing hypothetical model in which this dependence can be explained. They suggest that there is crosstalk between AA and PF synapses via endocannabinoids downstream of mGluR or NO downstream of NMDARs. However, it is not clear how this could lead to the long-term potentiation that they describe. Also, there is no long-lasting change in paired-pulse ratio, suggesting an absence of changes in presynaptic release.

      We suggest in the Results section that the transient change in paired pulse ratio (PPR) is linked to a transient presynaptic effect only, which has been reported by others. This suggests that the long lasting changes observed are postsynaptic, as in other reports with similar trains of stimulation, and we will modify the manuscript to state this clearly.

      Concerning the involvement of multiple molecular pathways, investigators often tested for the involvement of NMDAR or mGluRs in cerebellar plasticity, rarely both. Here we showed that both pathways are involved. The conjunctive requirement for NMDAR and mGluR activation can easily be explained based on the dependence of cerebellar LTP and LTD on the concentrations of both NO and postsynaptic calcium (Coesmans et al., 2004; Safo and Regehr, 2005; Bouvier et al., 2016; Piochon et al., 2016). NO production has been linked to the activation of NMDARs in granule cell axons (Casado et al., 2002; Bidoret et al., 2009; Bouvier et al., 2016), occasionally in molecular layer interneurones (Kono et al., 2019). NO diffuses to activate guanylate cyclase in the Purkinje cell. Also based on the literature, different mechanisms can feed a calcium increase, including mGluR activation. Therefore NMDARs and mGluRs can reasonably cooperate to control postsynaptic plasticity. The associative nature of AA-LTP, i.e. the requirement for co-activation of AA and PF inputs, is more complex to explain, and indicates a necessary cross talk between synaptic sites. We propose that either one of the receptors is absent from AA synapses, and a signal needs to propagate from PF to AA synapses, or that both receptors are present but a signal is required to activate one of the receptors at AA synapses.

      We also observed an effect of GABAergic inhibition. GABAergic inhibition was elegantly shown by Binda (2016) to regulate calcium entry together with mGluRs, and control plasticity induction. A similar mechanism could contribute to our results, although inhibition might have additional effects. We will modify the discussion of the manuscript and add a diagram to highlight the links between the different molecular pathways and potential cross talk mechanisms, and the location of receptors.

      Is the synapse that undergoes plasticity correctly identified? In this study, since GABAergic inhibition is not blocked for most experiments, PF stimulation can result in both a direct EPSC onto the Purkinje cell and a disynaptic feedforward IPSC. The authors do address this issue with Supplementary Fig 3, where the impact of the IPSC on the EPSC within the EPSC/IPSC sequence is calculated. However, a change in waveform would complicate this analysis. An experiment with pharmacological blockade will make the interpretation more robust. The observed dependence of the plasticity on GABAA receptors is an added point in favor of the suggested additional experiments.

      We did consider that due to long recording times there might be kinetic changes, and that’s the reason why the experiments of Supplementary figure 3 were done with pharmacological blockade of GABAAR with gabazine, both before and again after LTP induction. The estimate of the amplitude of the EPSC is based on the actual kinetics of the response at both times.

      A primary hypothesis of this study is that proximal, or AA, and distal, or PF, synapses are different and that their association is specifically what drives plasticity. The alternative hypothesis is that the two synapse-types are the same. Therefore, a good control for pairing AA with PF would be to pair AA with AA and PF with PF, thereby demonstrating that pairing with each other is different from pairing with self.

      Pairing AA with AA would be difficult because stimulation of AA can only be made from a narrow band below the PC and we would likely end up stimulating overlapping sets of synapses. However, Figure 5 shows the effect of stimulating PF and PF, while also mimicking the sparse and dense configuration of the usual experiment. It shows that sparse PF do not behave like AA. Sims and Hartell (2006) also performed an experiment with sparse PF inputs and observed clear differences between sparse local (AA) and sparse distal (PF) synapses.

      It is hypothesized that the association of a PF input with an AA input is similar to the association of a PF input with a CF input. However, the two are very different in terms of cellular location, with the CF input being in a position to directly interact with PF-driven inputs. Therefore, there are two major issues with this hypothesis: 1) how can sub-threshold activity at one set of synapses affect another located hundreds of micrometers away on the same dendritic tree? 2) There is evidence that the CF encodes teaching/error or reward information, which is functionally meaningful as a driver of plasticity at PF synapses. The AA synapse on one set of Purkinje cells is carrying exactly the same information as the PF synapses on another set of Purkinje cells further up and down the parallel fiber beam. It is suggested that the two inputs carry sensory vs. motor information, which is why this form of plasticity was tested. However, the granule cells that lead to both the AA and PF synapses are receiving the same modalities of mossy fiber information. Therefore, one needs to presuppose different populations of granule cells for sensory and motor inputs or receptive field and contextual information. As a consequence, which granule cells lead to AA synapses and which to PF synapses will change depending on which Purkinje cell you're recording from. And that's inconsistent with there being a timing dependence of AA-PF pairing in only one direction. Overall, it would be helpful to discuss the functional implications of this form of plasticity.

      We do not hypothesise that association of the AA and PF inputs is similar to the association of PF and climbing fibre inputs. We compare them because it is the only other known configuration triggering associative plasticity in Purkinje cells. We conclude that ‘The climbing fibre is not the only key to associative plasticity’, and it is indeed interesting to observe that even if the inputs are very small compared to the powerful climbing fibre input, they can be effective at inducing plasticity. Physiologically, the climbing fibre signal has been clearly linked to error and reward signals, but reward signals are also encoded by granule cell inputs (Wagner et al., 2017). We will modify the discussion to make sure that we do not suggest equivalence with CF induced LTD.

      Moreover, we fully agree that AA and PF synapses made up by a given granule cell carry the same information, and cannot encode sensory and motor information at the same time. Yet, these synapses carry different information. AA synapses from a local granule cell deliver information about the local receptive field, but PF synapses from the same granule cell will deliver contextual information about that receptive field to distant Purkinje cells. In the context of sensorimotor learning, movement is learnt with respect to a global context, not in isolation, therefore learning a particular association must be relevant. The associative plasticity we describe here could help explain this functional association. Difference in timing of the inputs therefore should represent difference in the timing of activation of different granule cells which receive either local information or information from different receptive fields. We will modify the discussion to make sure we do not suggest association between sensory and motor inputs, and clarify our view of local receptive field and context about ongoing activity.

      Reviewer #3 (Public Review):

      Granule cells' axons bifurcate to form parallel fibers (PFs) and ascending axons (AAs). While the significance of PFs on cerebellar plasticity is widely acknowledged, the importance of AAs remains unclear. In the current paper, Conti and Auger conducted electrophysiological experiments in rat cerebellar slices and identified a new form of synaptic plasticity in the AA-Purkinje cell (PC) synapses. Upon simultaneous stimulation of AAs and PFs, AA-PC EPSCs increased, while PFs-EPSCs decreased. This suggests that synaptic responses to AAs and PFs in PCs are jointly regulated, working as an additional mechanism to integrate motor/sensory input. This finding may offer new perspectives in studying and modeling cerebellum-dependent behavior. Overall, the experiments are performed well. However, there are two weaknesses. First, the baseline of electrophysiological recordings is influenced significantly by run-down, making it difficult to interpret the data quantitatively. The amplitude of AA-EPSCs is relatively small and the run-down may mask the change. The authors should carefully reexamine the data with appropriate controls and statistics. Second, while the authors show AA-LTP depends on mGluR, NMDA receptors, and GABA-A receptors, which cell types express these receptors and how they contribute to plasticity is not clarified. The recommended experiments may help to improve the quality of the manuscript.

      As highlighted by the reviewer and developed above in response to reviewer 2, we observed a postsynaptic rundown of the EPSC amplitude. Rundown could be mistaken for a depression of synaptic currents, not for a potentiation. Moreover, we have conducted control experiments with no induction (n = 17), which give a good indication of the speed and amplitude of the rundown, and provide a baseline. Comparison shows a highly significant potentiation of the ascending axon EPSC, relative to baseline and relative to these control experiments. Depression of the parallel fibre EPSC on the other hand was not significantly different from rundown. For that reason we have not spoken of parallel fibre long term depression. The data, however, show that ascending axon and parallel fibre synapses behave very differently following the costimulation protocol.

      We have discussed above in our response to reviewer 2 the potential involvement of mGluRs, NMDARs and GABAARs. We will modify the discussion of the manuscript and add a diagram to highlight the links between the different molecular pathways and potential cross talk mechanisms, and the location of receptors.

    1. Author Response:

      We greatly appreciate the insightful feedback provided by the reviewers and the editor on our manuscript titled "Automated workflow for the cell cycle analysis of non-adherent and adherent cells using a machine learning approach". We will provide a revised version of the manuscript aiming to address the comments and recommendations provided by the reviewers to enhance the quality and clarity of our work. In detail:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript proposes a series of steps using the FIJI environment, the authors have created a plugin for the initial steps of the process, merging images into an RGB stack, conversion to HSV, and then using brightness for reference and hue to distinguish the phases of the cycle. Then, the well-known Trackmate plugin was used to identify single cells and extract intensities. The data was further post-processed in R, where a series of steps, smoothing, scaling, and addressing missing frames were used to train a random forest. Hard-coded values of hue were used to distinguish G1, S, and G2/M. The process was validated with a score comparing the quality of the tracks and the authors reported the successful measure of the cell cycles.
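
      The hue-based phase assignment summarised above can be illustrated with a minimal sketch. It assumes the standard FUCCI colour convention (red for G1, yellow for the G1/S transition, green for S/G2/M); the hue boundaries, the brightness cut-off and the function name `classify_phase` are hypothetical and are not the values or code used in the manuscript.

```python
import colorsys

# Hypothetical thresholds for illustration only; NOT the manuscript's values.
G1_MAX_HUE_DEG = 40.0      # hues below this are treated as "red"
G1S_MAX_HUE_DEG = 80.0     # hues below this (and above red) are "yellow"
MIN_BRIGHTNESS = 0.1       # cells dimmer than this are left unassigned


def classify_phase(red, green):
    """Assign a cell cycle label from normalised red/green intensities (0..1).

    The two channels are merged into an RGB colour (blue = 0), converted to
    HSV, and the hue is compared against fixed boundaries. Brightness (V)
    is used only to reject cells with too little signal.
    """
    hue, _saturation, value = colorsys.rgb_to_hsv(red, green, 0.0)
    if value < MIN_BRIGHTNESS:
        return "unassigned"
    hue_deg = hue * 360.0  # colorsys returns hue in [0, 1)
    if hue_deg < G1_MAX_HUE_DEG:
        return "G1"
    if hue_deg < G1S_MAX_HUE_DEG:
        return "G1/S"
    return "S/G2/M"


if __name__ == "__main__":
    for red, green in [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.05, 0.02)]:
        print(red, green, classify_phase(red, green))
```

      Because the boundaries are fixed constants, any change in acquisition conditions that shifts the relative channel intensities would move cells across them, which is the concern raised under Weaknesses.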

      Strengths:

      The implementation of the pipeline seems easy, although it requires two separate platforms: Fiji and R. A similar approach could be implemented in a single programming environment like Python or Matlab and there would not be any need to export from one to the other. However, many labs have similar setups and that is not necessarily a problem.

      Weaknesses:

      I found two important weaknesses in the proposal:

      (1) The pipeline relies on a large number of hard-coded conditions: size of Gaussian blur (Gaussian should be written in uppercase), values of contrast, size of filters, levels of intensity, etc. Presumably, the authors followed a heuristic approach and tried values of these and concluded that the ones proposed were optimal. A proper sensitivity analysis should be performed. That is, select a range of values of the variables and measure the effect on the output.

      (2) Linked to the previous comments. Other researchers that want to follow the pipeline would have either to have exactly the same acquisition conditions as the manuscript or start playing with values and try to compensate for any difference in their data (cell diameter, fluorescent intensity, etc.) to see if they can match the results of the manuscript.

      We thank Reviewer #1 for the insightful comments. We acknowledge the importance of ensuring the reproducibility and robustness of our pipeline across different sample types, acquisition conditions and, consequently, image S/N ratio and resolution. To address the concerns regarding the reliance on hard-coded conditions and the impact of varying parameter values on the output, we will complete the Methods section of the manuscript and the “Usage” section of the README file in the Github repository (https://github.com/ieoresearch/cellcycle-image-analysis), providing a summary of best practices that should be applied in the pre-processing part of the analysis. As an example, the usable image filter types and their settings for cells with different sizes, fluorescence intensities and acquisition conditions will be analysed in detail and general guidelines will be provided.

      Moreover, we will provide detailed documentation on the acquisition conditions required for reproducibility in the README file and Methods section.

      For the Tracking Analysis part, we will refer to the well documented TrackMate tutorial to adapt the tracking analysis to different cell types, image resolution and intensities.
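
      The sensitivity analysis requested by Reviewer #1 (select a range of values for each variable and measure the effect on the output) could be sketched as a one-at-a-time parameter sweep. This is only an illustration under stated assumptions: `toy_pipeline`, its parameter names and its formula are hypothetical stand-ins for the real pre-processing and tracking steps, and no values from the manuscript are used.

```python
def one_at_a_time_sensitivity(pipeline, baseline, ranges):
    """Vary each parameter alone and record the change in pipeline output.

    pipeline: callable taking keyword parameters and returning a number
    baseline: dict of reference parameter values
    ranges:   dict mapping a parameter name to the values to try
    Returns a dict mapping each parameter name to a list of
    (value, output - baseline_output) pairs.
    """
    baseline_output = pipeline(**baseline)
    results = {}
    for name, values in ranges.items():
        deltas = []
        for value in values:
            params = dict(baseline)   # copy so the baseline is not modified
            params[name] = value      # change only one parameter at a time
            deltas.append((value, pipeline(**params) - baseline_output))
        results[name] = deltas
    return results


def toy_pipeline(blur_sigma, threshold):
    """Hypothetical stand-in returning a fictitious count of detected cells."""
    return 100 - 5 * blur_sigma - 2 * threshold


if __name__ == "__main__":
    baseline = {"blur_sigma": 2, "threshold": 10}
    ranges = {"blur_sigma": [1, 2, 3], "threshold": [5, 10, 15]}
    for name, deltas in one_at_a_time_sensitivity(toy_pipeline, baseline, ranges).items():
        print(name, deltas)
```

      A one-at-a-time sweep ignores interactions between parameters; a full grid over all combinations would capture them at a higher computational cost.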

      Reviewer #2 (Public Review):

      Summary:

      This paper presents an automated method to track individual mammalian cells as they progress through the cell cycle using the FUCCI system and applies the method to look at different tumor cell lines that grow in suspension and determine their cell cycle profile and the effect of drugs that directly affect the cell cycles, on progression through the cell cycle for a 72 hour period.

      Strengths:

      This is a METHODS paper. The one potentially novel finding is that they can identify cells that are at the G1-S transition by the change in color as one protein starts to go up and the other one goes down, similar to the change seen as cells enter G2/M.

      Weaknesses:

      They did not clearly indicate whether the G1/S cells are identified automatically or need to be identified by the person reviewing the data. In Figures 1 and S1, the movie shows cells with no color at a time corresponding to what is about the G1/S transition. Their assigned cell cycle phase is shown in Figure 1 but not in Figure S1. None of these pictures show the G1/S cells that they talk about being able to detect with a different color.

      Thank you for your valuable feedback regarding the identification of G1/S cells in our pipeline. To clarify, the G1/S phase identification process is entirely automated within our pipeline. We apologize for any confusion caused by the lack of explicit indication in our manuscript. We will update the manuscript to clearly state that the identification of G1/S cells is performed automatically by our algorithm, eliminating the need for manual intervention.

      Regarding the visualization of G1/S cells in Figures 1 and S1, we will revise the figures to include all the available frames corresponding to the G1/S transition. It is important to note that during this transition, fluorescence intensities in both the green and the red channels are dimmer than their intensity levels during the G2/M transition. This can result in frames that seem visually darker, despite both colors coexisting at the same time point. The revised figures will therefore provide a clearer representation of this phenomenon.

      In response to Reviewer #2's recommendation, we plan to conduct additional experiments to further validate our observations. We will utilize the EdU technology to highlight the S-phase in FUCCI cells, allowing for better discrimination between the red and green fluorescence of the FUCCI reporter during the initial S-phase.

      Additionally, we acknowledge that the link to the Docker container (https://hub.docker.com/r/emanuelsoda/rf_semi_sup) was not included in the manuscript. We apologize for this oversight, and it will be included in the revised version of the paper.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      A summary of what the authors were trying to achieve.

      The authors cultured pre- and post-vaccine PBMCs with overlapping peptides encoding S protein in the presence of IL-2, IL-7, and IL-15 for 10 days, and extensively analyzed the T cells expanded during the culture, including by scRNAseq, scTCRseq, and examination of reporter cell lines expressing the dominant TCRs. They were able to identify 78 S epitopes with HLA restrictions (which by itself represents a major achievement) together with their subset, based on their transcriptional profiling. By comparing T cell clonotypes between pre- and post-vaccination samples, they showed that a majority of pre-existing S-reactive CD4+ T cell clones did not expand upon vaccination. Thus, the authors concluded that highly-responding S-reactive T cells were established by vaccination from rare clonotypes.

      An account of the major strengths and weaknesses of the methods and results.

      Strengths:

      Selection of 4 "Ab sustainers" and 4 "Ab decliners" from 43 subjects who received two shots of mRNA vaccinations.

      Identification of S epitopes of T cells together with their transcriptional profiling. This allowed the authors to compare the dominant subsets between sustainers and decliners.

      Weaknesses were properly addressed in the revised manuscript, and I do not have any additional concerns.

      We thank the reviewer for the constructive comments and recommendations, which greatly helped us to improve our manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The paper aims to investigate the relationship between anti-S protein antibody titers and the phenotypes and clonotypes of S-protein-specific T cells in people who receive SARS-CoV2 mRNA vaccines. To do this, the authors recruited a cohort of COVID-19-naive individuals who received the SARS-CoV2 mRNA vaccines and collected sera and PBMC samples at different timepoints. They then generated three main sets of data: 1) anti-S protein antibody titers at all timepoints; 2) a single-cell RNAseq/TCRseq dataset for divided T cells after stimulation with S protein for 10 days; 3) corresponding epitopes for each expanded TCR clone. After analyzing these results, the paper reports two major findings and claims: A) individuals having a sustained anti-S protein antibody response also have more so-called Tfh cells in their single-cell dataset; B) S-reactive T cells do exist before vaccination, but they seem to be unable to respond properly to COVID-19 vaccination.

      The paper's strength is that it uses a very systematic and thorough strategy to dissect the relationship between antibody titers, T cell phenotypes, TCR clonotypes and corresponding epitopes, and indeed it reports several interesting findings about the relationship between Tfh clonotypes and sustained antibody responses and about the S-reactive clones that exist before vaccination. The conclusion is solid in general but some claims are overstated. My suggestion is that the authors should further limit their claims in the abstract, for example,

      "Even before vaccination, S-reactive CD4+ T cell clonotypes did exist, most of which (MAY) cross-reacted with environmental or symbiotic bacteria" -- The paper does not have experimental evidence to show that these TCR clones respond to these epitopes.

      We thank the reviewer for pointing out the insufficient demonstration of experimental evidence. We have added the relevant data to Fig. S5 in the newly revised manuscript.

      "These results suggest that de novo acquisition of memory Tfh-like cells upon vaccination (LIKELY) contributes to the longevity of anti-S antibody titers." -- Given the small sample size and the lack of statistical significance, this claim is overstated.

      "S-reactive T cell clonotypes detected immediately after 2nd vaccination polarized to follicular helper T (Tfh)-like cells (UNDER IN VITRO CULTURE)". -- The conclusion is based on in vitro cultured cells, which has limitations.

      We thank the reviewer for the helpful suggestion. We have corrected some sentences in line with these suggestions in the newly revised manuscript.

      Recommendations for the authors:

      Please note: Though most of the overstatements were removed from the original manuscript, the authors still need to modify some of the statements in the Abstract.

      We thank the reviewer for carefully reading our manuscript and giving us detailed suggestions. We have modified these statements in the Abstract accordingly in the newly revised manuscript.

    1. Author Response

      The following is the authors’ response to the current reviews.

      At this stage the referees had only minor comments. Referee #1 asked whether archerfish indeed generalize in egocentric rather than allocentric coordinates. The current results may not rule out the possibility that archerfish, being unaware of changes in body position, simply continue with previously successful actions, which would appear as egocentric generalization. We agree with Referee #1; we updated lines 255-260 in the Results and added lines 329-336 to the Discussion to mention this possibility. Referee #2 mentioned that a portion of the fish did not make it to the final test, which raises the question of whether all individuals are able to solve the task. We agree with Referee #2 and added a paragraph to the Discussion section to mention this point (lines 384-388). We also added the salinity of the water in the water tanks (line 98), as suggested by Referee #2. Referee #2 suggested using a different term than “washout” in the behavioral experiments. Since the term “washout” is standard in the field, we have kept the term in the text.


      The following is the authors’ response to the original reviews.

      eLife assessment

      This useful study explores how archerfish adapt their shooting behavior to environmental changes, particularly airflow perturbations. It will be of interest to experts interested in mechanisms for motor learning. While the evidence for an internal model for adaptation is solid, evidence for adaptation to light refraction, as initially hypothesized, is inconclusive. As such, the evidence supporting an egocentric representation might be caused by alternative mechanisms to airflow perturbations.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors examined whether archerfish have the capacity for motor adaptation in response to airflow perturbations. Through two experiments, they demonstrated that archerfish could adapt. Moreover, when the fish flipped its body position with the perturbation remaining constant, it did not instantaneously counteract the error. Instead, the archerfish initially persisted in correcting for the original perturbation before eventually adapting, consistent with the notion that the archerfish's internal model has been adapted in egocentric coordinates.

      Evaluation:

      The results of both experiments were convincing, given the observable learning curve and the clear aftereffect. The ability of these fish to correct their errors is also remarkable. Nonetheless, certain aspects of the experiment's motivation and conclusions temper my enthusiasm.

      (1) The authors motivated their experiments with two hypotheses, asking whether archerfish can adapt to light refractions using an innate look-up table as opposed to possessing a capacity to adapt. However, the present experiments are not designed to arbitrate between these ideas. That is, the current experiments do not rule out the look-up table hypothesis, which predicts, for example, that motor adaptation may not generalize to de novo situations with arbitrary action-outcome associations. Such look-up table operations may also show set-size effects, whereas other mechanisms might not. Whether their capacity to adapt is innate or learned was also not directly tested, as noted by the authors in the discussion. Could the authors clarify how they see their results positioned in light of the two hypotheses noted in the Introduction?

      We agree with the referee that look-up tables only confuse the issue. The question we tested is whether or not the fish uses adaptation mechanisms to correct its shooting. We have now changed the introduction both to eliminate the entire question of look-up tables and to clarify that both innate mechanisms and learning mechanisms can contribute to fish shooting, and that our research focuses on the question of whether the fish can adapt to a perturbation in its shooting caused by a change in its physical environment.

      (2) The authors claim that archerfish use egocentric coordinates rather than allocentric coordinates. However, the current experiments do not make clear whether the archerfish are "aware" that their position was flipped (as the authors noted, no visual cues were provided). As such, for example, if the fish were "unaware" of the switch, can the authors still assert that generalization occurs in egocentric coordinates? Or simply that, when archerfish are ostensibly unaware of changes in body position, they continue with previously successful actions.

      The fish has access to information about the body position switch: there are cues in the water tank that can help the fish orient itself inside the tank. In contrast, there are no cues to the presence or direction of the air flow above the water tank. Moreover, previous experience has shown that the fish is sensitive to visual cues and uses them to achieve consistent orientation within the tank when possible. These points have been added to the main text [lines 143-144, 254-257].

      (3) The experiments offer an opportunity to examine whether archerfish demonstrate any savings from one session to another. Savings are often attributed to a faster look-up table operation. As such, if archerfish do not exhibit savings, it might indicate a scenario where they do not possess a refined look-up table and must rely on implicit mechanisms to relearn each time.

      This is an important question. Indeed, we looked for a ‘savings’ effect in the data, but the noisy nature of the data prevented us from drawing a firm conclusion. We now mention this in lines 247-249.

      We have also eliminated the discussion of look-up tables from the article.

      (4) The authors suggest that motor adaptation in response to wind may hint at mechanisms used to adapt to light refraction. However, how strong of a parallel can one draw between adapting to wind versus adapting to light refraction? This seems important given the claims in this paper regarding shared mechanisms between these processes. As a thought experiment, what would the authors predict if they provided a perturbation more akin to light refraction (e.g., a film that distorts light in a new direction, rather than airflow)?

      This is an important point. Indeed, our project started by looking for options to alter the refractive index or to distort the light in a new direction. However, given the available ways of distorting the light in a new direction, this is hard to achieve at the technical level. Initially, we tried using prism goggles; however, the archerfish found it hard to shoot with the heavy load on the head. We have also explored oil on the water surface. However, given the available oils and the width of the film above the water, it is hard to achieve a considerable perturbation.

      The fish response to the perturbation matches what would be expected for a change in light refraction. A light refraction perturbation does not change with the change in fish body position relative to the target. However, in response to (and in agreement with) the referees, we have generalized the context in which we see our results and discuss the results in terms of adaptation of the fish shooting behavior to changes in physical factors including light refraction, wind, fatigue, and others.

      (5) The number of fish excluded was greater than those included. This raises the question as to whether these fish are merely elite specimens or representative of the species in general.

      The filtering of the fish took place during the training stage. The requirements were quite strict: the fish had to produce enough shots each day in the experimental setup. Very few fish succeeded. However, all fish that reached the perturbation stage exhibited the adaptation effect. We do not see a reason to think that the motivation to shoot would have a strong interaction with the shooting adaptation mechanisms.

      Reviewer #2 (Public Review):

      Summary:

      The work of Volotsky et al. presented here shows that adult archerfish are able to adjust their shooting in response to their own visual feedback, taking consistent alterations of their shot, here by an air flow, into account. The evidence provided points to an internal mechanism of shooting adaptation that is independent of external cues, such as wind. The authors provide evidence for this by forcing the fish to shoot from 2 different orientations to the external alteration of their shots (the airflow). This paper thus provides behavioral evidence of an internal correction mechanism that underlies adaptive motor control of this behavior. It does not provide direct evidence of refractive index-associated shooting adjustment.

      Strengths:

      The authors have used a high number of trials and strong statistical analysis to analyze their behavioral data.

      Weaknesses:

      While the introduction, the title, and the discussion are associated with the refractive index, the latter was not altered, and neither was the position of the target. The "shot" was altered; this is a simple motor adaptation task and not a question related to the refractive index. The title, abstract, and the introduction are thus misleading. The authors appear to deduce from their data that the wind is not taken into account and thus conclude that the fish perceive a different refractive index. This might be based on the assumption that fish always hit their target, which is not the case. The airflow does not alter the position of the target, thus the airflow does not alter the refractive index. The fish likely does not perceive the airflow, thus alteration of its shooting abilities is likely assumed to be an "internal problem" of shooting. I am sorry but I am not able to understand the conclusion they draw from their data.

      This is an important point. Indeed, our project started by looking for options to alter the refractive index or to distort the light in a new direction. However, given the available ways of distorting the light in a new direction, this is hard to achieve at the technical level. Initially, we tried using prism goggles; however, the archerfish found it hard to shoot with the heavy load on the head. We have also explored oil on the water surface. However, given the available oils and the width of the film above the water, it is hard to achieve a considerable perturbation.

      The fish response to the perturbation matches what would be expected for a change in light refraction. A light refraction perturbation does not change with the change in fish body position relative to the target. However, in response to (and in agreement with) the referees, we have generalized the context in which we see our results and discuss the results in terms of adaptation of the fish shooting behavior to changes in physical factors including light refraction, wind, fatigue, and others.

      Reviewer #2 (Recommendations For The Authors):

      I have had a hard time trying to understand how the authors concluded that the RI is important here as it is not altered. Thus I did not understand the conclusions drawn from this paper. The experiments are well described, but the conclusions are not to me. Maybe schematics would help to clarify. I am from outside the field and represent a naïve reader with an average intellect. The authors need to do a better job of explaining their results if they want others to understand their conclusions.

      See response to the public comments.

      Minor comments:

      Line 9: omit the "an".

      Done.

      Line 11: this sentence would fit way better if it followed the next one.

      Done.

      Line 15: and all the rest of the paper: washout is a strange term and for me associated with pharmacological manipulations - might only be me. I suggest using recovery instead throughout the manuscript.

      The term ‘washout’ is often used in the field of motor adaptation to describe the return to original condition. For example:

      Kluzik J, Diedrichsen J, Shadmehr R, Bastian AJ (2008) Reach adaptation: what determines whether we learn an internal model of the tool or adapt the model of our arm? J Neurophysiol 100:1455-64. doi: 10.1152/jn.90334.2008

      Donchin O, Rabe K, Diedrichsen J, Lally N, Schoch B, Gizewski ER, Timmann D (2012) Cerebellar regions involved in adaptation to force field and visuomotor perturbation. J Neurophysiol 107:134-47

      Line 19: the fish does not expect the flow, it expects that it shoots too short- no?

      Done.

      Line 35: fix the citation - in your reference manager.

      Done.

      Line 52: provide some examples of the mechanisms you think of or papers of it for naive readers. Otherwise, this sentence is not helpful for the reader.

      Done.

      Line 183: it's unclear which parameter you mean. Rephrase.

      Done.

      Line 197: should read to test "the" - same sentence: you repeat yourself- rephrase the sentence.

      Done.

      Figure 4: it was unclear to me why the figure was differentiating between fishes until I read the legend. Why not include direct information in the figure? A schematic maybe? Legend: you have a double "that" in C.

      We added the title for each column with the information about the direction of air.

      Figures: in all figures, perturbation is wrongly spelled! Change the term washout to recovery.

      Done. We kept the term ‘washout’.

    1. Author response:

      We are grateful to Reviewer #1 for the positive evaluation of our work and for providing valuable comments that will significantly enhance the presentation of our results. We understand Reviewer #2's negative assessment because we did not discuss an alternative model of dosage compensation in Drosophila. We will address this omission in the Introduction section of the revised manuscript and remove any controversial statements from other parts of the text. However, it is important to clarify that our study does not focus on the mechanisms of dosage compensation. The main goal of the manuscript was to investigate the assembly of the MSL complex and its specific binding to the Drosophila X chromosome. We utilized male survival data to demonstrate the efficacy of MSL complex binding to the X chromosome, a relationship that has been supported by numerous independent studies. We understand that Reviewer #2 agrees that disruption of MSL complex binding results in male lethality. As far as we understand, Reviewer #2 suggests that the MSL complex does not activate transcription of X chromosome genes, but instead facilitates the recruitment of the MOF protein and potentially other general transcription factors to the X chromosome. This could explain the decrease in autosomal gene expression due to a reduction in activating factors like MOF at autosomal promoters. In the upcoming revision, we aim to strike a balance between the two models that elucidate dosage compensation in Drosophila. We appreciate your feedback and look forward to enhancing the clarity and coherence of our manuscript based on your insightful comments.

      Reviewer #2 (Public Review):

      Summary:

      A deletion analysis of the MSL1 gene to assess how different parts of the protein product interact with the MSL2 protein and roX RNA to affect the association of the MSL complex with the male X chromosome of Drosophila was performed.

      Strengths:

      The deletion analysis of the MSL1 protein and the tests of interaction with MSL2 are adequate.

      We thank the reviewer for the positive assessment of the experimental work done.

      This reviewer does not adhere to the basic premise of the authors that the MSL complex is the primary mediator of dosage compensation of the X chromosome of Drosophila.

      We completely agree with this reviewer's claim. In the Introduction section we’ll attempt to make clear that there are two models for the functional role of specific recruitment of the MSL complex to the X chromosome in males.

      Several lines of evidence from various laboratories indicate that it is involved in sequestering the MOF histone acetyltransferase to the X chromosome but there is a constraint on its action there. When the MSL complex is disrupted, there is no overall loss of compensation but there is an increase in autosomal expression. Sun et al (2013, PNAS 110: E808-817) showed that ectopic expression of MSL2 does not increase expression of the X and indeed inhibits the effect of acetylation of H4Lys16 on gene expression. Aleman et al (2021, Cell Reports 35: 109236) showed that dosage compensation of the X chromosome can be robust in the absence of the MSL complex. Together, these results indicate that the MSL complex is not the primary mediator of X chromosome dosage compensation. The authors use sex-specific lethality as a measure of disruption of dosage compensation, but other modulations of gene expression are the likely cause of these viability effects.

      Sun et al (2013, PNAS 110: E808-817) showed that recruitment of the MSL complex-specific subunit MSL2 or the MOF protein to the UAS promoter resulted in recruitment of the entire MSL complex in males but not transcriptional activation. This important result argues that the MSL complex does not activate transcription. However, it must be taken into account that the GAL4 DNA binding region used to recruit the chimeric MSL2 protein to the UAS promoter was directly fused to the MSL2 RING domain, which is critical for interaction of MSL2 with MSL1 and its ubiquitination activity (this activity could potentially be involved in transcription activation). It also remains poorly understood what happens to the MSL complex after recruitment to the promoters or HAS on the X chromosome. Subcomplex MSL1/MSL3/MOF can acetylate TF and H4K16 during RNA polymerase II elongation, resulting in increased transcription. The separate role of MSL2 and MSL1 in the activation of transcription of gene promoters is also shown. Sun et al. showed that in females, recruitment of MOF to the UAS promoter leads to a strong increase in transcription, which is associated with the inclusion of MOF in the non-specific lethal (NSL) complex, which is bound to promoters and is required for strong transcription activation. In males, MOF is preferentially recruited to the UAS promoter in the full MSL complex or perhaps in the MSL1/MSL3/MOF subcomplex, which stimulates transcription during RNA polymerase II elongation much less strongly than the NSL complex. The same result was obtained by Prestel et al. (2010, Mol Cell 38:815-26). In this study the GAL4 binding sites were inserted upstream of the lacZ and mini-white genes. Activation of transcription after recruitment of GAL4-MOF to the GAL4 sites was studied in males and females. As in Sun et al. 2013, strong activation of the reporter was observed in females. A weak transcriptional activation of the reporter gene in males was shown, and the MOF protein was detected not only on the promoter, but also in the coding and 3’ regions of the reporter.

      We do not understand how the paper by Aleman et al (Cell Reports 35: 109236, 2021) is consistent with the hypothesis that the MSL complex is not involved in the transcriptional activation of X chromosomal genes. The main conclusions of this paper are: 1) inactivation of Mtor leads to selective activation of the male X chromosome; 2) Mtor-driven attenuation of male X occurs in broad domains linked by the MSL complex; 3) Mtor genetically interacts with MSL components and reduces male mortality; 4) Mtor restrains dose-compensated expression at the level of nascent transcription. Thus, the paper shows that the MSL complex has an activator activity that is partially inhibited by Mtor. Accordingly, inactivation of Mtor only partially restored the survival of males in which dosage compensation was not completely inactivated.

      A detailed explanation was provided by Birchler and Veitia (2021, One Hundred Years of Gene Balance: How stoichiometric issues affect gene expression, genome evolution, and quantitative traits. Cytogenetics and Genome Research 161: 529-550).

      We agree that an alternative model of the dosage compensation mechanism is reasonable. We can assume that both mechanisms can function jointly to provide effective dosage compensation in Drosophila males. Following the reviewer's suggestion to reconsider the entire context of the article, we will make many small changes throughout the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Overall, I found the text well written and the figures logically organized (especially Figure 5, which had the potential to confuse). The authors especially excelled in bringing together the decades of literature in the Discussion.

      I offer several suggestions to improve the readability:

      Consider presenting the coiled-coil domain homology in Figure 1A as a contrast for the N-terminal region, which the authors claim is poorly conserved.

      We’ll add the coiled-coil domain homology to Figure 1A in the new version of the manuscript.

      It is difficult to visualize the red MSL2 in Figure 2; the green and red panels should be presented separately in the main text, as they are in the Supplemental Figure 2.

      We’ll prepare Figure 2 with separate green and red panels.

      The ChIP-seq experiments for MSL proteins are well presented, but in my opinion, add little to the overall conclusions:

      Figure 6 mostly recapitulates what has already been published and utilized by several groups, most recently the authors themselves (Tikhonova 2019): that MSL expressed in females targets the X/HAS, similar to in males. While these are nice supporting data for the female transgenic system, I do not believe this figure should be prominently featured as if this is a novelty of the current study.

      We fully agree with the reviewer's comment about the limited scientific novelty of Figure 6. It serves an auxiliary purpose. Therefore, we decided to move this figure to the Supplementary material.

      The ChIP experiments in Figure 7 agree with the conclusions in Figures 2 and 3 (polytene chromosome immunostaining) when it comes to X/autosome localization. I believe it would help with the flow of the paper if these experiments were combined or at least placed closer together in the narrative, rather than falling at the end.

      We’ll move Figure 7 closer to the polytene chromosome immunostaining. We agree with the reviewer that this placement of the figure will make the article as a whole easier to follow.

      I find Figure 8 difficult to understand, especially since the "clusters" are not annotated in the figure, but are described in the text. I struggled to follow the authors' conclusions based on these data. The authors could clarify the figure with annotations, although to be honest I do not currently see the value of this analysis/figure.

      In the new version of the article, we will try to make this figure more understandable: we will add annotations and a legend to the figure, and we will also try to state the key points more clearly in the text of the article.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      I have only a few comments that I think will improve the manuscript and help readers better appreciate the context of the reported results.

      We would like to thank the Reviewer for their time in reviewing our manuscript. We appreciate the helpful feedback and assistance in ensuring the highest quality publication possible.

      One paradox, that the authors point out, is that the drastic effects of TALK-1 L114P on plasma membrane potential do not result in a complete loss of insulin secretion. One important consideration is the role of intracellular stores in insulin secretion at physiological levels of hyperglycemia. This needs to be discussed more thoroughly, especially in the light of recent papers like Postic et al 2023 AJP and others. The authors do show an upregulation of IP3-induced Ca release. It is not clear whether they think this is a direct or indirect effect on the ER. Is there more IP3? More IP3R? Are the stores more full?

      The reviewer brings up an important point. Although we see a significant reduction in glucose-stimulated depolarization in most islets from TALK-1 L114P mice, some glucose-stimulated calcium influx is still present (especially in female islets); this suggests that a subset of islet β-cells are still capable of depolarization. Because our original membrane potential recordings were done in whole islets without identification of the cell type being recorded, we have now repeated these electrical recordings in confirmed β-cells (see Supplemental figure 6). The new data show that 33% of TALK-1 L114P β-cells exhibit action potential firing in 11 mM glucose, which would be predicted to stimulate insulin secretion from a third of all TALK-1 L114P β-cells; this could be responsible for the remaining glucose-stimulated insulin secretion observed from TALK-1 L114P islets. However, ER calcium store release could also account for some of the calcium response in the TALK-1 L114P islets. We have now detailed this in the discussion, which describes the Postic et al. study showing that glucose-stimulated beta-cell calcium increases involve ER calcium release, as they occur in the presence of voltage-dependent calcium channel inhibition. Future studies can assess this using SERCA inhibitors and determining whether glucose-stimulated calcium influx in TALK-1 L114P islets is lost. We also find that muscarinic stimulated calcium influx from ER stores is greater in TALK-1 L114P mice. We currently do not have data to support the mechanism for this enhancement of muscarinic-induced islet calcium responses from islets expressing TALK-1 L114P. Our hypothesis is that greater TALK-1 current on the ER membrane is enhancing ER calcium release in response to IP3R activation. There is an equivalent IP3R expression in control and TALK-1 L114P islets based on transcriptome analysis, which is now included in the manuscript. However, whether there is greater IP3 production, greater ER calcium storage, and/or greater ER calcium release requires further analysis. Because this finding was not directly related to the metabolic characterization of this TALK-1 L114P MODY mutation, we are planning to examine the ER functions of TALK-1 L114P thoroughly in a future manuscript.

      The authors point to the possible roles of TALK-1 in alpha and delta cells. A limitation of the global knock-in approach is that the cell type specificity of the effects can't easily be determined. This should be more explicitly described as a limitation.

      We thank the reviewer for this suggestion and have added this to the discussion. This is now included in a paragraph at the end of the discussion detailing the limitations of this manuscript.

      The official gene name for TALK-1 is KCNK16. This reviewer wonders whether it wouldn't be better for this official name to be used throughout, instead of switching back and forth. The official name is used for Abcc8 for example.

      We thank the reviewer for this suggestion and have revised the manuscript to include Kcnk16 L114P. The instances of TALK-1 L114P that remain in the manuscript are in cases where the text specifically discusses TALK-1 channel function.

      There are several typos and mistakes in editing. For example, on page 5 it looks like "PMID:11263999" has not been inserted. I suggest an additional careful proofreading.

      We have revised this reference, thoroughly proofread the revised manuscript, and corrected typos.

      The difference in lethality between the strains is fascinating. Might be good to mention other examples of ion channel genes where strain alters the severe phenotypes? Additional speculation on the mechanism could be warranted. It also offers the opportunity to search for genetic modifiers. This could be discussed.

      We thank the reviewer for this suggestion and have added details on mutations where strain alters lethality.

      The sex differences are interesting. Of course, estrogen plays a role as mentioned at the bottom of page 16, but there have been more involved analyses of islet sex differences, including a recent paper from the Rideout group. Is there a sex difference in the islet expression of KCNK16 mRNA or protein, in mice or humans?

      We thank the reviewer for the important comments on the TALK-1 L114P sex differences. We have revised the manuscript to include greater discussion about female β-cell resilience to stress, which may allow greater insulin secretion in the presence of the TALK-1 L114P channels; this is based on the Brownrigg et al. study pointed out by the reviewer (PMID: 36690328). Because these sex differences in islet function were examined in mice, we looked at KCNK16 expression in mouse beta-cells. There was a trend toward greater KCNK16 expression in sorted male beta-cells (average RPKM 6296.25 +/- 953.84) compared to sorted female beta-cells (average RPKM 5148.25 +/- 1013.22). Similarly, there was a trend toward greater KCNK16 expression in male HFD-treated mouse beta-cells (average RPKM 8020.75 +/- 1944.41) compared to female HFD-treated mouse beta-cells (average RPKM 7551 +/- 2952.70). We have now added this to the text.

      Page 15-16 "Indeed, it has been well established that insulin signaling is required for neonatal survival; for example, a similar neonatal lethality phenotype was observed in mice without insulin receptors (Insr-/-) where death results from hyperglycemia and diabetic ketoacidosis by P3 (40)." Formally, the authors are not examining insulin signaling. A better comparison is that of the Ins1/Ins2 double knockout model of complete hypoinsulinemia.

      We thank the reviewer for suggesting this as the appropriate comparison model and have now revised the manuscript to detail the 48-hour average life expectancy of Ins1/Ins2 double knockout mice (PMID: 9144203).

      There are probably too many abbreviations in the paper, making it harder to read by nonspecialists. I recommend writing out GOF, GSIS, WT, K2P, etc.

      We thank the reviewer for this suggestion and have revised the manuscript to reduce the use of most abbreviations.

      Reviewer #2:

      We would like to thank the Reviewer for their time in reviewing our manuscript. We appreciate the helpful feedback and assistance in ensuring the highest quality publication possible. We have thoroughly addressed all the reviewer’s comments and revised the manuscript accordingly. These changes have strengthened the manuscript and are summarized below.

      (1) The authors perform an RNA-sequencing showing that the cAMP amplifying pathway is upregulated. Is this also true in humans with this mutation? Other follow-up comments and questions from this observation:

      a) Will this mean that the treatment with incretins will improve glucose-stimulated insulin secretion and Ca2+ signalling and lower blood glucose? The authors should at least present data on glucose-stimulated insulin secretion and/or Ca2+ signalling in the presence of a compound increasing intracellular cAMP.

      b) Will an OGTT give different results than the IPGTT performed due to the fact that the cAMP pathway is upregulated?

      c) Is the increased glucagon area and glucagon secretion a compensatory mechanism that increases cAMP? What happens if glucagon receptors are blocked?

      We thank the reviewer for the suggestions. Although cAMP pathways were upregulated in the TALK-1 L114P islets, the changes in expression were only modest as examined by qRT-PCR. Thus, we are not sure whether this plays a role in secretion. Only a small number of patients with this mutation have been identified, and no islets have been isolated from them. Therefore, we do not know whether the cAMP amplifying pathway is upregulated in humans with the MODY-associated TALK-1 L114P mutation. We have performed the suggested experiment assessing calcium from TALK-1 L114P islets in response to liraglutide (see Supplemental figure 10); there was no liraglutide response in TALK-1 L114P islets. We have also performed the OGTT experiments as suggested, and these have now been added to the manuscript (see Supplemental figure 3). We do not believe that the increased glucagon is a compensatory response, because: 1. TALK-1 deficient islets have less glucagon secretion due to reduced SST secretion (see PMID: 29402588); 2. There is no change in insulin secretion at 7 mM glucose; however, glucagon secretion is significantly elevated from islets isolated from TALK-1 L114P mice; 3. TALK-1 is highly expressed in delta-cells, and in these cells TALK-1 L114P would be predicted to cause significant hyperpolarization and significant reductions in calcium entry as well as SST secretion. Thus, reduced SST secretion may be responsible for the elevation of glucagon secretion. We plan to investigate delta-cells within islets from TALK-1 L114P mice in future studies to determine whether changes in SST secretion are responsible for the elevated glucagon secretion from TALK-1 L114P islets.

      (2) The performance of measurements in both male and female mice is praiseworthy. However, despite differences in the response, the authors do not investigate the potential reason for this. Are hormonal differences of importance?

      We thank the reviewer for this important point. It is indeed becoming clear that there are many differences between male and female islet function and responses to stress. Thus, we have revised the manuscript to include greater discussion about these differences, such as female β-cell resilience to stress, which may allow greater insulin secretion in the presence of the TALK-1 L114P channels; this is based on the Brownrigg et al. study pointed out by reviewer 1 (PMID: 36690328). While the differences in islet function and GTT between male and female L114P mice are clear, both sexes show diminished islet calcium handling, defective hormone secretion, and development of glucose intolerance. This manuscript was intended to demonstrate how the MODY-causing TALK-1 L114P mutation causes glucose dyshomeostasis, which we have determined in both male and female mice. The differences between male and female mice and islets with TALK-1 L114P could have multiple potential causes (as detailed in PMID: 36690328). We therefore believe that comprehensive studies are required to determine how the TALK-1 L114P mutation differently impacts male and female mice and islets, and we plan to complete these in a future manuscript.

      (3) MINOR: Page 5 .." channels would be active at resting Vm PMID:11263999.." The actual reference has not been added using the reference system.

      We thank the reviewer for noticing this mistake, which has now been corrected.

      Reviewer #3:

      The manuscript is overall clearly presented and the experimental data largely support the conclusions. However, there are a number of issues that need to be addressed to improve the clarity of the paper.

      We would like to thank the Reviewer for their time in reviewing our manuscript. We appreciate the helpful feedback and assistance in ensuring the highest quality publication possible. We have thoroughly addressed all the reviewer’s comments and revised the manuscript accordingly. These changes have strengthened and improved the clarity of the manuscript.

      Specific comments:

      (1) Title: The terms "transient neonatal diabetes" and "glucose dyshomeostasis in adults" are used to describe the TALK-1 L114P mutant mice. Transient neonatal diabetes gives the impression that diabetes is resolved during the neonatal period. The authors should clarify the criteria used for transient neonatal diabetes, and the difference between glucose dyshomeostasis and MODY. Longitudinal plasma glucose and insulin data would be very informative and help readers to follow the authors' narrative.

      We appreciate the helpful comment and have added longitudinal plasma glucose from neonatal mice to address this (see Supplemental figure 2). The new data show that the TALK-1 L114P mutant mice undergo transient hyperglycemia that resolves by P10 and then recurs at week 15. Insulin secretion from P4 islets is also included, showing that male animals homozygous for the TALK-1 L114P mutation have the largest impairment in glucose-stimulated insulin secretion, followed by male heterozygous TALK-1 L114P P4 islets, which also have impaired insulin secretion (see Figure 1). The amount of hyperglycemia correlates with the defects in neonatal islet insulin secretion.

      (2) Another concern for the title is the term "α-cell overactivity." This could be taken to mean that individual α-cells are more active and/or that there are more α-cells to secrete glucagon. The study does not provide direct evidence that individual α-cells are more active. This should be clarified.

      We appreciate the helpful comment and have revised the manuscript title accordingly.

      (3) In the Introduction, it is stated that because TALK-1 activity is voltage-dependent, the GOF mutation is less likely to cause neonatal diabetes, yet the study shows the L114P TALK-1 mutation actually causes neonatal diabetes by completely abolishing glucose-stimulated Ca2+ entry. This seems to imply TALK-1 activity (either in the plasma membrane or ER membrane) has more impact on Vm or cytosolic Ca2+ in neonates than initially predicted. Some discussion on this point is warranted.

      These are important points and we have added details to the discussion about this. For example, the discussion now states that, “This suggests a greater impact of TALK-1 L114P in neonatal islets compared to adult islets. Future studies during β-cell maturation are required to determine if TALK-1 activity is greater on the plasma membrane and/or ER membrane compared with adult β-cells.” The introduction has also been revised to clarify the voltage dependence of TALK-1.

      (4) What is the relative contribution of defects in plasma membrane depolarization versus ER Ca2+ handling on defective insulin secretion response?

      We thank the reviewer for bringing up this important point. TALK-1 L114P islets show blunted glucose-stimulated depolarization and glucose-stimulated calcium entry; however, the L114P islets show Ca2+ entry equivalent to that of control islets in response to high KCl (Figure 5G and H). Because the KCl-stimulated Ca2+ influx is similar between control and TALK-1 L114P islets, this indicates that plasma membrane TALK-1 L114P has a hyperpolarizing role that significantly blunts glucose-stimulated depolarization and reduces activation of voltage-dependent calcium channels. We have further tested this by looking at glucose-stimulated β-cell membrane potential depolarization in TALK-1 L114P islets, which is significantly blunted (Figure 4A and B; Supplemental figure 6). However, 33% of TALK-1 L114P β-cells showed glucose-stimulated electrical excitability (Supplemental figure 6), which likely accounts for the modest GSIS from TALK-1 L114P islets. New data have also been included showing that KCl stimulation causes a significant depolarization of β-cells from TALK-1 L114P islets (Supplemental figure 6). Because plasma membrane TALK-1 L114P is largely responsible for the hyperpolarized membrane potential and blunted glucose-stimulated Ca2+ entry, this suggests that TALK-1 L114P on the plasma membrane is primarily responsible for the altered insulin secretion. The discussion has been revised to reflect this.

      (5) The Jacobson group has previously shown that another K2P channel TASK-1 is also involved in ER Ca2+ homeostasis and that TASK inhibitors restored ER Ca2+ in TASK-1 expressing cells. Is TASK-1 expressed in β-cell ER membrane? Can the mishandling of Ca2+ caused by TALK-1 L114P be reversed by TASK-1 inhibitors?

      We thank the reviewer for bringing up this important point in relation to ER calcium handling by K2P channels. We have found that TASK-1 channels expressed in alpha-cells enhance ER calcium release and that inhibitors of TASK-1 channels elevate alpha-cell ER calcium storage. We did not observe any significant changes in expression of the gene encoding TASK-1 (Kcnk3) between islets from control and TALK-1 L114P mice, which has now been added to the manuscript. However, the TALK-1 L114P-mediated reduction of glucose-stimulated depolarization and inhibition of calcium entry are both prevented in the presence of high KCl (see Figure X); this strongly suggests that TALK-1 L114P K+ flux at the plasma membrane is hyperpolarizing the membrane potential and limiting depolarization and calcium entry. This suggests that TALK-1 L114P control of ER calcium handling is not the primary contributor to the blunted glucose-stimulated calcium handling. Furthermore, acetylcholine stimulation of islets from both control and TALK-1 L114P mice elicited ER calcium release, which indicates that ER calcium release is, for the most part, still responsive to the cues that control it, although the responses are altered. Taken together, this suggests that the TALK-1 L114P impact on ER calcium is not the primary mediator of blunted glucose-stimulated islet calcium entry and insulin secretion.

      (6) The electrical recording experiments were conducted using whole islets. The authors should comment on how the cells were identified as β-cells, especially in mutant islets in which there is an increased number of α-cells.

      The reviewer brings up an important point. As indicated, the original membrane potential recordings were conducted using whole islets. While the recorded cells could mostly be β-cells based on mouse islets typically containing >80% β-cells, there is a possibility that some of the cells included in these recordings were α-cells or δ-cells (especially because of the noted α-cell hyperplasia in TALK-1 L114P islets). Thus, we have now included data from β-cells that were identified with an adenoviral construct containing a rat insulin promoter driving a fluorescent reporter. This allowed the fluorescent β-cells to be monitored with electrophysiological membrane potential recordings. The new data (see Supplemental figure 6) show a significant reduction in glucose-stimulated depolarization in 67% of β-cells with the L114P mutation compared to controls.

      Minor:

      (1) Some references need formatting.

      The references have been revised accordingly.

      (2) Please define glucose-stimulated phase 0 Ca2+ response for non-expert readers.

      This has been defined accordingly.

      (3) Page 14 bottom: The sentence "Unlike the only other MODY-associated.........., TALK-1 is not inhibited by sulfonylureas" seems out of place and lacks context.

      We thank the reviewer for this suggestion and have deleted this sentence.

      (4) Figure 6: It would be helpful to provide a protein name for the genes shown in panel D.

      The protein names for the genes have now been included in the discussion of these genes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We appreciate the thoughtful review of our manuscript by the reviewers, along with their valuable suggestions for enhancing our work. In response to these suggestions, we conducted additional experiments and made significant revisions to both the text and figures. In the following sections, we first highlight the major changes made to the manuscript, and thereafter address each reviewer's comments point-by-point. We hope these additional data and revisions have improved the robustness and clarity of the study and manuscript. Please note that as part of a suggested revision we have changed the manuscript title to be: Bacterial vampirism mediated through taxis to serum.

      Major revisions and new data:

      (1) We conducted additional experiments testing taxis to serum using a swine ex vivo enterohemorrhagic lesion model in which we competed wildtype versus chemotaxis-deficient strains (Fig. 8). We selected swine for these experiments due to their similarity in gastrointestinal physiology to humans. In these experiments we see that chemotaxis, and the chemoreceptor Tsr, mediate localization to, and migration into, the lesion. We also tested, and confirmed, taxis to serum from swine and serum from horse, supporting that serum attraction is relevant in other host-pathogen systems.

      (2) We present additional experimental data and quantification of chemotaxis responses to human serum treated with serine-racemase (Fig. S3). This treatment reduces wildtype chemoattraction and the wildtype no longer possesses an advantage over the tsr strain, providing further evidence that L-serine is the specific chemoattractant responsible for Tsr-mediated attraction to serum.

      (3) We present additional data in the form of 17 videos of chemotaxis experiments with norepinephrine and DHMA showing null-responses under various conditions. These data provide additional support to the conclusion that these chemicals are not responsible for bacterial attraction to serum. We have included these raw data as a new supplementary file (Data S1) for those in the field that are interested in these chemicals.

      (4) Based on comments from Reviewer 2 regarding whether the position of the ligand and ligand-binding site residues in the previously-reported EcTsr LBD structure are incorrect, or whether these differences are due to the proteins being from different organisms, we performed paired crystallographic refinements to determine which positions result in model improvement (Fig. 7J). Altering the EcTsr structure to have the ligand and ligand-binding site positions from our new higher resolution and better-resolved structure of Salmonella Typhimurium Tsr results in a demonstrably better model, with both Rwork and Rfree lower by about 1% (Fig. 7J). These data support our conclusion that the correct positions for both structures are as we have modeled them in the S. Typhimurium Tsr structure. We also solved an additional crystal structure of SeTsr LBD captured at neutral pH (7-7.5) that confirms our structure captured with elevated pH (7.5-9.7) has no major changes in structure or ligand-binding interactions (Fig. S6, Table S2).

      (5) Based on comments from Reviewer 2 on the accuracy of the diffusion calculations, we present a new analysis (Fig. S2) comparing the experimentally-determined diffusion of A488 compared to its calculated diffusion. We found that:

      [line 111]: “As a test case of the accuracy of the microgradient modeling, we compared our calculated values for A488 diffusion to the normalized fluorescence intensity at time 120 s. We determined the concentration to be accurate within 5% over the distance range 70-270 µm (Fig. S2). At smaller distances (<70 µm) the measured concentration is approximately 10% lower than that predicted by the computation. This could be due to advection effects near the injection site that would tend to enhance the effective local diffusion rate.”

      (6) Both reviewers asked us to better justify why we focused on the chemoreceptor Tsr, and had questions about why we did not investigate Tar. The low concentration of Asp in serum suggests Tar could have some effect, but less so than Trg or Tsr (see Fig. 4A). We have revised the text throughout to better convey that we agree multiple chemoreceptors are involved in the response and clarify our rationale for studying the role of Tsr:

      [line 178]: “We modeled the local concentration profile of these effectors based on their typical concentrations in human serum (Fig. 4B). Of these, by far the two most prevalent chemoattractants in serum are glucose (5 mM) and L-serine (100-300 µM) (Fig. 4B-F). This suggested to us that the chemoreceptors Trg and/or Tsr could play important roles in serum attraction.”

      [line 186]: “Since tsr mutation diminishes serum attraction but does not eliminate it, we conclude that multiple chemoattractant signals and chemoreceptors mediate taxis to serum. To further understand the mechanism of this behavior we chose to focus on Tsr as a representative chemoreceptor involved in the response, presuming that serum taxis involves one, or more, of the chemoattractants recognized by Tsr that is present in serum: L-serine, NE, or DHMA.”

      [line 468] “Serum taxis occurs through the cooperative action of multiple bacterial chemoreceptors that perceive several chemoattractant stimuli within serum, one of these being the chemoreceptor Tsr through recognition of L-serine (Fig. 4).”
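
      The comparison of calculated versus measured A488 concentration described in revision (5) can be illustrated with a short sketch. This is not the authors' microgradient model: it assumes the textbook solution for a continuous point source diffusing into an unbounded 3D medium, and the diffusion coefficient, source rate, and function names below are placeholders chosen for illustration only.

```python
import math


def point_source_concentration(r_um, t_s, rate, diff_um2_s):
    """Concentration at distance r_um (µm) and time t_s (s) from a continuous
    point source in an unbounded 3D medium:
        C(r, t) = rate / (4 * pi * D * r) * erfc(r / (2 * sqrt(D * t)))
    `rate` is the source strength (amount per second); its units set the
    units of the result (amount per µm^3)."""
    if r_um <= 0 or t_s <= 0 or diff_um2_s <= 0:
        raise ValueError("r, t and D must all be positive")
    steady_state = rate / (4.0 * math.pi * diff_um2_s * r_um)
    return steady_state * math.erfc(r_um / (2.0 * math.sqrt(diff_um2_s * t_s)))


def relative_deviation(measured, predicted):
    """Signed fractional deviation of a measurement from the model value."""
    return (measured - predicted) / predicted


# Assumed values for illustration only (not taken from the manuscript).
D_ASSUMED = 400.0   # µm^2/s, placeholder diffusion coefficient for a small dye
RATE_ASSUMED = 1.0  # arbitrary source strength

for r in (50, 70, 150, 270):
    c = point_source_concentration(r, 120.0, RATE_ASSUMED, D_ASSUMED)
    print(f"r = {r:3d} µm  predicted concentration = {c:.3e} (arbitrary units)")
```

      Normalizing both the predicted profile and the fluorescence profile to a common reference distance and then applying `relative_deviation` at each radius gives the kind of percent agreement quoted in the revised text.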

      Point-by-point responses to reviewer comments:

      Reviewer #1:

      (1) Presumably in the stomach, any escaping serum will be removed/diluted/washed away quite promptly? This effect is not captured by the CIRA assay but perhaps it might be worth commenting on how this might influence the response in vivo. Perhaps this could explain why, even though the chemotaxis appears rapid and robust, cases of sepsis are thankfully relatively rare.

      To clarify, the Enterobacteriaceae species we have tested here are colonizers of the intestines, not the stomach, and cases of bacteremia from these species are presumably due to bloodstream entry through intestinal lesions. Whether or not intestinal flow acts as a barrier to bloodstream entry is not something we test here, and so we have not commented on this idea in the manuscript. We do demonstrate that attraction to serum occurs within seconds-to-minutes of exposure. We expect that the major protective effects against sepsis are the host antibacterial factors in serum, which are well-described in other work. We have been careful to state throughout the text that we see attraction responses, and growth benefits, to serum that is diluted in aqueous media, which is different from bacterial growth in 100% serum or in the bloodstream.

      (2) The authors refer to human serum as a chemoattractant numerous times throughout the study (including in the title). As the authors acknowledge, human serum is a complex mixture and different components of it may act as chemoattractants, chemo-repellents (particularly those with bactericidal activities) or may elicit other changes in motility (e.g. chemokinesis). The authors present convincing evidence that cells are attracted to serine within human serum - which is already a well-known bacterial chemoattractant. Indeed, their ability to elucidate specific elements of serum that influence bacterial motility is a real strength of the study. However, human serum itself is not a chemoattractant and this claim should be re-phrased - bacteria migrate towards human serum, driven at least in part by chemotaxis towards serine.

      Throughout the text we have changed these statements, including in the title, to either be ‘taxis to serum’ or ‘serum attraction.’ On the timescales we tested, our data support that chemotaxis, not chemokinesis or other forms of directed motility, is what drives rapid serum attraction, since a motile but non-chemotactic cheY mutant cannot localize to serum (Fig. 4). We present evidence of one of these chemotactic interactions (L-Ser).

      (3) Linked to the previous point, several bacterial species (including E. coli - one of the bacterial species investigated here) are capable of osmotaxis (moving up or down gradients in osmolality). Whilst chemotaxis to serine is important here, could movement up the osmotic gradient generated by serum injection play a more general role? It could be interesting to measure the osmolality of the injected serum and test whether other solutions with similar osmolality elicit a similar migratory response. Another important control here would be to treat human serum with serine racemase and observe how this impacts bacterial migration.

      As addressed above, we have added additional experiments of serum taxis treated with serine racemase showing competition between WT and cheY, and WT and tsr (Fig. S3). These data support a role for L-serine as a chemoattractant driving attraction to serum. The idea of osmotaxis is interesting, but outside the scope of this work since we focus on chemoattraction to L-serine as one of the mechanisms driving serum attraction, and have multiple lines of evidence to support that.

      (4) The migratory response of E. coli looks striking when quantified (Fig. 6C) but is really unclear from looking at Panel B - it would be more convincing if an explanation was offered for why these images look so much less striking than analogous images for other species (E.g. Fig. 6A).

      We agree that the E. coli taxis to serum response is less obvious. We have brightened those panels to make them easier to interpret (more cells in the field of view over time). Also, as stated in the y-axes of these plots, this quantification was performed by enumerating the number of cells in the field of view, and the Citrobacter and Escherichia responses are shown on separate y-axes (now Fig. 8C). As indicated, the experiments have different numbers of starting motile cells, which we presume accounts for the difference in attraction magnitude. When investigating diverse bacterial systems we found there to be differences in motility under the culturing and experimental conditions we employed, for multiple reasons, and so for these data we thought it best to report raw cell numbers rather than data normalized to the starting number of bacteria, as we do elsewhere. In the specific case of these E. coli responding to serum, please view Supplementary Movie S3, which clearly shows both the attraction response and that the bacteria grew in a longer, semi-filamentous form that seems to impair their swimming speed.
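
      To make the distinction between raw cell counts and counts normalized to the starting number concrete, here is a minimal sketch. The counts and the function name are hypothetical and are not data from the study; the sketch only shows how two populations with the same relative response can differ greatly in raw magnitude.

```python
def normalize_to_start(counts):
    """Express each count as a fold-change relative to the first time point."""
    start = counts[0]
    if start <= 0:
        raise ValueError("starting count must be positive")
    return [c / start for c in counts]


# Hypothetical cells-in-field-of-view counts over time (illustration only).
high_motility_start = [200, 400, 800]
low_motility_start = [20, 40, 80]

print(normalize_to_start(high_motility_start))
print(normalize_to_start(low_motility_start))
```

      In this made-up example both series double at each time point, so their normalized curves are identical even though the raw counts differ tenfold. Reporting raw numbers keeps the difference in starting motile cells visible.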

      (5) It is unclear why the fold-change in bacterial distribution shows an approximately Gaussian shape with a peak at a radial distance of between 50 -100 um from the source (see for example Fig. 2H). Initially, I thought that maybe this was due to the presence of the microcapillary needle at the source, but the CheY distribution looks completely flat (Fig. 3I). Is this an artifact of how the fold-change is being calculated? Certainly, it doesn't seem to support the authors' claim that cells increase in density to a point of saturation at the source. Furthermore, it also seems inappropriate to apply a linear fit to these non-linear distributions (as is done in Fig. 2H and in the many analogous figures throughout the manuscript).

      We have revised the text to address this point, and removed the comment about cells increasing in density to a point of saturation: [Line 138] “We noted that in some experiments the population peak is 50-75 µm from the source, possibly due to a compromise between achieving proximity to nutrients in the serum and avoidance of bactericidal serum elements, but this behavior was not consistent across all experiments. Overall, our data show S. enterica serovars that cause disease in humans are exquisitely sensitive to human serum, responding to femtoliter quantities as an attractant, and that distinct reorganization at the population level occurs within minutes of exposure (Fig. 3, Movie 2).”

      We can confirm that this is not an artifact of quantification. Please refer to the videos of these responses, which demonstrate this point (Movies 1-5).

      (6) The authors present several experiments where strains/ serovars competed against each other in these chemotaxis assays. As mentioned, these are a real strength of the study - however, their utility is not always clear. These experiments are useful for studying the effects of competition between bacteria with different abilities to climb gradients.

      However, to meaningfully interpret these effects, it is first necessary to understand how the different bacteria climb gradients in monoculture. As such, it would be instructive to provide monoculture data alongside these co-culture competition experiments.

      Thank you for this suggestion. We agree that the coculture experiments showing strains competing for the same source of effector give a different perspective than monoculture. These experiments allow us to confirm taxis deficiencies or advantages with greater sensitivity, and ensure that the bacteria in competition have experienced the same gradient. This type of competition experiment is often used in in vivo experimentation for the same advantages. We note that in the gut the bacteria are not in monoculture and chemotactic bacteria do have to compete against each other for access to nutrients. Repeating all of the experiments we present to show both the taxis responses in coculture and monoculture would be an extraordinary amount of work that we do not believe would meaningfully change the conclusions of this study.

      (7) Linked to the above point, it would be especially instructive to test a tsr mutant's response in monoculture. Comparing the bottom row of Fig. 3G to Fig. 3I suggests that when in co-culture with a cheY mutant, the tsr mutant shows a higher fold-change in radial distribution than the WT strain. Fig. 4G shows that a tsr mutant can chemotax towards aspartate at a similar, but reduced, rate to WT. This could imply that (like the trg mutant), a tsr mutant has a more general motility defect (e.g. a speed defect), which could explain why it loses out when in competition with the WT in gradients of human serum, but actually seems to migrate strongly to human serum when in co-culture with a cheY mutant. This should be resolved by studying the response of a tsr mutant in monoculture.

      Addressed above.

      (8) In Fig. 4, the response of the three clinical serovars to serine gradients appears stronger than the lab serovar, whilst in Fig. 1, the response to human serum gradients shows the opposite trend with the lab serovar apparently showing the strongest response. Can the authors offer a possible explanation for these slightly confusing trends?

      We suspect this relates to the fact that pure L-serine is a chemoattractant, whereas treatment with serum exposes the bacteria both to chemoattractants and, likely, chemorepellents. Strains may navigate the landscape of these stimuli differently for a variety of reasons that are not simple to tease apart. The final magnitude of change in bacterial localization depends on multiple factors including swimming speed, adaptation, sensitivity of chemoattraction, and cooperative signaling of the chemoreceptor nanoarray. Thus, we cannot state with certainty how and why these strains are different across all experiments, but we can state that they are attracted to both serum and L-serine.

      (9) In Fig. S2, it seems important to present quantification of the effect of serine racemase and the reported lack of response to NE and DHMA - the single time-point images shown here are not easy to interpret.

      As suggested, we present quantification of the serine racemase-treated samples (now Fig. S3). To assist in the interpretation of these max projections, Fig. S3 now notes the chemotactic response (chemoattraction for L-serine, null response for NE/DHMA). Further, we revised the text to state: [line 209] “We observed robust chemoattraction responses to L-serine, evident by the accumulation of cells toward the treatment source (Fig. S3E, Movie 4), but no response to NE or DHMA, with the cells remaining randomly distributed even after 5 minutes of exposure (Fig. S3F-I, Movie 5, Movie S1).”

      (10) Importantly, the authors detail how they controlled for the effects of pH and fluid flow (Line 133-136). Did the authors carry out similar controls for the dual-species experiments where fluorescent imaging could have significantly heated the fluid droplet driving stronger flow forces?

      Most of our microfluidics experiments were performed in a temperature-controlled chamber (see Methods). Since the strains in the coculture experiments experienced the same experimental conditions, we have no evidence of fluorescence-imaging-induced temperature changes that have impacted whether or not the bacteria are attracted to serum or the effectors we investigated.

      (11) The inference of the authors' genetic analysis combined with the migratory response of E. coli and C. koseri to human serum shown in Fig. 6 is that Tsr drives movement towards human serum across a range of Enterobacteriaceae species. The evidence for the importance of Tsr here is currently correlative - more causal evidence could be presented by either studying the response of tsr mutants in these two species (certainly these should be readily available for E. coli) or by studying the response of these two species to serine gradients.

      We have revised the text to state: [line 402] “Without further genetic analyses in these strain backgrounds, the evidence for Tsr mediating serum taxis for these bacteria remains circumstantial. Nevertheless, taxis to serum appears to be a behavior shared by diverse Enterobacteriaceae species and perhaps also Gammaproteobacteria priority pathogen genera that possess Tsr such as Serratia, Providencia, Morganella, and Proteus (Fig. 8B).”

      We note that other work has thoroughly investigated E. coli serine taxis.

      Figure Suggestions

      (1) Fig. 2 - The inset bar charts in panels H-J and the font size in their axes labels are too small - this suggestion also applies to all analogous figures throughout the manuscript.

      We have increased the size of the text for these inset plots. We have also broken up some of the larger figures.

      (2) Panel 2F - the cartoon bacterial cell and 'number of bacteria' are confusing and seem to contradict the y-axis label. This also applies to several other figures throughout the manuscript where the significance of this cartoon cell is quite hard to interpret.

      As suggested, we have removed this cartoon.

      (3) Panels G-I in Fig. 3 are currently tricky to interpret - it would be easier if the authors were to use three different colours for the three different strains shown across these panels.

      We have broken up Figure 2 (which also had these types of plots) so that hopefully these labels are more clear. For the Figure in question (now Fig. 4), due to the many figures and different types of data and comparisons it was difficult to find a color scheme for these strains that would be consistent across the manuscript. These colors also reflect the fluorescence markers. We note that not only do we use color to indicate the strain but also text labels.

      (4) Panels 3B-F would be best moved to a supplementary figure as this figure is currently very busy. Similarly, I would potentially consider presenting only the bottom row of panels in Panels G-I in the main figure (which would then be consistent with analogous data presented elsewhere).

      We have opted to keep these panels in the main text (now Fig. 4) as they are relevant to understanding (1) our justification for why to pursue certain chemoeffector-chemoreceptor interactions and not others, and (2) how the chemoattraction response can be understood both in terms of bacterial population distribution and relevant cells over time.

      (5) Fig. 4 and possibly elsewhere - perhaps best not to use Ser as an abbreviation for Serine here because it could potentially be confused with an abbreviation for serum.

      It is unfortunate that these two words are so similar. However, Ser is the canonical abbreviation for the amino acid serine. Serum does not have a canonical abbreviation.

      (6) Fig. 4 - I would move panels H - K to a separate supplementary figure - currently, they are too squished together and it is hard to make out the x-axis labels. I would also consider moving panels E-G to supplementary as well so that the microscopy images presented elsewhere in the figure can be presented at an appropriate size.

      Since we are allowed more figures, we could also break some of these figures up into multiple ones.

      (7) Similarly, I would move some panels from Fig. 5 to supplementary as the figure is currently quite busy.

      We have rearranged the figure (now Fig. 7) to move the bioinformatics data to Fig. 8 to allow more space for the panels.

      Other suggestions

      (8) Line 179 - how do the concentrations quoted for serine and glucose compare to aspartate? This would be helpful to justify the authors' decision not to investigate Tar as a potential chemoreceptor.

      This is addressed in our comments above and in Fig. 4A and Fig. 4B-F. L-Asp is present in human serum at a much lower concentration (about 20-fold lower).

      (9) Line 282 - Serine levels in serum are quantified at 241 uM, but this is only discussed in the context of serum growth effects. Could this information be better used to design/ inform the serine gradients that were tested in chemotaxis assays?

      We tested a wide range of serine concentrations and show that serine sources at much lower concentrations than that present in serum are sufficient for chemoattraction. Also, the K1/2 for serine is 105 uM (Fig. S4), which is surpassed by the concentration in serum (Fig. S5).
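
      The comparison between the serum serine concentration and the K1/2 can be made concrete with a short calculation. The sketch below assumes a simple hyperbolic (non-cooperative) dose-response, which is an illustrative assumption and not necessarily the fit used for Fig. S4; the function name `fractional_response` is hypothetical.

```python
def fractional_response(conc_um: float, k_half_um: float) -> float:
    """Fraction of the maximal response at a given ligand concentration,
    assuming a hyperbolic dose-response (Hill coefficient of 1)."""
    return conc_um / (conc_um + k_half_um)


SERUM_SERINE_UM = 241.0  # serum serine concentration quoted in the reviewer comment
K_HALF_UM = 105.0        # K1/2 for serine quoted in the response (Fig. S4)

fraction = fractional_response(SERUM_SERINE_UM, K_HALF_UM)
print(f"Estimated fraction of maximal response: {fraction:.2f}")
```

      Under this assumption, the serum serine concentration corresponds to roughly 70% of the maximal response, in line with the statement that it exceeds the K1/2.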

      (10) The word 'potent' in the title might be too vague, especially as the strength of the response varies between strains/species. It may perhaps be more useful to focus on the rapidity/sensitivity of the response. However, presumably the sensitivity of the response will be driven by the sensitivity of the response to serine (which is already known for E. coli at least). Also, as noted in the public review, human serum itself is not a chemoattractant so I would consider re-phrasing this in the title and elsewhere.

      As suggested, and discussed above, we have implemented this change.

      (11) Typo line 59 'context of colonizing of a healthy gut'.

      Addressed.

      (12) Typo line 538 - there is an extra full stop here.

      Addressed.

      Reviewer #2:

      (1) This study is well executed and the experiments are clearly presented. These novel chemotaxis assays provide advantages in terms of temporal resolution and the ability to detect responses from small concentrations. That said, it is perhaps not surprising these bacteria respond to serum as it is known to contain high levels of known chemoattractants, serine certainly, but also aspartate. In fact, the bacteria are shown to respond to aspartate and the tsr mutant is still chemotactic. The authors do not adequately support their decision to focus exclusively on the Tsr receptor. Tsr is one of the chemoreceptors responsible for observed attraction to serum, but perhaps, not the receptor. Furthermore, the verification of chemotaxis to serum is a useful finding, but the work does not establish the physiological relevance of the behavior or associate it with any type of disease progression. I would expect that a majority of chemotactic bacteria would be attracted to it under some conditions. Hence the impact of this finding on the chemotaxis or medical fields is uncertain.

      We agree that the data we show are mostly mechanistic and further work is required to learn whether this bacterial behavior is relevant in vivo and during infections. We present new data using an ex vivo intestinal model which supports the feasibility of serum taxis mediating invasion of enterohemorrhagic lesions (Fig. 8).

      (2) The authors also state that "Our inability to substantiate a structure-function relationship for NE/DHMA signaling indicates these neurotransmitters are not ligands of Tsr." Both norepinephrine (NE) and DHMA have been shown previously by other groups to be strong chemoattractants for E. coli (Ec), and this behavior was mediated by Tsr (e.g. single residue changes in the Tsr binding pocket block the response). Given the 82% sequence identity between the Se and Ec Tsr, this finding is unexpected (and potentially quite interesting). To validate this contradictory result the authors should test E. coli chemotaxis to DHMA in their assay. It may be possible that Ec responds to NE and DHMA and Se doesn't. However, currently, the data is not strong enough to rule out Tsr as a receptor to these ligands in all cases. At the very least the supporting data for Tsr being a receptor for NE/DHMA needs to be discussed.

      Addressed above. The focus of this study is serum attraction and the mechanisms thereof. We never saw any evidence to support the idea that NE/DHMA drive attraction to serum, or that they are chemoeffectors for Salmonella, and we provide these null results in Data S2.

      (3) The authors also determine a crystal structure of the Se Tsr periplasmic ligand binding domain bound to L-Ser and note that the orientation of the ligand is different than that modeled in a previously determined structure of lower resolution. I agree that the SeTsr ligand binding mode in the new structure is well-defined and unambiguous, but I think it is too strong to imply that the pose of the ligand in the previous structure is wrong. The two conformations are in fact quite similar to one another and the resolution of the older structure is, in my view, insufficient to distinguish them. It is possible that there are real differences between the two structures. The domains do have different sequences and, moreover, the crystal forms and cryo-cooling conditions are different in each case. It's become increasingly apparent that temperature, as manifested in differential cooling conditions here, can affect ligand binding modes. It's also notable that full-length MCPs show negative cooperativity in binding ligands, which is typically lost in the isolated periplasmic domains. Hence ligand binding is sensitive to the environment of a given domain. In short, the current data is not convincing enough to say that a previous "misconception" is being corrected.

      Thank you for this comment, which spurred us to investigate this idea more rigorously. As described above we performed new refinements of the E. coli structure edited to have the positions of the ligand and ligand-binding site as modeled in our new Tsr structure from Salmonella (Fig. 7J). The best model is obtained with these poses. Along with the poor fit of the E. coli model to the density, the best interpretations for these positions, for both structures, are as we have modeled them in the Salmonella Tsr structures.

      Figure suggestions

      (1) Figure 2 looks busy and unorganized. Fig 2C could be condensed into one image where there are different colored rings coming from the source point that represent different time points.

      Addressed above. Fig. 2 has been broken apart to help improve clarity.

      (2) What is the second (bottom) graph of 2D? I think only the top graph is necessary.

      We have added an explanation to the figure legend that the top graph shows the means and the bottom shows SEM. The plots cannot easily be overlaid.

      (3) Similarly, Fig 2E doesn't need to have so many time points. Perhaps 4 at maximum.

      As the development of the response over time is a key take-home of the study, we do not wish to reduce the timepoints shown.

      (4) The legend for Figure 2F uses the unit 'µM' to mean micrometers but should use 'µm'.

      Corrected.

      (5) In Figures 2H-J, the lime green text is difficult to read. The word "serum" does not need to be at the top of each panel. I recommend shortening the y-axis titles on the graphs so you can make the graphs themselves larger.

      Addressed above.

      (6) In Figures 2H-J, I am confused about what is being shown in the inset graph. The legend says it's the AUC for the data shown. However, in the third panel (S. Typhimurium vs. S. Enteritidis) the data appears to be much more disparate than the inset indicates. I don't think that this inset is necessary either.

      The point of this inset graph is to quantify the response through integration of the curve, i.e., area under the curve, which is a common way to quantify complex curves and compare responses as single values. We are using this method to calculate the statistical significance of the response compared to a null response. We have added further clarification to the figure legend regarding these plots: Inset plots show fold-change AUC of strains in the same experiment relative to an expected baseline of 1 (no change). p-values shown are calculated with an unpaired two-sided t-test comparing the means of the two strains, or a one-sided t-test to assess statistical significance in terms of change from 1-fold (stars).
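
      The AUC-based quantification described above can be sketched as follows. This is a minimal illustration using the trapezoid rule; the time points and fold-change values are invented for the example, the helper names are hypothetical, and dividing by the AUC of a flat baseline of 1 is only one possible reading of "relative to an expected baseline of 1".

```python
def trapezoid_auc(times, values):
    """Area under a sampled curve, computed with the trapezoid rule."""
    return sum(
        (t1 - t0) * (v0 + v1) / 2.0
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )


def fold_change_auc(times, fold_changes):
    """AUC of a fold-change curve divided by the AUC of a flat baseline of 1,
    so that a result of 1 means no net change (assumed normalization)."""
    baseline = trapezoid_auc(times, [1.0] * len(times))
    return trapezoid_auc(times, fold_changes) / baseline


# Made-up example data: time in seconds, fold-change in local cell density
times = [0, 60, 120, 180, 240, 300]
attracted = [1.0, 1.5, 2.0, 2.5, 3.0, 3.0]
no_response = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

auc_attracted = fold_change_auc(times, attracted)
auc_null = fold_change_auc(times, no_response)
print(auc_attracted, auc_null)
```

      Each replicate then contributes a single AUC value, and those values are what a t-test against the null expectation of 1 would compare.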

      (7) Line 154, change "relevant for" to "observed in".

      Changed.

      (8) Line 171, according to the Mist4 database, Salmonella enterica has seven chemoreceptors. Why are only Tar, Tsr, and Trg mentioned? Why were only Tsr and Trg tested?

      Addressed above.

      (9) Line 192, be clear that you are referring to genes and not proteins, as italics are used.

      Revised to make this distinction clear.

      (10) Line 193, have other studies found a Trg deletion strain to be non-chemotactic? If so, cite this source here.

      We state that the Trg deletion strain had deficiencies in motility, and also have revised the text to include the clarification that this was not noted in earlier work with this strain: [line 173]: We were surprised to find that the trg strain had deficiencies in swimming motility (data not shown). This was not noted in earlier work but could explain the severe infection disadvantage of this mutant 34. Because motility is a prerequisite for chemotaxis, we chose not to study the trg mutant further, and instead focused our investigations on Tsr.

      (11) Why wasn't a Tar deletion mutant also analyzed? The authors say that based on the known composition of serum, serine and glucose are the most abundant. However, the serum does have aspartate at 10s of micromolar concentrations.

      Addressed above.

      (12) “The Tsr deletion strain still exhibits an obvious chemoattraction to serum. There are other protein(s) involved in chemoattraction to serum but the text does not discuss this.”

      Addressed above.

      (13) “In Figure 3B-F, the text is very difficult to read even when zoomed in on.”

      We have increased the font size of these panels.

      (14) “All of the text in Figure 5 is extremely small and difficult to read.”

      Addressed above. We split this figure in two to help improve clarity.

      (15) “I wonder about the accuracy of the concentration modeling. It seems like there are a lot of variables that could affect the diffusion rates, including the accuracy of the delivery system. Could the concentrations be verified by the dye experiments?”

      Addressed above. We provide a new analysis comparing the experimental diffusion of A488 dye with calculations (Fig. S2).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) It is nice that the authors compared their model to the one "without lookahead" in Figure 4, but this comparison requires more evidence in my opinion, as I explain in this comment. The model without lookahead is closely related or possibly equivalent to the standard predictive coding. In predictive coding, one can make the network follow the stimulus rapidly by reducing the time constant tau. However, as the time constant decreases, the network would become unstable both in simulations (due to limited integration time step) and physical implementation (due to noise). Therefore I wonder if the proposed model has an advantage over standard predictive coding with an optimized time constant. Hence I suggest to also add a comparison between the proposed model, and the predictive coding with parameters (such as tau) optimized independently for each model. Of course, we know that the time-constant of biological neurons is fixed, but biological neurons might have had different time constants (by changing leak conductance) and such analysis could shed light on the question of why the neurons are organized the way they are.

      The comparison with a predictive network for which the neuronal time constants shrink towards 0 is in fact helpful. We added two new subsections in the SI that formally compare the NLA with other approaches, Equilibrium Propagation and the Latent Equilibrium, with a version of Equilibrium Propagation also covering the standard predictive coding you describe (SI, Sect. C and D). Subsection C concludes: “In the Equilibrium propagation we cannot simply take the limit τ → 0 since then the dynamics either disappears (when τ remains on the left, τ Δu → 0) or explodes (when τ is moved to the right, dt/τ → ∞), leading to either too small or too big jumps.”

      We have also expanded the passage on predictive coding in the main text, comparing our instantaneous network processing (up to a remaining time constant τ_in) with experimental data from humans (see page 10 of the revised ms). The new paragraph ends with:

      “Notice that, from a technical perspective, making the time constants of individual cortical neurons arbitrarily short leads to network instabilities and is unlikely the option chosen by the brain (see SI Sect. C, Comparison to the Equilibrium Propagation).”
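
      The numerical side of this instability can be illustrated with a toy example. The sketch below integrates a single leaky unit, tau * du/dt = -u + drive, with the forward-Euler method; the single-unit model, the parameter values, and the function name are assumptions for illustration and are not the network from the manuscript. Forward Euler multiplies the deviation from the fixed point by (1 - dt/tau) at every step, so the update diverges once dt/tau exceeds 2.

```python
def simulate_leaky_unit(tau, dt, n_steps, drive=1.0, u0=0.0):
    """Forward-Euler integration of tau * du/dt = -u + drive."""
    u = u0
    for _ in range(n_steps):
        u += (dt / tau) * (drive - u)
    return u


DT = 1.0  # fixed integration step (arbitrary units)

# dt/tau = 0.1: the deviation shrinks by a factor 0.9 per step and u settles at the drive
u_stable = simulate_leaky_unit(tau=10.0, dt=DT, n_steps=200)

# dt/tau = 2.5: the deviation is multiplied by -1.5 per step and grows without bound
u_unstable = simulate_leaky_unit(tau=0.4, dt=DT, n_steps=50)

print(f"dt/tau = 0.1 -> u = {u_stable:.6f}")
print(f"dt/tau = 2.5 -> |u| = {abs(u_unstable):.3e}")
```

      With a fixed step size, shrinking tau therefore eventually makes the simulated dynamics explode, which is the dt/tau blow-up described in the quoted passage.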

      A new formal definition of the moving equilibrium in the Methods (Sect. F) helps to understand this notion of being in a balanced equilibrium state during the dynamics. This formal definition directly leads to the contraction analysis in the SI, Sect. D, showing why the Latent Equilibrium is always contractive, while the current form of the NLA may show jumps at the corner of a ReLu (since a second order derivative of the transfer function enters in the error propagation).

      The reviewer perhaps has additional simulations in mind that compare the robustness of the different models. However, as this paper is more about presenting a novel concept with a comprehensive theory (summing up to 45 pages), we prefer to not add more than the simulations necessary to check the statements of the theorems.

      (2) I found this paper difficult to follow, because the Results sections went straight into details, and various elements of the model were introduced without explaining why they are necessary. Furthermore, the neural implementation was introduced after the model simulations. I suggest reorganizing the manuscript, to describe the model following Marr's levels of description and then presenting the results of simulations. In particular, I suggest starting the Results section by explaining what computation the network is trying to achieve (describe the setup, function L, define its integral over time, and explain that the goal is to find a model minimizing this integral). Then, I suggest presenting the algorithm the neurons need to employ to minimize this integral, i.e. their dynamics and plasticity (I wonder if r=rho(u) + tau rho(u)' is a consequence of action minimization or a necessary assumption - please clarify it). Next please explain how the algorithms could be implemented in biological neurons. Afterward please present the results of the simulation.

      We are sorry to realize that we could not convey the main message clearly enough. After rewriting the paper and straightening the narrative, we hope it is simpler to understand now.

      The paper does not suggest a new model to solve a task, and writing down the function to be minimized is not enough. The point of the NLA is that the time integral of our Lagrangian is minimized with respect to the prospective coordinates, i.e. the discounted future voltage. It is about the question of how dynamic equations in biology are derived. Of course, we also solve these equations, prove theorems and perform simulations. But the main point is that biology seems to deal with time differently than physics does. Biology “thinks” in terms of future quantities, physics “thinks” in terms of current quantities. We tried to explain this better now in the Introduction, the Results (e.g. after Eq. 5) and the Methods.
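
      The idea of working with look-ahead quantities can be illustrated numerically. The sketch below assumes the prospective rate has the form r = rho(u) + tau * d/dt rho(u) quoted in the reviewer comment above; the tanh transfer function, the sinusoidal voltage, and the downstream low-pass filter with the same tau are illustrative assumptions and are not taken from the manuscript. It compares how well a low-pass filter tracks rho(u) when fed the prospective rate versus the plain rate.

```python
import math

TAU = 0.1        # time constant (arbitrary units)
DT = 0.001       # Euler step
N_STEPS = 10000  # simulate 10 time units


def rho(u):
    """Transfer function (tanh chosen only for illustration)."""
    return math.tanh(u)


def rho_dot(u, u_dot):
    """Chain rule: d/dt rho(u) = (1 - tanh(u)^2) * du/dt."""
    return (1.0 - math.tanh(u) ** 2) * u_dot


def max_tracking_error(lookahead):
    """Low-pass filter the rate and return the largest deviation of the
    filter output from the instantaneous rho(u(t))."""
    x = rho(math.sin(0.0))  # start the filter on the true value
    worst = 0.0
    for n in range(N_STEPS):
        t = n * DT
        u, u_dot = math.sin(t), math.cos(t)
        r = rho(u)
        if lookahead:
            r += TAU * rho_dot(u, u_dot)  # prospective (look-ahead) term
        x += (DT / TAU) * (r - x)  # Euler step of tau * dx/dt = -x + r
        worst = max(worst, abs(x - rho(math.sin(t + DT))))
    return worst


err_lookahead = max_tracking_error(lookahead=True)
err_plain = max_tracking_error(lookahead=False)
print(f"max error with look-ahead:    {err_lookahead:.2e}")
print(f"max error without look-ahead: {err_plain:.2e}")
```

      In this toy setting the look-ahead term cancels the lag introduced by the filter, so the filtered signal follows rho(u) almost instantaneously, whereas the plain rate is tracked with a delay of order tau.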

      (3) Understanding the paper requires background knowledge that most readers of eLife are unlikely to have, even if they are mathematically minded. For example, I am from the field of computational neuroscience, and I have never heard about Least Action principle from physics or the Euler-Lagrange equation. I felt lost after reading this paper, and to be able to write this review I needed to watch videos on the Euler-Lagrange equation. To help other readers, I have two suggestions: First, I feel that Eq 4-6 could be moved to the methods, because I found the concept of u~ difficult to understand, and it does not appear in the algorithm. Second, I advise to write in the Introduction, what knowledge is required to follow this paper, and point the readers to resources where they can find the required information. The authors may specify what background is required to follow the main text, and what is required to understand the methods.

      We hope that after explaining the rationale better, it becomes clear that we cannot skip the equations for the prospective coordinates. Likewise, the Euler-Lagrange equations need to be presented in the abstract form, since these are the equations that are eventually transformed into the “model”. We tried to give the basic intuition for this in the main text. As we explained above, the equations we were asked to skip represent the essence of the proposal. It is about how to derive the model equations.

      Moreover, we give more explanations in the Methods to understand the derivations, and we refer to the specific sections in the SI for further details. We are aware that a full understanding of the theory requires some basic knowledge of the calculus of variations.

      We are hesitant to write in the Introduction what type of knowledge is required to understand the paper. An understanding can be on various levels. Moreover, the materials that are considered to be helpful depend on the background. For some it is a YouTube video, for some Wikipedia, and for others a textbook from which specific ingredients can be extracted. But we do cite two textbooks in the Results and more in the SI, Sect. F, when referring to the principle of least action in physics and the mathematics, including weblinks.

      Minor comments

      Eq.3: The Authors refer to this equation as a Lagrangian. Could you please clarify why? Is the logic to minimize the energy subject to a constraint that Cost = 0?

      Thanks for asking. The cost is not really a constraint, it is globally minimized, in parallel steps. We are explaining this right after Eq. 3. “We `prospectively' minimize L locally across a voltage trajectory, so that, as a consequence, the local synaptic plasticity for W will globally reduce the cost along the trajectory (Theorem 1 below).”

      We added two sentences that explain why this function in Eq. 3 is called a Lagrangian: “While in classical energy-based approaches L is called the total energy, we call it the `Lagrangian' because it will be integrated along real and virtual voltage trajectories as done in variational calculus (leading to the Euler-Lagrange equations, see below and SI, Sect. F)”

      p.4, below Eq. 5 - Please explain the rationale behind NLA, i.e. why is it beneficial that "the trajectory u˜(t) keeps the action A stationary with respect to small variations δu˜"? I guess you wish to minimize L integrated over time, but this is not evident from the text.

      Hmm, yes and no. We wish to minimize the cost, and on the way there minimize the action. Since the global minimization of C is technically difficult, one looks for a stationary trajectory as defined in the cited sentence, while minimizing L with respect to W, to eventually minimize the cost.

      In the text we now explain after Eq. 5:

      “The motivation to search for a trajectory that keeps the action stationary is borrowed from physics. The motivation to search for a stationary trajectory by varying the near-future voltages ũ instead of u is assigned to the evolutionary pressure in biology to 'think ahead of time'. To not react too late, internal delays involved in the integration of external feedback need to be considered and eventually need to be overcome. In fact, only for the 'prospective coordinates' defined by looking ahead into the future, even when only virtually, will a real-time learning from feedback errors become possible (as expressed by our Theorems below).”

      Bottom of page 8. The authors say that in the case of single equilibrium and strong nudging the model reduced to the Least Control Principle. Does it also reduce to Predictive coding for supervised learning? If so, it would be helpful to state so.

      Yes, in this case the prediction error in the apical dendrite becomes the one of predictive coding. We are stating this now right at the end of the cited sentence:

      “In the case of strong nudging and a single steady-state equilibrium, the NLA principle reduces to the Least-Control Principle (Meulemans et al., 2022) that minimizes the mismatch energy E^M for a constant input and a constant target, with the apical prediction error becoming the prediction error from standard predictive coding (Rao & Ballard, 1999).”

      In the Discussion we also added a further point (iv) to compare the NLA principle with predictive coding. Both “improve” the sensory representation, but the NLA does so in favor of an output, and predictive coding in favor of the sensory prediction itself (see Discussion).

      Whenever you refer to supplementary materials, please specify the section, so it is easier for the reader to find it.

      Done. Sorry to not have done it earlier. We now also indicate specific sections when referring to the Methods.

      Reviewer #2 (Recommendations For The Authors):

      There are no major issues with this article, but I have several considerations that I think would greatly improve the impact, clarity, and validity of the claims.

      (1) Unifying the narrative. There are many many ideas put forward in what feels like a deluge. While I appreciate the enthusiasm, as a reader I found it hard to understand what it was that the authors thought was the main breakthrough. For instance, the abstract, results, introduction, and discussion all seem to provide different answers to that question. The abstract seems to focus on the motor error idea. The introduction seems to focus on the novel prospective+predictive setup of the energy function. The discussion lists the different perks of the theory (delay compensation, moving equilibrium, microcircuit) without referring to the prospective+predictive setup of the energy function.

      Thanks much for these helpful hints. Yes, the paper became an agglomerate of many ideas, also owing to the fact that we wish to show how the NLA principle can be applied to explain various phenomenology in neuroscience. We now simplified the narrative to this one point of providing a novel theoretical framework for neuroscience, and explaining why this is novel and why it “suddenly works” (the prospective minimization of the energy).

      As you can see from the dominating red in the revised pdf, we did fully rewrite Abstract, Introduction and Discussion under the narrative of the NLA and prospective coding.

      (2) Laying out the organization of the notation clearly. There are quite a few subtle distinctions of what is meant by the different weight matrices (omnibus matrix then input vs recurrent then layered architecture), different temporal horizon formalisms (bar, not bar, tilde), different operators (L, curly L, derivative version, integral version). These different levels are introduced on the fly, which makes it harder to grasp. The fact that there are many duplicate notations for the same quantities does not help the reader. For instance u_0 becomes equal to u_N at one point (above Eq 25). Another example is the constant flipping between integrated and 'current input' pictures. So laying out the multiple layers early, making a table or a figure for the notation, or sticking with one level would help convey the idea to a wide readership.

      Thanks for the hints. We included the table you suggested, but put it in the SI as it became a full page itself. We banned the curly L abbreviating the look-ahead operator.

      The “change of notation” you are alluding to is tricky, though. In a recurrent layer, the index of the output neuron is called o. In a forward network with N layers, the index of the output neurons becomes the last layer N. One has to introduce the layer index l anyway for the deeper layers l < N, and we found it more consistent to explain that, while switching from the recurrent to the forward network, the voltage of the output layer becomes now u_o = u_N. There are more of these examples, like the weight matrix W splitting into an intrinsic network part W_net across which errors backpropagate, and a part conveying the input, W_in, that has to be excluded when writing the backpropagation formula for general networks. Again, in the case of the feedforward networks, the notation reduces to W_l, with index l coding for the layer. Presenting the general approach and a specific example may appear as if we duplicated notation – we haven’t found a solution here.

      (3) Separate the algorithm from the implementation level. I particularly struggled with separating the ideas that belonged to the algorithm level (cost function, optimization objectives) and the biophysics. The two are interwoven in a way that does not have to be. Particularly, some of the normative elements may be implemented by other types of biophysics than the authors have in mind. It is for this reason that I think that separating more clearly what belongs to the implementation and algorithm levels would help make the ideas more widely understood. On this point, a trigger point for me was the definition of the 'prospective input rates' e_i, which comes in the second paragraph.

      We are very sorry to have made you think that the 'prospective input rates' would be e_i. The prospective input rates are r_i. The misunderstanding likely arose from an unclear formulation on our side that is now corrected (see the first and second paragraphs of the Results, where we introduce r_i and e_i).

      From a biophysical perspective, it is quite arbitrary to define the input to be the difference between the basal input and the somatic (prospective) potential. It sounds like it comes from some unclear normative picture at this point. But the authors seem to have in mind to use the fact that the somatic potential is the sum of apical and basal input, that's the biophysical picture.

      We hope to have disentangled the normative and biophysical view in the 2nd and 3rd paragraph of the Results, respectively. We introduce the prospective error e_i as an abstract notion in the first paragraph, while explaining that it will be interpreted as the somato-dendritic mismatch error in neuron i in the next paragraph. The second paragraph contains the biophysical details with the apical and basal morphology.

      (4) Experts and non-expert would appreciate an explanation of why/how the choice of state variables matters in the NLA. The prospective coding state variables cannot be said to be the naïve guess. Why does the simple u, dot{u} not work as state variables applied on the same energy function, as would be a naïve application of the Lagrangian ideas?

      We are grateful for this hint to present the intuition behind varying the action with respect to a prospective state instead of the state itself. The simple L(u, dot{u}) does not work because one does not obtain the first-order voltage dynamics compatible with the biophysics. We made an effort to explain the intuition to non-experts and experts in an additional paragraph right after presenting the voltage and error dynamics (Eq. 7 on page 4).

      Here is how the paragraph starts (not displaying the formulas here):

      “From the point of view of theoretical physics, where the laws of motion derived from the least-action principle contain an acceleration term (as in Newton's law of motion, like … for a harmonic oscillator), one may wonder why no second-order time derivative appears in the NLA dynamics. As an intuitive example, consider driving into a bend. Looking ahead in time helps us to reduce the lateral acceleration by braking early enough, as opposed to braking only when the lateral acceleration is already present. This intuition is captured by minimizing the neuronal action A with respect to the discounted future voltages ũ_i instead of the instantaneous voltages u_i.

      Keeping up an internal equilibrium in the presence of a changing environment requires to look ahead and compensate early for the predicted perturbations.

      Technically, …”

      More details are given in the Methods after Eq. 20. Moreover, in the last part of the SI, Sect. F, we have made the link to the least-action principle in physics more explicit. There we show how the voltage dynamics can be derived from the physical least-action principle by including the Rayleigh dissipation (Eq. 92 and 95).

      (5) Specify that the learning rules have not been observed. Though the learning rules are Hebbian, the details of the rules have not to my knowledge been observed. Would be worth mentioning as this is a sticking point of most related theories.

      We agree, and we now explicitly write in the Discussion that the learning rule still awaits experimental testing.

      (6) Some relevant literature. Chalk et al. PNAS (2018) have explored the relationship between temporal predictive coding and Rao & Ballard predictive coding based on the parameters of the cost function. Harkin et al. eLife (2023) have shown that 'prospective coding' also takes place in the serotonergic system, while Kim ... Ma (2021) have put forward similar ideas for dopamine, both may participate in setting the cost function. Instantaneous voltage propagation is also a focus of Greedy et al. (2023). The authors cite Zenke et al. for spiking error propagation, but there are biological references to that end.

      Many thanks for these hints. We now cite the book of Gerstner & Kistler on spiking neurons, and more specifically the spike-based approach for learning to represent signals (Brendel, .., Machens, Denève, PLoS CB, 2020). Otherwise, we had difficulty incorporating the other literature, which seems to us not directly related to our approach even when related notions come up (such as predictive coding and temporal processing in Chalk et al. (2018), where the coding efficiency of various temporal coding schemes is studied as a function of the signal-to-noise ratio, or the apical activities in Greedy et al. (2022), where bursting, multiplexing and synaptic facilitation arise). We found that citing these papers too would confuse more than it would help (we already cite 95 papers).

      (7) In the main text, theorem two is presented as proof without assumptions on the level of nudging, but the actual proof uses strong assumptions in that respect, relying on numerical ad hoc observations for the general case.

      Thanks for pointing this out. We agree it is better style to state all the critical assumptions in the Theorem itself, rather than deferring them to the Methods. We now state: “Then, for suitable top-down nudging, learning rates, and initial conditions, the ….weights …evolve such that…”.

      (8) In the discussion regarding error-backpropagation, it seems to me that it could be clarified that the current algorithm asks for a weight alignment between FF and FB matrices as well as between FB and interneuron circuit matrices. Whether all of these matrices can be learned together remains to be shown; neither Akrout, Kunin nor Max et al. have shown this explicitly. Particularly when there are other inputs to the apical dendrites from other areas.

      Yes, it is difficult to learn to align all of them in parallel. Nevertheless, our simulations do in fact align the lateral and vertical circuits, as is also claimed in Theorem 2. Yet, as specified in the theorem, this holds only “for suitable learning rates” (these were all the same, but were commonly reduced after some training time, as previously explained in the Methods, Details for Fig. 5).

      In the Discussion we now emphasize that, in general, simulating all the circuitries jointly from scratch in a single phase is tricky. We write:

      “A fundamental difficulty arises when the neuronal implementation of the Euler-Lagrange equations requires an additional microcircuit with its own dynamics. This is the case for the suggested microcircuit extracting the local errors. Formally, the representation of the apical feedback errors first needs to be learned before the errors can teach the feedforward synapses on the basal dendrites. We showed that this error learning can itself be formulated as minimizing an apical mismatch energy. What the lateral feedback through interneurons cannot explain away from the top-down feedback remains as apical prediction error.

      Ideally, while the network synapses targeting the basal tree are performing gradient descent on the global cost, the microcircuit synapses involved in the lateral feedback are performing gradient descent on local error functions, both at any moment in time.

      The simulations show that this intertwined system can in fact learn simultaneously with a common learning rate that is properly tuned. The cortical model network of inter- and pyramidal neurons learned to classify handwritten digits on the fly, with 10 digit samples presented per second. Yet, the overall learning is more robust if the error learning in the apical dendrites operates in phases without output teaching but with corresponding sensory activity, as may arise during sleep (see e.g. Deperrois et al., 2022 and 2023).”

      (9) The short-term depression model is assuming a slow type of short-term depression, not the fast types that are the focus of much recent experimental literature (like Campagnola et al. Science 2022).

      This assumption should be specified.

      Thanks for pointing us to this literature, which we were not aware of. We now cite the release-independent plasticity (Campagnola et al. 2022) in the context of our synaptic depression model.

      (10) There seems to be a small notation issue: Eq 21 combines vectors of the size of the full network (bar{e}) and the size of the readout network (bar{e}star).

      For notational convenience we set the target error to e*=0 for non-output neurons. This way we can write the total error for an arbitrary network neuron as the sum of the backpropagated error plus the putative target error (if the neuron is an output neuron). Otherwise we would always have to distinguish between network neurons that may be output neurons and those that are not. We did say this in the main text, but now repeat it right after Eq. 21. Notations are often the result of a trade-off.
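
      The zero-padding convention described above can be illustrated with a minimal sketch. This is not the authors' implementation; the function name, the list-based representation and the 5-neuron example are hypothetical and serve only to show how a target error defined on the output neurons can be added to an error vector defined on the full network.

```python
def total_error(backprop_error, target_error, output_indices):
    """Add target errors to backpropagated errors over the full network.

    backprop_error: one entry per network neuron (hypothetical values).
    target_error:   one entry per output neuron.
    output_indices: positions of the output neurons within the network.
    Non-output neurons receive a target error of 0, so both vectors
    have the same length and can be summed element-wise.
    """
    padded_target = [0.0] * len(backprop_error)
    for idx, err in zip(output_indices, target_error):
        padded_target[idx] = err
    return [b + t for b, t in zip(backprop_error, padded_target)]


# Hypothetical 5-neuron network whose last two neurons are output neurons.
e_total = total_error([0.1, -0.2, 0.3, 0.0, 0.05], [1.0, -1.0], [3, 4])
```

      With this convention a single expression covers output and non-output neurons alike, at the cost of storing zeros for neurons that never receive a target.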

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript presents a compelling model to explain the impact of mosaicism in preimplantation genetic testing for aneuploidies.

      Strengths:

      A new view of mosaicism is presented with a computational model, that brings new insights into an "old" debate in our field. It is a very well-written manuscript.

      Weaknesses:

      Although the manuscript is very well written, it is written in a way that assumes the reader has existing knowledge of specific terms and topics. This was apparent through a lack of definitions and minimal background/context to the aims and conclusions for some of the authors' findings.

      There is a need for some examples to connect real evidence and scenarios from clinical reports with the model.

      We thank the reviewer for their assessment. Some background was condensed for space, and we wrote the manuscript to be understood by readers with existing reproductive genetics background. We will add more detail and explain terminology more clearly. There are a number of published case studies that can link real-life clinical data with the model’s findings. We will include a summary of them in the text.

      Reviewer #2 (Public Review):

      Summary:

      Although an oversimplification of the biological complexities, this modeling work does add, in a limited way, to the current knowledge on the theoretical difficulties of detecting mosaicism in human blastocysts from a single trophectoderm biopsy in PGT. However, many of the premises that the modeling was built on are theoretical and based on unproven biological and clinical assumptions that could yet prove to be untrue. Therefore, the work should be considered only as a simplified model that could assist in further understanding of the complexities of preimplantation embryo mosaicism, but assumptions of real-world application are, at this stage, premature and should not be considered as evidence in favour of any clinical strategies.

      Strengths:

      The work has presented an intriguing theoretical model for elaborating on the interpretation of complex and still unclear biological phenomena such as chromosomal mosaicism in preimplantation embryos.

      We thank the reviewer for this detailed review and for recognising the value of theoretical modelling. We agree that this model makes simplifications; we took this simplified approach to focus on the core contradiction between clinical experience and previous modelling. Expanding the model to consider additional aspects of balanced mitotic nondisjunctions and technical accuracy is something we want to address; we are discussing whether this can be practically added to this manuscript, or will involve enough work that it should be developed as a further study.

      Weaknesses:

      Lines 134-138: The spatial modeling of mitotic errors in the embryo was oversimplified in this manuscript. There is only limited (and non-comprehensive) evidence that meiotic errors leading to chromosome mosaicism arise from chromosome loss or gain only (e.g. anaphase lag). This work did not take into account the (more recognised) possibility of mitotic nondisjunction where following the event there would be clones of cells with either one more or one less of the same chromosome. Although addressed in the discussion (lines 572-574), not including this in the most basic of modeling is a significant oversight that, based on the simple likelihood, could significantly affect results.

      As above, we certainly plan to address this in future modelling, developing the model to account for this while also incorporating the technical uncertainty, arising from sequencing, in the state of each cell in the biopsy.

      General comment: the premise of the manuscript is that an embryologist (embryology laboratory) is aware of and can accurately quantify the number of cells in a blastocyst or TE biopsy. The reality is that it is not possible to accurately do this without the destruction of the sample which is obviously not clinically applicable. Based on many assumptions the findings show that taking small biopsies poorly classifies mosaic embryos, which is not disputed. However, extrapolating this to the clinic and making suggestions to biopsy a certain amount of cells (lines 539-540) is careless and potentially harmful by suggesting the introduction of potential change in clinical practice without validation. Additionally, no embryologist in the field can tell how many cells are present in a clinical TE biopsy, making this suggestion even more impractical.

      We will revise this to make the technical limitations of clinical TE biopsies clearer.

      On a more general clinical consideration, the authors should acknowledge that when reporting findings of unproven clinical utility and unknown predictive values this inevitably results in negative consequences for infertile couples undergoing IVF. It is proven and established that when couples face the decision on how to manage a putative mosaicism finding, the vast majority decide on embryo disposal. It was recently reported in an ESHRE survey that about 75% of practitioners in the field consider discarding or donating to research embryos with reported mosaicism. A prospective clinical trial showed that about 30% live birth rate reduction can be expected if mosaic embryos are not considered (Capalbo et al., AJHG 2021). The real-world experience is that when mosaicism is reported, embryos with almost normal reproductive potential are discarded. The authors should be more careful with the clinical interpretation and translation of these theoretical findings.

      The clinical potential of mosaic embryos is much more nuanced than a simple ‘they should be discarded’ or ‘they should be treated like euploid embryos’. While the study mentioned by the reviewer (Capalbo et al., AJHG 2021) does indeed suggest that embryos with putative low level mosaicism have good potential, it also suggests that embryos with putative high level mosaicism are largely to be considered aneuploid and should therefore be discarded. Therefore, even the mentioned study supports a ‘ranking’ of embryos by their mosaic result. Furthermore, large controlled retrospective studies have indicated that even high level mosaic embryos have reproductive potential (Viotti Fertility & Sterility 2021 and Viotti F&S 2023). Recent case reports have shown that mosaicism can occasionally persist from embryo to late gestation and even birth, at times associating with negative medical findings. Therefore, while the true clinical potential of embryos classified as mosaic is still being defined, here we are merely suggesting that from a modelling standpoint, the features of mosaicism detected with PGT-A can help guide clinical decisions (complementing the observations reported in the clinical studies).

      There is a robust consensus within the field of clinical genetics and genomics regarding the necessity to exclusively report findings that possess well-established clinical validity and utility. This consensus is grounded in the imperative to mitigate misinterpretation and ineffective actions in patient care. However, the clinical framework delineated in this manuscript diverges from the prevailing consensus in clinical genetics. Clinical genetics and genomics prioritize the dissemination of findings that have undergone rigorous validation processes and have demonstrated clear clinical relevance and utility. This emphasis is crucial for ensuring accurate diagnosis, prognosis, and therapeutic decision-making in patient care. By adhering to established standards of evidence and clinical utility, healthcare providers can minimize the potential for misinterpretation and inappropriate interventions. The framework proposed in this manuscript appears to deviate from the established principles guiding clinical genetics practice. It is imperative for clinical frameworks to align closely with the consensus guidelines and recommendations set forth by professional organizations and regulatory bodies in the field. This alignment not only upholds the integrity and reliability of genetic testing and interpretation but also safeguards patient well-being and clinical outcomes.

      References:

      ACMG Board of Directors. (2015). Clinical utility of genetic and genomic services: a position statement of the American College of Medical Genetics and Genomics. Genetics in Medicine, 17(6), 505-507. https://doi.org/10.1038/gim.2014.194.

      Richards, S., Aziz, N., Bale, S., Bick, D., Das, S., Gastier-Foster, J., ... ACMG Laboratory Quality Assurance Committee. (2015). Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genetics in Medicine, 17(5), 405-424. https://doi.org/10.1038/gim.2015.30

      We will update where necessary to match these references.

      Line 61: "Self correction" - This terminology is unfortunately indiscriminately used in the field for PGT when referring to mosaicism and implies that the embryo can actively correct itself from a state of inherent abnormality. Apart from there being no evidence to suggest that there is an active process by which the embryo itself can correct chromosomal errors, most presumed euploid/aneuploid mosaic embryos will have been euploid zygotes and therefore "self-harm" may be a better explanation. True self-correction in the form of meiotic trisomy/monosomy rescue is of course theoretically possible but not at all clinically significant. The concept being conveyed in this part of the manuscript is not disputed but it is strongly suggested that the term "self correction" is not used in this context, nor in the rest of the manuscript, to prevent the perpetuation of misinformation in the field and instead use a better description.

      This is a good point. We have used ‘self correction’ as a shorthand, but the reality is more nuanced. It will often be a passive process in which aneuploid cell lineages fail to proliferate over time (‘aneuploidy depletion’). The idea of ‘self harm’ is interesting; aneuploidy arising from a healthy euploid embryo. We can also see a further situation where the gametes suffered damage (e.g. DNA fragmentation, unresolved crossovers, persistence of meiotic breaks) leading to mitotic errors. In that case, the embryo would suffer the consequences of harm in the gametes, and ‘aneuploidy rescue’ may be a useful term also. We will discuss this further and reword the terminology along these lines.

      Lines 69-73: The ability to quantify aneuploidy in known admixtures of aneuploid cells is indeed well established. However, the authors claim that the translation of this to embryo biopsy samples is inferred with some confidence and that if a biopsy shows an intermediate chromosome copy number (ICN), that the biopsy and the embryo are mosaic. There are no references provided here and indeed the only evidence in the literature relating to this is to the contrary. Multifocal biopsy studies have shown that an ICN result in a single biopsy is often not seen in other biopsies from the same embryo (Capalbo et al 2021; Kim et al., 2022; Girardi et al., 2023; Marin, Xu, and Treff 2021). Multifocal biopsies showing reciprocal gain and loss which would provide stronger validation for the presence of true mosaicism are also rare. In this work, the entire manuscript is based on the accuracy of ICN in a biopsy being reflective of mosaicism in the embryo. The evidence however points to a large proportion of ICN detected in embryo biopsy potentially being technical artifacts (misdiagnosing both constitutionally normal and abnormal (meiotic aneuploid) embryos as mosaic). Therefore, although results from the modelling provide insight into theoretical results, these cannot be used to inform clinical decision-making at all.

      We thank the reviewer for raising this important conceptual point, which needs to be addressed. The fact that mosaicism is often not observed in serial biopsies of the same embryo is precisely an inherent feature of mosaicism and is an invalid argument to discount the original diagnosis as false. The detection of ICN is not trivial and certain PGT-A platforms might not have the capability to discern noise from true ICN, hence the need for proper validation of the technology. The most stringent validation method for mosaicism detection remains the admixture experiment, such that when ICN patterns are detected the most obvious conclusion is that the biopsy contained a mosaic mix of cells. We aim to add wording regarding these points in the manuscript.

      Lines 87-89: The authors make the claim that emerging evidence is suggestive that the majority of embryos are mosaic to some degree. If in fact, mosaicism is the norm, the clinical importance may be limited.

      If the majority of embryos are mosaic to some degree, it is important to understand the impacts that this may have on PGT-A biopsies and how informative such biopsies may be. Returning to the point the reviewer made above about mitotic aneuploidies as an important consideration: a mitotic nondisjunction at the first cleavage would result in an embryo that was entirely aneuploid. A mitotic nondisjunction occurring at the second cleavage would result in an embryo with 50% aneuploid cells; at the third cleavage, 25% aneuploid cells. If these aneuploid cells fail to proliferate, or are removed (either actively or passively), the level of aneuploidy will fall over time. While mosaicism is a binary (an embryo is or is not a mosaic of karyotypes), even if most embryos are mosaic, the clinical importance will depend on the level of aneuploidy.
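
      The arithmetic behind these percentages can be written out as a short sketch. It rests on assumptions made here for illustration only: a single nondisjunction event, both daughter cells of the affected division being aneuploid (one gain, one loss), synchronous divisions, and no cell death or selection afterwards. The function name is hypothetical.

```python
def aneuploid_fraction(cleavage: int) -> float:
    """Fraction of aneuploid cells after one nondisjunction at a given cleavage.

    After cleavage k the embryo has 2**k cells. A single nondisjunction in
    that division yields 2 aneuploid daughter cells, and under the stated
    assumptions this fraction is preserved by later synchronous divisions.
    """
    if cleavage < 1:
        raise ValueError("cleavage must be 1 or greater")
    return 2 / 2 ** cleavage


# Fractions for a nondisjunction at the first, second and third cleavage.
fractions = [aneuploid_fraction(k) for k in (1, 2, 3)]
```

      Any depletion of aneuploid lineages over time, as discussed in the response, would lower these values further.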

      Line 102-103: The statement that data shows that the live birth rate per ET is generally lower in mosaic embryos than euploid embryos is from retrospective cohort studies that suffer from significant selection bias. The authors have ignored non-selection study results (Capalbo et al., AJHG 2021) that suggest that putative mosaicism has limited predictive value when assessed prospectively and blinded.

      We will add the referenced multifocal biopsy study, but in contrast to the reviewer we see the data it contains as supporting our position in this paper. Capalbo et al. performed rebiopsies of trophectoderm and a biopsy of inner cell mass and found that high level mosaic or aneuploid trophectoderm tended to correlate with abnormal karyotypes in the inner cell mass while low level mosaics correlated with a normal inner cell mass. This supports our point that measuring levels of aneuploidy in the trophectoderm is relevant, and that this gives useful information for ranking embryos.

      Lines 94-98: The authors have misrepresented the works they have presented as evidence for biopsy result accuracy (Kim et al., 2023; Victor et al 2019; Capalbo et al., 2021; Girardi et al., 2023, and any others). These studies show that a mosaic biopsy is not representative of the whole embryo and can actually be from embryos where the remainder of the embryo shows no evidence of mosaicism. There is also a missing key reference of Capalbo et al, AJHG 2021, and Girardi et al., HR 2023 where multifocal biopsies were taken.

      As above, we will add more information on these multifocal biopsy studies; we believe these studies also support our position: that individual biopsies are not predictive of aneuploidy level in an embryo. If mosaicism is detected in the biopsy, then the embryo is mosaic, but if the remainder of the embryo is euploid then that single biopsy was not an accurate representation of the embryo. This could also apply in reverse - if mosaicism is not detected in the biopsy, it does not mean there is no mosaicism in the embryo, only that mosaicism could not be identified.
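
      The last point, that a biopsy without detected mosaicism does not rule out mosaicism in the embryo, can be illustrated with a simple sampling calculation. This is a sketch under a strong assumption that is not taken from the manuscript: aneuploid cells are treated as randomly mixed among all cells, so a biopsy is a draw without replacement (a hypergeometric model). The cell counts in the example are hypothetical.

```python
from math import comb


def prob_biopsy_all_euploid(total_cells: int, aneuploid_cells: int,
                            biopsy_size: int) -> float:
    """Probability that a random biopsy contains no aneuploid cell.

    Assumes well-mixed cells and sampling without replacement; any spatial
    clustering of aneuploid cells is ignored in this simplified sketch.
    """
    if not 0 <= aneuploid_cells <= total_cells:
        raise ValueError("aneuploid_cells must lie between 0 and total_cells")
    if not 0 <= biopsy_size <= total_cells:
        raise ValueError("biopsy_size must lie between 0 and total_cells")
    euploid_cells = total_cells - aneuploid_cells
    return comb(euploid_cells, biopsy_size) / comb(total_cells, biopsy_size)


# Hypothetical embryo: 200 cells, 20 of them aneuploid, 5-cell biopsy.
p_missed = prob_biopsy_all_euploid(200, 20, 5)
```

      Under these assumptions `p_missed` is the chance that the mosaicism goes unseen in a single biopsy; clustering of aneuploid cells would change the value.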

      Lines 371-372: "Selecting the embryo with the lowest number of aneuploid cells in the biopsy for transfer is still the most sensible decision". Where is the evidence for this other than the modeling which is affected by oversimplification and unproven assumptions? Although the statement seems logical at face value, there is no concrete evidence that the proportion of aneuploid cells within a biopsy is valuable for clinical outcomes, especially when co-evaluated with other more relevant clinical information.

      We made this statement as part of a thought experiment to explain the difference between the concepts of absolute measurements versus embryo ranking. This section is not a result of the model, or clinical advice; it is a statement that in the specific example embryos given, the embryo with the fewest aneuploid cells in the biopsy would still be the embryo with the fewest aneuploid cells overall, and thus transferring this embryo (in the absence of any other differences of embryo quality) would remain sensible.

      Lines 431-463: In this section, the authors discuss clinical outcome data from the transfer of putative mosaic embryos and make conclusions about the relationship between ICN level in biopsy and successful pregnancy outcomes. The retrospective and selective nature of the data used in forming the results has the potential to lead to incorrect conclusions when applied to prospective unselected data.

      We believe the clinical data is a useful biological reality check, and we are discussing how to integrate it better with the modelling.

      Reviewer #3 (Public Review):

      Unfortunately, this study fails to incorporate the most important variable impacting the ability to predict mosaicism, the accuracy of the test. The fact is that most embryos diagnosed as mosaic are not mosaic. There may be 4 cases out of thousands and thousands of transfers where a confirmation was made. Mosaicism has become a category of diagnosis in which embryos with noisy NGS profiles are placed. With VeriSeq NGS it is not possible to routinely distinguish true mosaicism from noise. An analysis of NGS noise levels (MAPD) versus the rate of mosaics by clinic using the registry will likely demonstrate this is the case. Without accounting for the considerable inaccuracy of the method of testing the proposed modeling is meaningless.

      We disagree with the reviewer that the modelling is meaningless; we disagree that mosaicism is rare (see our other points). However, if we grant that mosaicism is rare, that almost all embryos are euploid or aneuploid, and that technical noise is the primary factor generating intermediate copy number values, then it is still important to understand how to interpret such intermediate values. Low-level mosaics would more likely represent miscalled euploid embryos, and high-level mosaics would more likely represent miscalled aneuploid embryos. We demonstrate that ranking on these intermediate values correlates with implantation rates and live birth rates, supporting their use. We do agree that technical accuracy of the NGS is an important consideration, and we will be incorporating this into our modelling in the future.

      Recent data using more accurate methods of identifying mosaicism indicate that the prevalence of true preimplantation embryonic mosaicism is only 2%, which is also consistent with findings made post-implantation. This model fails to account for the possibility that, because so few embryos are actually mosaic, there is actually no relevance to clinical care whatsoever. In fact, differences in clinical outcomes of embryos designated as mosaic could be entirely attributed to poor embryo quality resulting in noise levels that make NGS results fall into the "mosaic" category.

      As we also wrote in the point above, we disagree; it is possible that a euploid embryo may be misinterpreted as a mosaic. It is also possible that an aneuploid embryo is misinterpreted as a mosaic. Whether the intermediate copy number values arise through biological or technical reasons, they contain information that is useful to decisions on whether to transfer. We also note a recent paper that performed single-cell dissociation of trophectoderm versus inner cell mass which found that mosaicism in human embryos is very common (Chavli et al, 2024, DOI:10.1172/JCI174483).

      Additional comments:

      “Indeed, as more data emerges, it appears that the majority of embryos from both healthy and infertile couples are mosaic to some degree (Coticchio et al., 2021; Griffin et al., 2022).”

      This statement should be softened as all embryos will be considered mosaic when a method with a 10% false positive rate is applied to 10 more parts of the same embryo. The distinction between artifact and true mosaicism cannot be made with nearly all current methods of testing. When virtually no embryos display uniform aneuploidy in a rebiopsy study, there should be great concern over the accuracy of the testing used. The vast majority of aneuploidy is meiotic in origin.
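
      The compounding effect of repeated testing described here can be made concrete with a short calculation. It is a sketch that assumes each biopsy is an independent test with the same false positive rate, which the comment does not state; the function name is hypothetical.

```python
def prob_any_false_positive(false_positive_rate: float, n_biopsies: int) -> float:
    """Probability that at least one of n independent biopsies is a false positive."""
    if not 0.0 <= false_positive_rate <= 1.0:
        raise ValueError("false_positive_rate must lie in [0, 1]")
    if n_biopsies < 0:
        raise ValueError("n_biopsies must be non-negative")
    return 1.0 - (1.0 - false_positive_rate) ** n_biopsies


# A 10% false positive rate applied to 10 biopsies of a uniformly euploid embryo.
p_flagged = prob_any_false_positive(0.10, 10)
```

      The probability rises with every additional biopsy, so the share of truly uniform embryos called mosaic at least once grows with the number of samples taken.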

      We note that reviewer 2 wrote that mitotic aneuploidy was the key concern, whereas reviewer 3 states meiotic aneuploidy is more common; we argue that both are relevant; a recent study by McCoy et al, 2023 (DOI:10.1186/s13073-023-01231-1) found that both drive arrest of human IVF embryos.

      “Experimental data provides strong evidence that, for the most part, the biopsy result obtained accurately represents the chromosome constitution of the rest of the embryo (Kim et al., 2022; Navratil et al., 2020; Victor et al., 2019).”

      This statement is incorrect given that a published systematic review of the literature indicates a 10% false positive rate based on rebiopsy results.

      This shows that accurately classifying a mosaic embryo based on a single biopsy is not robust.

      This is exactly why the practice of designating embryo mosaics with intermediate copy numbers should not exist.

      We agree that accurately classifying a mosaic embryo based on a single biopsy is not robust. That is one of the main messages of this paper. What we show here is that biopsies from a mosaic embryo are indeed likely to disagree with each other, but we find that there is still enough information at a population level for this to be an indicator of embryo outcomes. We have not yet performed modelling to explore the effect of technical error, so we will not speculate on the impact, but we reiterate a point made earlier: the most stringent validation method for mosaicism detection remains the admixture experiment, such that when intermediate copy number patterns are detected the most obvious conclusion is that the biopsy contained a mosaic mix of cells.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      The manuscript by Jingsong Zhou and colleagues tries to uncover the reasons for the resistance of extraocular muscles (EOMs) to degenerative changes induced by amyotrophic lateral sclerosis (ALS). The findings of the study offer valuable information that EOMs are spared in ALS because they produce protective factors for the NMJ and, more specifically, factors secreted by EOM-derived satellite cells. While most of the experimental approaches are convincing, the use of sodium butyrate (NaBu) in this study needs further investigation, as NaBu might have a variety of biological effects. Overall, this work may help develop future therapeutic interventions for patients with ALS.

      We agree with the editor that NaBu has a variety of biological effects that require further investigation. Our team has previously explored the effect of NaBu treatment on intestinal microbiota and intestinal epithelial permeability (DOI: 10.1016/j.clinthera.2016.12.014), on the mitochondrial respiratory function of the NSC-34 motor neuron cell line overexpressing hSOD1G93A (DOI: 10.3390/biom12020333) and on the mitochondrial function of skeletal muscle myofibers of G93A mice (DOI: 10.3390/ijms22147412). Other research teams have also explored the role of NaBu (or HDAC inhibition) in neuronal survival and axonal transport (DOIs: 10.1073/pnas.0907935106; 10.1038/s41467-017-00911-y; 10.15252/embj.2020106177; 10.1093/hmg/ddt028).

      Since the theme of this manuscript is the transcriptomic characteristics of EOM SCs, including data on how NaBu affects cellular/molecular processes in other tissues would deviate somewhat from the theme. It would be more appropriate to develop a separate manuscript focusing on other tissues.

      We appreciate the feedback from the Editors and reviewers. We realized that our previous description on butyrate’s beneficial role might be overstated in the Abstract Section. We have made two changes to avoid potential overstatement of our finding: (1) We modified the Abstract to state that “the NaBu-induced transcriptomic changes resembling the patterns of EOM SCs “may contribute to” (instead of “underlie”) the beneficial effects observed in G93A mice” (Page 1, Line 29); (2) We have edited the corresponding paragraph in the Discussion section to emphasize that the effect of NaBu treatment is multi-faceted (Page 11, Line 459-461).

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      line 388-389. The sentence has been corrected but is still not clear. What do the authors mean by ".....resulting in higher proportion of COX-deficient myofibers than other muscles"? What other muscles do they refer to?

      Other muscles refer to muscles whose stem cells remain dormant under physiological conditions (uninjured, innervated), such as EDL. We have edited the sentence accordingly. (Page 10, Line 431-432)

      In reference to the results shown in Fig. 2, 7, 8 and 9. Since the experimenters were not blinded, this should be explicitly stated in the Methods section.

      We have added the disclaimer in the current “Data analysis and statistics” section in Methods as follows: “The experimenters were not blinded to the samples in data collection and analysis.” (Page 15, Line 636)

      Figure 7C has been amended but now the inserted ANOVA values interfere with the correct visualization of Fig. 7D. Can the panels in Fig. 7D be moved down so that they are better separated from the panels in Fig. 7C?

      Thanks for the comment and we have edited Figure 7 accordingly.

      Reviewer #4 (Recommendations For The Authors):

      The authors have revised the manuscript per the reviewer's comments in this study. While most of the concerns were addressed, a few concerns remain.

      The molecular basis of how AAV-mediated delivery of Cxcl12 improves the phenotype of satellite cells is still unclear.

      Thanks for the comment. As one of the earliest discovered chemokines, the chemotactic role of Cxcl12-Cxcr4 axis on cells and cellular processes (such as axons) has been comprehensively investigated by different functional assays from overexpression to protein application to inhibitor application to knockdown by shRNAs in different types of tissues. To list a few examples, the establishment of the correct routing trajectories of mammalian motor axons and oculomotor axons during embryonic development (DOIs: 10.1016/j.neuron.2005.08.011; 10.1167/iovs.18-25190). The regeneration of injured motor axon terminals guided by terminal Schwann cells in adult mice (DOI: 10.15252/emmm.201607257). The migration of neural crest cells to sympathetic ganglia in the formation of sympathetic nerve system during embryogenesis (DOI: 10.1523/JNEUROSCI.0892-10.2010). The migration of myoblasts in the process of fusion into myotubes (DOIs: 10.1242/jcs.066241; 10.1111/boc.201200022; 10.1074/jbc.M706730200).

      Because so many detailed mechanistic studies already exist, our goal for this manuscript is not to identify a novel mechanism of how Cxcl12-mediated chemotaxis is achieved. Rather, we used it as one of the proof-of-concept mechanisms contributing to the resistance of EOMs against ALS and the benefits of NaBu treatment. Certainly, it is not the sole mechanism.

      To address the reviewer’s concern, we have expanded discussion about the previous studies regarding the chemotactic effect of Cxcl12 in the discussion section. (Page 10, Line 435-436, Page 11, Line 445-446)

      The NaBu experiments may need additional support from other approaches. NaBu effects may not be directly related to satellite cells or muscle cells. Thus, the animal experiment results need to be carefully interpreted.

      We agree that NaBu has a variety of biological effects that require further investigation. Our team has previously explored the effect of NaBu treatment on intestinal microbiota and intestinal epithelial permeability (DOI: 10.1016/j.clinthera.2016.12.014), on the mitochondrial respiratory function of the NSC-34 motor neuron cell line overexpressing hSOD1G93A (DOI: 10.3390/biom12020333) and on the mitochondrial function of skeletal muscle myofibers of G93A mice (DOI: 10.3390/ijms22147412). Other research teams have also explored the role of NaBu (or HDAC inhibition) in neuronal survival and axonal transport (DOIs: 10.1073/pnas.0907935106; 10.1038/s41467-017-00911-y; 10.15252/embj.2020106177; 10.1093/hmg/ddt028).

      Since the theme of this manuscript is the transcriptomic characteristics of EOM SCs, including data on how NaBu affects cellular/molecular processes in other tissues would deviate somewhat from the theme. It would be more appropriate to develop a separate manuscript specifically addressing the impact of NaBu on other tissues.

      We appreciate the feedback from the reviewers. We realized that our previous description on butyrate’s beneficial role might be overstated in the Abstract Section. In response, we have made two changes to avoid potential overstatement of our finding: (1) We modified the Abstract to state that “the NaBu-induced transcriptomic changes resembling the patterns of EOM SCs “may contribute to” (instead of “underlie”) the beneficial effects observed in G93A mice” (Page 1, Line 29); (2) We edited the corresponding paragraph in the Discussion section to emphasize that the effect of NaBu treatment is multi-faceted (Page 11, Line 459-461).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Ngo et al. report a peculiar effect where a single base mismatch (CC) can enhance the mechanical stability of a nucleosome. In previous studies, the same group used a similar state-of-the-art fluorescence-force assay to study the unwrapping dynamics of 601-DNA from the nucleosome and observed that force-induced unwrapping happens more slowly for DNA that is more bendable because of changes in sequence or chemical modification. This manuscript appears to be a sequel to this line of projects, where the effect of CC is tested. The authors confirmed that CC is the most flexible mismatch using the FRET-based cyclization assay and found that unwrapping becomes slower when CC is introduced at three different positions in the 601 sequence. The CC mismatch only affects the local unwrapping dynamics of the outer turn of nucleosomal DNA.

      Strengths:

      These results are in good agreement with the previously established correlation between DNA bendability and nucleosome mechanical stability by the same group. This well-executed, technically sound, and well-written experimental study contains novel nucleosome unwrapping data specific to the CC mismatch and 601 sequence, the cyclizability of DNA containing all base pair mismatches, and the unwrapping of 601-DNA from Xenopus and yeast histones. Overall, this work will be received with great interest by the biophysics community and is definitely worth attention.

      Weaknesses:

      The scope and impact of this study are somewhat limited due to the lack of sequence variation. Whether the conclusion from this study can be generalized to other sequences and other bendability-enhancing mismatches needs further investigation.

      Major questions:

      (1) As pointed out by the authors, the FRET signal is not sensitive to nucleosome position; therefore, the increasing unwrapping force in the presence of CC can be interpreted as the repositioning of the nucleosome upon perturbation. It is then also possible that CC-containing DNA is not positioned exactly the same as normal DNA from the start upon nucleosome assembly, leading to different unwrapping trajectories. What is the experimental evidence that supports identical positioning of the nucleosomes before the first stretch?

      We added the following and refer to our recent publication1 to address this question.

      “This is consistent with a previous single nucleotide resolution mapping of dyad position from a library of mismatches in all possible positions along the 601 sequence or a budding yeast native sequence which showed that a single mismatch (A-A or T-T) does not affect the nucleosome position27.”

      (2) The authors chose a constant stretching rate in this study. Can the authors provide a more detailed explanation or rationale for why this rate was chosen? At this rate, the authors found hysteresis, which indicates that stretching is faster than quasi-static. But it must have been slow and weak enough to allow for reversible unwrapping and wrapping of a CC-containing DNA stretch longer than one helical turn. Otherwise, such a strong effect of CC at a single location would not be seen. I am also curious about the biological relevance of the magnitude of the force. Can such force arise during nucleosome assembly in vivo?

      To address the comment about the magnitude of force, we added the following paragraph to Introduction. “RNA polymerase II can initiate transcription at 4 pN of hindering force2 and its elongation activity continues until it stalls at ~ 10 pN of hindering force3,4. Therefore, the transcription machinery can generate picoNewtons of force on chromatin as long as both the machinery and the chromatin segment in contact are tethered to stationary objects in the nucleus. Another class of motor protein, chromatin remodeling enzymes, was also shown to induce processive and directional sliding of single nucleosomes when the DNA is under a similar amount of tension (~ 5 pN)5. Therefore, measurements of nucleosomes at a few pN of force will expand our knowledge of the physiological roles of nucleosome structure and dynamics.”

      To address the comment about the stretching rate, we added the following to Results. We note that the physiological loading rate has been challenging to determine for any biomolecular interactions, and the only quantitative measurement we are aware of is that of an integrin that we are citing.

      “The force increases nonlinearly and the loading rate, i.e. the rate at which the force increases, was approximately in the range of 0.2 pN/s to 6 pN/s, similar to the cellular loading rates for a mechanosensitive membrane receptor6.”
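      As a minimal illustration of what "loading rate" means in the sentence above, the sketch below estimates it as the finite-difference slope of a force-versus-time trace. The trace values and the function name are hypothetical and chosen only to fall within the quoted 0.2 to 6 pN/s range; they are not data from the manuscript.

```python
def loading_rates(times_s, forces_pn):
    """Finite-difference loading rate (pN/s) between consecutive samples."""
    if len(times_s) != len(forces_pn):
        raise ValueError("times and forces must have the same length")
    rates = []
    for (t0, f0), (t1, f1) in zip(zip(times_s, forces_pn),
                                  zip(times_s[1:], forces_pn[1:])):
        rates.append((f1 - f0) / (t1 - t0))
    return rates


if __name__ == "__main__":
    # Hypothetical nonlinear force ramp: force rises faster as stretching proceeds.
    times = [0.0, 1.0, 2.0, 3.0]     # seconds
    forces = [0.0, 0.2, 1.0, 4.0]    # piconewtons
    print(loading_rates(times, forces))
```

      Because the force rises nonlinearly at a constant stretching speed, a single experiment samples a range of loading rates rather than one value.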

      (3) In this study, the CC mismatch is the only change made to the 601 sequence. For readers to truly appreciate its unique effect on unwrapping dynamics as a base pair defect, it would be nice to include the baseline effects of other minor changes to the sequence. For example, how robust is the unwrapping force or dynamics against a single-bp change (e.g., AT to GC) at the three chosen positions?

      Unfortunately, we are unable to perform the suggested unwrapping experiment in a timely manner because the instrument has been disassembled during our recent move. However, we previously performed unwrapping experiments not only as a function of sequence but also as a function of cytosine modification and showed that we can detect even more subtle effects7,8. In addition, please note that we are not claiming that simply changing a base pair at the chosen sites changes the mechanical stability of a nucleosome, so we do not believe the requested experiment is necessary.

      (4) The last section introduces yeast histones. Based on the theme of the paper, I was expecting to see how the effect of CC is or is not preserved with a different histone source. Instead, the experiment only focuses on differences in the unwrapping dynamics. Although the data presented are important, it is not clear how they fit or support the narrative of the paper without the effect of CC.

      We apologize for giving the reviewer a wrong impression. We included the data because we believe that information on how the histone core can determine the translation of DNA mechanics into nucleosome mechanical stability will be of interest to the readers of this manuscript. We now mention explicitly that the observation was made using intact DNA, i.e. no mismatch, in the abstract and elsewhere.

      (5) It is stated that tRNA was excluded in experiments with yeast-expressed nucleosomes. What is the reason for excluding it for yeast nucleosomes? Did the authors rule out the possibility that tRNA causes the measured difference between the two nucleosome types?

      We normally include tRNA because we found that it reduces sticking of beads to the surface over several hours of experiments. In yeast nucleosomes, we found that tRNA causes the nucleosome to disassemble. Therefore, we did not include tRNA in yeast nucleosome experiments. We now mention this in Methods as reproduced below.

      “tRNA, which we normally include to reduce sticking of beads to the surface over the hours of single molecule experiments in a sealed chamber, was excluded in experiments with yeast-expressed nucleosomes because tRNA induced disassembly of nucleosomes assembled using yeast histones.”

      We cannot formally rule out the possibility that tRNA causes the measured difference between Xenopus and yeast nucleosomes. However, we have shown in our previous publication7 that the asymmetric unwrapping in Xenopus nucleosomes was modulated by the DNA sequence. When we swapped the sequence of the inner turn between the two sides, while tRNA was included in all experiments, we observed stochastic unwrapping instead. As part of our response to another reviewer’s comments, we also added the following on the relevant differences between the species in Discussion.

      “The crystal structure of the yeast nucleosome suggests that yeast nucleosome architecture is subtly destabilized in comparison with nucleosomes from higher eukaryotes9. Yeast histone protein sequences are not well conserved relative to vertebrate histones (H2A, 77%; H2B, 73%; H3, 90%; H4, 92% identities), and this divergence likely contributes to differences in nucleosome stability. Substitution of three residues in yeast H3 α3-helix (Q120, K121, K125) very near the nucleosome dyad with corresponding human H3.1/H3.3 residues (QK…K replaced with MP…Q) caused severe growth defects, elevated nuclease sensitivity, reduced nucleosome positioning and nucleosome relocation to preferred locations predicted by DNA sequence alone10. The yeast histone octamer harboring wild type H3 may be less capable of wrapping DNA over the histone core, leading to reduced resistance to the unwrapping force for the more flexible half of the 601 positioning sequence.”

      Reviewer #2 (Public Review):

      Summary:

      Mismatches occur as a result of DNA polymerase errors, chemical modification of nucleotides, during homologous recombination between near-identical partners, as well as during gene editing on chromosomal DNA. Under some circumstances, such mismatches may be incorporated into nucleosomes but their impact on nucleosome structure and stability is not known. The authors use the well-defined 601 nucleosome positioning sequence to assemble nucleosomes with histones on perfectly matched dsDNA as well as on dsDNA with defined mismatches at three nucleosomal positions. They use the R18, R39, and R56 positions situated in the middle of the outer turn, at the junction between the outer turn and inner turn, and in the middle of the inner turn, respectively. Most experiments are carried out with CC mismatches and Xenopus histones. Unwrapping of the outer DNA turn is monitored by single-molecule FRET in which the Cy3 donor is incorporated on the 68th nucleotide from the 5'-end of the top strand and the Cy5 acceptor is attached to the 7th nucleotide from the 5' end of the bottom strand. Force is applied to the nucleosomal DNA as FRET is monitored to assess nucleosome unwrapping. The results show that a CC mismatch enhances nucleosome mechanical stability. Interestingly, yeast and Xenopus histones show different behaviors in this assay. The authors use FRET to measure the cyclization of the dsDNA substrates to test the hypothesis that mismatches enhance the flexibility of the 601 dsDNA fragment and find that CC, CA, CT, TT, and AA mismatches decrease looping time, whereas GA, GG, and GT mismatches had little to no effect. These effects correlate with the results from DNA buckling assays reported by Euler's group (NAR 41, 2013) using the same mismatches as an orthogonal way to measure DNA kinking. The authors discuss that substitution rates are higher towards the middle of the nucleosome, suggesting that mismatches/DNA damage at this position are less accessible for repair, consistent with the nucleosome stability results.

      Strengths:

      The single-molecule data show clear and consistent effects of mismatches on nucleosome stability and DNA persistence length.

      Weaknesses:

      It is unclear in the looping assay how the cyclization rate relates to the reported looping time. The biological significance and implications such as the effect on mismatch repair or nucleosome remodelers remain untested. It is unclear whether the mutational pattern reflects the behavior of the different mismatches. Such a correlation could strengthen the argument that the observed effects are relevant for mutagenesis.

      Reviewer #3 (Public Review):

      Summary:

      The mechanical properties of DNA wrapped in nucleosomes affect the stability of nucleosomes and may play a role in the regulation of DNA accessibility in eukaryotes. In this manuscript, Ngo and coworkers study how the stability of a nucleosome is affected by the introduction of a CC mismatched base pair, which has been reported to increase the flexibility of DNA. Previously, the group has used a sophisticated combination of single-molecule FRET and force spectroscopy with an optical trap to show that the more flexible half of a 601 DNA segment provides for more stable wrapping as compared to the other half. Here, it is confirmed with a single-molecule cyclization assay that the introduction of a CC mismatch increases the flexibility of a DNA fragment. Consistent with the previous interpretation, it also increased the unwrapping force for the half of the 601 segment in which the CC mismatch was introduced, as measured with single-molecule FRET and force spectroscopy. Enhanced stability was found up to 56 bp into the nucleosome. The intricate role of mechanical stability of nucleosomes was further investigated by comparing force-induced unwrapping profiles of yeast and Xenopus histones. Intriguingly, asymmetric unwrapping was more pronounced for yeast histones.

      Strengths:

      (1) High-quality single-molecule data.

      (2) Novel mechanism, potentially explaining the increased prominence of mutations near the dyads of nucleosomes.

      (3) A clear mechanistic explanation of how mismatches affect nucleosome stability.

      Weaknesses:

      (1) Disconnect between mismatches in nucleosomes and measurements comparing Xenopus and yeast nucleosome stability.

      (2) Convoluted data in cyclization experiments concerning the phasing of mismatches and biotin site.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Specific comments:

      In Figure 1 legend, "the black diamonds on the DNA bends represent the mismatch position with R18 and R39 on minor grooves and R56 on a major groove." Minor and major grooves should be phrased as histone-facing minor and major grooves.

      We fixed the problem.

      In Materials and Methods, the sentence that describes the stretching rate cites reference 1, which does not seem to be relevant.

      We fixed the problem.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the introduction, the authors should also discuss the context of mismatches occurring during homologous recombination in meiosis or somatic cells in non-allelic recombination between near identical repeats.

      Introduction now has the following.

      “DNA base-base mismatches are generated by nucleotide misincorporation during DNA synthesis, meiotic recombination, somatic recombination between nearly identical repeats, or chemical modification such as hydrolytic deamination of cytosine.”

      (2) Generally, it seems counter-intuitive in terms of biology that mismatches containing nucleosomes are more stable, as mismatches require repair and/or detection for heteroduplex rejection during recombination. Some discussion of this apparent paradox should be added.

      To address this comment, we added the following to Discussion.

      “The higher frequency of substitutions in the nucleosomal DNA may be attributed to the difficulty of accessing the extra-stable nucleosomes. We also note that even without an enhanced stability, a mismatch within a nucleosome would be more difficult to detect for mismatch repair machineries compared to a mismatch in a non-nucleosomal DNA. Because mismatch repair machineries accompany the replisome, most of nascent mismatches may be detected for repair before nucleosome deposition. Therefore, the decrease in accessibility predicted based on our data here may be important only in rare cases a mismatch is not detected prior to the deposition of a nucleosome on the nascent DNA or in cases where a mismatch is generated via a non-replicative mechanism.”

      (3) The authors discuss that the substitution rate is higher while the indel (insertion and deletion) rate is lower nearer the center of a positioned nucleosome. Are the differences between individual mismatches reported in Figure 6 reflected in the mutagenic profile?

      We cannot currently compare them because the mutagenic profile even when it is available is a complex convolution of mismatch generation, mismatch repair and selection. Mismatch generation occurs through several different processes and how they are affected by nucleosomes and their mismatch type and sequence context is unknown. Mismatch repair process itself depends on mismatch type and sequence context as recently shown by a high throughput in vivo study11. And because the population genetics does not simply reflect de novo mutation profiles due to selection, comparison between mismatch-induced DNA mechanical changes and mutagenic profiles is further complicated. We added the following to the revision.

      “If and how the mismatch type-dependent DNA mechanics affects the sequence-dependent mismatch repair efficiency in vivo, as recently determined in a high-throughput study in E. coli11, remains to be investigated. Comparison of mismatch-type dependent DNA mechanics to population genetics data is challenging because mutation profiles reflect a combined outcome of mismatch generation, mismatch repair and selection in addition to other mutational processes.”

      (4) The looping assay should be explained better, especially how the cyclization rate is related to the reported looping time.

      We modified Figure 5 to include examples of looping time determination through fitting of the looped fraction vs time, and added the following to the figure caption.

      “To calculate the looping time, the fraction of looped molecules (high FRET) as a function of time is fitted to an exponential function, $e^{-t/(\text{looping time})}$ (right panel for one run of experiments).”

      Furthermore, we added the following sentence to Results.

      “The rate of loop formation, which is the inverse of looping time determined from an exponential fitting of loop fraction vs time, was used as a measure of apparent DNA flexibility influenced by a mismatch 12,13.”
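      The relationship between looping time and looping rate described above can be sketched in a few lines. This is an illustrative example only, not the authors' analysis code: it assumes the unlooped fraction decays as exp(-t / looping time), so that the looped fraction rises as 1 - exp(-t / looping time), with no amplitude or offset term. The function name and the synthetic data are hypothetical.

```python
import math


def fit_looping_time(times, looped_fraction):
    """Estimate the looping time from looped fraction vs time.

    Assumes unlooped(t) = exp(-t / tau), so ln(1 - looped) = -t / tau.
    Uses a least-squares line through the origin on the log-transformed data.
    """
    points = [(t, math.log(1.0 - f))
              for t, f in zip(times, looped_fraction)
              if 0.0 <= f < 1.0]
    sum_tt = sum(t * t for t, _ in points)
    if sum_tt == 0.0:
        raise ValueError("need at least one sample at t > 0")
    slope = sum(t * y for t, y in points) / sum_tt
    return -1.0 / slope


if __name__ == "__main__":
    # Synthetic, noise-free data generated with a looping time of 20 (arbitrary units).
    true_tau = 20.0
    times = [0.0, 5.0, 10.0, 20.0, 40.0]
    looped = [1.0 - math.exp(-t / true_tau) for t in times]
    tau = fit_looping_time(times, looped)
    print("looping time:", tau, "looping rate:", 1.0 / tau)
```

      The looping rate is simply the reciprocal of the fitted looping time, so a mismatch that shortens the looping time raises the apparent rate of loop formation. With noisy data, a nonlinear fit of the untransformed fractions would usually be preferable to the log-linear shortcut used here.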

      Reviewer #3 (Recommendations For The Authors):

      I have some concerns that, when addressed upon revision, would improve the manuscript:

      (1) Page 6 and Supplementary Figure S1C: Though the FRET levels are the same for all nucleosomes, the distribution between the two levels is not. The nucleosomes with CC mismatches appear to have a larger fraction in the low-FRET population. This seems to contradict the higher mechanical stability. A comment on this should clarify it, or make this conundrum explicit.

      Thank you for the comment. The low FRET population also includes the nucleosomes that do not have an active acceptor, the fraction of which varies between preparations. We now note this in the supplementary figure caption.

      (2) It is intriguing that a more stable nucleosome forms after several pulling cycles and it is argued that this might be due to shifting of the nucleosome. This seems reasonable and has important consequences both for the interpretation of the current experimental data and for the general mechanisms involved in nucleosome maintenance and remodeling. It is puzzling though how this would work mechanistically since it only seems to happen when nucleosomes are half-wrapped and when the unwrapped half contains the mismatch. From the previous work of the group and the current manuscript, it seems that shift does not occur in DNA without mismatches (Correct?). Does shifting happen for the 601-R18 and 601-R56 nucleosomes as well?

      The mismatch-containing half is the half that is mechanically less stable in an intact, mismatch-free 601 nucleosome. So indeed, that is the half that is unwrapped in an intact nucleosome. But because the introduction of mismatch makes that half more mechanically stable, it can stay wrapped until higher forces, and the resulting structural distortion may cause the shift although we acknowledge that this interpretation remains speculative. Shifting occurs for all three constructs with a mismatch but not for the intact nucleosome without a mismatch.

      (3) Could the shifting be related to the differences in sub-population distribution observed in Supplementary Figure S1C?

      See our response to comment (1) above.

      (4) The paper would have more impact if the mechanism of possible shifting could be clarified. This can be done experimentally with a fluorescent histone, as suggested in the manuscript. But having a FRET pair on positions in the DNA that would shift to closer proximity upon shifting, either at the ED2 or at the ED1 site will also work, is in line with the current experiments and seems feasible.

      We revised the text as follows in order not to exclude labeling configurations with both fluorophores on the DNA while reporting on the shift. We are also happy to add an appropriate reference if the reviewer can help us identify an existing study that measured dyad position shifts through such a labeling configuration.

      “However, since the FRET values in our DNA construct are not sensitive to the nucleosome position, further experiments with fluorophores conjugated to strategic positions that allow discrimination between different dyad positions14 will be required to test this hypothesis.”

      (5) Figures 5 and 6: To appreciate the quality of the data, state the number of molecules that contributed to the cyclization assay, or better, share a figure of the number of looped molecules as a function of time as supplementary data.

      We added the requested figures to Figure 5 and a new supplementary Figure 2, and added the following to Methods.

      “Approximately 2500 – 3500 molecules were quantified at each timestamp during the experiment, and three independent experiments were performed for each sequence (Supplemental Figure S2).”

      (6) Page 8/9: A control is added to confirm that the phasing of the biotin relative to the end affects the observed cyclization rate. However, the mismatch sites were chosen such that they included 5 bp phase shifts. This convolutes the outcomes, as the direction of flexibility due to the phasing of the mismatch relative to the biotin may also influence the rate. Was this checked?

      We would like to clarify that the phasing of the biotin is defined not so much with respect to the end as with respect to the full molecule. Static curvature and poloidal angle associated with the DNA molecule (which are ultimately determined by the full chemical composition of the molecule, including its sequence and the mismatch) could make the molecule prefer a looped configuration where the biotin points towards the “inside” of the molecule. Such a configuration would be sterically unfavoured during the single molecule looping reaction where the biotin is attached to a surface via avidin. However, if the biotin is moved by half the helical repeat (or an odd multiple of half the helical repeat, essentially 16 nt as done in the manuscript), it would now point to the “outside” of the molecule. Therefore, to make sure that the difference between the looping rates of any two DNA constructs (say the 601-RH and 601-R18-RH) is a better reflection of differences in dynamic flexibility, we ensure that the difference persists even when the biotin is moved by an odd multiple of half the helical repeat. We revised the section as follows.

      “For example, moving the location of the biotin tether by half the helical repeat (~ 5 bp) can lead to a large change in cyclization rate15, likely due to the preferred poloidal angle of a given DNA16 that determines whether the biotin is facing towards the inside of the circularized DNA, thereby hindering cyclization due to steric hindrance caused by surface tethering.”
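      The geometric argument about biotin phasing can be made concrete with a small calculation of how far the biotin rotates around the helix axis when it is moved along the DNA. This is a rough sketch that assumes an idealized, uniform helical repeat of about 10.5 bp per turn (a textbook approximation for B-form DNA, not a value taken from the manuscript); the function name is hypothetical.

```python
def biotin_rotation_deg(offset_bp, helical_repeat_bp=10.5):
    """Rotation of the biotin around the helix axis, in degrees (0 to 360).

    Assumes a uniform helical repeat; real DNA varies with sequence.
    """
    return (360.0 * offset_bp / helical_repeat_bp) % 360.0


if __name__ == "__main__":
    # A 16 nt shift is close to an odd multiple (3x) of half the helical repeat,
    # so the biotin ends up on roughly the opposite face of the helix.
    for offset in (5, 10.5, 16):
        print(offset, round(biotin_rotation_deg(offset), 1))
```

      Under this approximation, a 16 nt shift rotates the biotin by roughly 190 degrees, which is consistent with the description above of the biotin switching between facing "inside" and "outside" the looped molecule.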

      (7) Page 9/10: The comparison of yeast vs Xenopus is interesting, albeit a bit disconnected. Since the single-molecule statistics are relatively small, did the nucleosomes show similar bulk FRET distributions, or did they also show a shift in FRET levels?

      We included the data because we believe that information on how the histone core can determine the translation of DNA mechanics into nucleosome mechanical stability will be of interest to the readers of this manuscript. The FRET values were similarly distributed.

      (8) The discussion calls for a more detailed analysis of the structural differences of the histones of the two species to rationalize the observed asymmetry in flexibility dependence: why would yeast nucleosomes be less sensitive to sequence asymmetries?

      We added the following to Discussion to address this comment.

      “The crystal structure of the yeast nucleosome suggests that yeast nucleosome architecture is subtly destabilized in comparison with nucleosomes from higher eukaryotes (9). Yeast histone protein sequences are not well conserved relative to vertebrate histones (H2A, 77%; H2B, 73%; H3, 90%; H4, 92% identities), and this divergence likely contributes to differences in nucleosome stability. Substitution of three residues in yeast H3 α3-helix (Q120, K121, K125) very near the nucleosome dyad with corresponding human H3.1/H3.3 residues (QK…K replaced with MP…Q) caused severe growth defects, elevated nuclease sensitivity, reduced nucleosome positioning and nucleosome relocation to preferred locations predicted by DNA sequence alone (10). The yeast histone octamer harboring wild type H3 may be less capable of wrapping DNA over the histone core, leading to reduced resistance to the unwrapping force for the more flexible half of the 601 positioning sequence.”

      (9) It would also be interesting if the increased stability due to the introduction of mismatches observed on Xenopus nucleosomes holds in yeast. Or does the reduced stability remove this effect? This is relevant to substantiate the broad claims in the context of evolution and cancer that are discussed in the manuscript.

      Unfortunately, we are unable to perform the suggested unwrapping experiment in a timely manner because the instrument has been disassembled during our recent move. However, in terms of cancer relevance, our mismatch dependence experiments were performed using vertebrate nucleosomes (Xenopus) so repeating this for yeast nucleosomes would not provide relevant information.

      Minor comments:

      (1) Supplementary Figure S1 misses the label '(C)' in its caption.

      We fixed it.

      (2) The supplementary data sequences for the fleezer measurements contain entries 'R39 construct' and miss the positions of the Cy3 and Cy labels; the color code (levels of grey) is not explained.

      We fixed the labeling mistake and added detailed annotations of the highlighted features.

      References

      (1) Park, S., Brandani, G.B., Ha, T. & Bowman, G.D. Bi-directional nucleosome sliding by the Chd1 chromatin remodeler integrates intrinsic sequence-dependent and ATP-dependent nucleosome positioning. Nucleic Acids Res 51, 10326-10343 (2023).

      (2) Fazal, F.M., Meng, C.A., Murakami, K., Kornberg, R.D. & Block, S.M. Real-time observation of the initiation of RNA polymerase II transcription. Nature 525, 274-7 (2015).

      (3) Galburt, E.A., Grill, S.W., Wiedmann, A., Lubkowska, L., Choy, J., Nogales, E., Kashlev, M. & Bustamante, C. Backtracking determines the force sensitivity of RNAP II in a factor-dependent manner. Nature 446, 820-3 (2007).

      (4) Schweikhard, V., Meng, C., Murakami, K., Kaplan, C.D., Kornberg, R.D. & Block, S.M. Transcription factors TFIIF and TFIIS promote transcript elongation by RNA polymerase II by synergistic and independent mechanisms. Proc Natl Acad Sci U S A 111, 6642-7 (2014).

      (5) Kim, J.M., Carcamo, C.C., Jazani, S., Xie, Z., Feng, X.A., Yamadi, M., Poyton, M., Holland, K.L., Grimm, J.B., Lavis, L.D., Ha, T. & Wu, C. Dynamic 1D Search and Processive Nucleosome Translocations by RSC and ISW2 Chromatin Remodelers. bioRxiv (2024).

      (6) Jo, M.H., Meneses, P., Yang, O., Carcamo, C.C., Pangeni, S. & Ha, T. Determination of single-molecule loading rate during mechanotransduction in cell adhesion. Science (in press).

      (7) Ngo, T.T., Zhang, Q., Zhou, R., Yodh, J.G. & Ha, T. Asymmetric unwrapping of nucleosomes under tension directed by DNA local flexibility. Cell 160, 1135-44 (2015).

      (8) Ngo, T.T., Yoo, J., Dai, Q., Zhang, Q., He, C., Aksimentiev, A. & Ha, T. Effects of cytosine modifications on DNA flexibility and nucleosome mechanical stability. Nat Commun 7, 10813 (2016).

      (9) White, C.L., Suto, R.K. & Luger, K. Structure of the yeast nucleosome core particle reveals fundamental changes in internucleosome interactions. EMBO J 20, 5207-18 (2001).

      (10) McBurney, K.L., Leung, A., Choi, J.K., Martin, B.J., Irwin, N.A., Bartke, T., Nelson, C.J. & Howe, L.J. Divergent Residues Within Histone H3 Dictate a Unique Chromatin Structure in Saccharomyces cerevisiae. Genetics 202, 341-9 (2016).

      (11) Kayikcioglu, T., Zarb, J.S., Lin, C.-T., Mohapatra, S., London, J.A., Hansen, K.D., Rishel, R. & Ha, T. Massively parallel single molecule tracking of sequence-dependent DNA mismatch repair in vivo. bioRxiv, 2023.01.08.523062 (2023).

      (12) Jeong, J., Le, T.T. & Kim, H.D. Single-molecule fluorescence studies on DNA looping. Methods 105, 34-43 (2016).

      (13) Jeong, J. & Kim, H.D. Base-Pair Mismatch Can Destabilize Small DNA Loops through Cooperative Kinking. Phys Rev Lett 122, 218101 (2019).

      (14) Blosser, T.R., Yang, J.G., Stone, M.D., Narlikar, G.J. & Zhuang, X. Dynamics of nucleosome remodelling by individual ACF complexes. Nature 462, 1022-7 (2009).

      (15) Basu, A., Bobrovnikov, D.G., Qureshi, Z., Kayikcioglu, T., Ngo, T.T.M., Ranjan, A., Eustermann, S., Cieza, B., Morgan, M.T., Hejna, M., Rube, H.T., Hopfner, K.P., Wolberger, C., Song, J.S. & Ha, T. Measuring DNA mechanics on the genome scale. Nature 589, 462-467 (2021).

      (16) Yoo, J., Park, S., Maffeo, C., Ha, T. & Aksimentiev, A. DNA sequence and methylation prescribe the inside-out conformational dynamics and bending energetics of DNA minicircles. Nucleic Acids Res 49, 11459-11475 (2021).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Beyond my general review, some descriptions of the results and methods could be further clarified, which I've outlined below:

      (1) Page 3, Line 118-120: Based on results from Fig 1A, the authors reported 15 nanobodies neutralized both delta and BA.1 out of the 41 tested. However, I only counted 14. Could the authors double check?

      We recounted the nanobodies and confirmed there are 15 as follows:

      (1) RBD-15

      (2) RBD-22

      (3) RBD-24

      (4) RBD-9S1-4

      (5) S1-35

      (6) RBD-6

      (7) RBD-5

      (8) RBD-21

      (9) RBD-16

      (10) S1-46

      (11) S1-49dimer

      (12) S2-10dimer

      (13) S2-3

      (14) S2-62

      (2) Page 5, Lines 134-135: the authors described that the heatmap reflects the neutralizing strength of the representative nanobodies from each group. For groups where multiple nanobodies were selected for visualization, how was the neutralization strength calculated? Was the IC50 averaged first before being converted into the neutralization strength?

      This has been made clear in the legend for Fig. 1 as follows “For groups with multiple nanobodies, the average -log10 (IC50) is first calculated for the nanobodies within that group, then normalized to a neutralization score within the 0–100 range using the min and max average -log10 (IC50) for that group. A higher score indicates more potent neutralization of the variant relative to the wild type.”
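
      The legend's two-step procedure (average first, then normalize) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the min and max are taken over the per-variant averages of one group, and the IC50 values, variant labels and function name are hypothetical.

```python
import math
from statistics import mean


def group_scores(ic50_by_variant):
    """Map variant -> neutralization score (0 to 100) for one epitope group.

    ic50_by_variant maps a variant name to the IC50 values (in molar) of the
    nanobodies in that group. Step 1 averages -log10(IC50) per variant;
    step 2 min-max normalizes those averages across the variants (assumed).
    """
    avg = {
        variant: mean(-math.log10(ic50) for ic50 in values)
        for variant, values in ic50_by_variant.items()
    }
    lo, hi = min(avg.values()), max(avg.values())
    if hi == lo:
        # Degenerate case: all averages equal. Returning 100 is an
        # arbitrary convention chosen for this sketch.
        return {variant: 100.0 for variant in avg}
    return {variant: 100.0 * (a - lo) / (hi - lo) for variant, a in avg.items()}


# Hypothetical IC50 values (molar) for a group with two nanobodies.
example = {
    "wild type": [1e-9, 1e-8],
    "delta": [1e-8, 1e-7],
    "BA.1": [1e-7, 1e-6],
}
print(group_scores(example))
```

      Averaging on the -log10 scale before normalizing means that a single weak binder does not dominate the group value, which is the point of the reviewer's question about the order of operations.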

      (3) Page 5, Lines 138-139: What was the authors' rationale for selecting certain nanobodies over others for structural modeling and visualizing the neutralization heatmap in Fig 1B? Does it introduce bias to the neutralizing epitope map on the spike protein?

      We only focused on nanobodies for which we had enough epitope mapping data to unambiguously generate docked nanobody-spike models, as explained in our previous study (Mast et al., eLife 2021). When multiple nanobodies within the same group had sufficient epitope mapping data available, we selected only representative candidates that had better binding affinity and/or neutralization potency. As epitope mapping via escape mutants relied largely on random point mutagenesis of Spike, there should be little introduced bias.

      Overall, groups I-VII cover an exhaustive set of target areas on the RBD (including the lone glycan site in Group-II), while groups VII and IX are representative areas on NTD and S2. Using group-average IC50s and suitable normalization, as mentioned in point 2 above, further prevents potential biases due to unequal numbers of Nbs modeled from each group.

      We have modified the text with the following:

      “For computational epitope modeling, we selected nanobody candidates using a series of experimentally obtained structural restraints, as described in Mast, Fridy et al. 2021.”

      (4) Page 5, Lines 161-167: It would be good to include Fig S1 as a main figure as it places the epitope landscape of nanobodies being investigated in this manuscript into the broader context of clinically approved monoclonal antibody therapeutics for COVID-19.

      We have amended the Figures to accommodate the reviewer's suggestion. Figure S1 is now Figure 2.

      (5) Page 6, Lines 173-175: The neutralization breadth for S1-46 is quite encouraging. Any speculations on why this particular nanobody is so broadly targeting? Any additional thoughts on why its high binding affinity (nM) did not translate into strong neutralization (as it is in the 0.1-1 uM range)?

      S1-46 binds a region on spike that is conserved across all variants observed to date. Its epitope is difficult to access unless the RBD is in the up conformation, which may explain why monoclonal antibodies rarely bind. We state this in the text as follows:

      “S1-46 binds a region on spike that is conserved across all variants to date, but which may be relatively inaccessible and is not targeted by any of the mAbs that previously received EUA by the FDA (Cox, Peacock et al. 2023).”

      Relating neutralization activity to binding activity requires more insight into the mechanisms of binding and activity. Nonetheless, we are also encouraged by S1-46’s breadth and numerous avenues can be pursued to greatly improve its neutralizing activity (e.g. synergistic combinations).

      (6) Page 6, Lines 173-175: For the remaining two nanobodies S1-31 and S1-RBD-11 in group VII, the target epitopes on the spike proteins of either delta or BA.1 do not seem to bear any mutations, at least based on the mutation maps in Fig 1B. Yet their neutralizing capacities against delta and BA.1 variants were abolished. Do the authors have any idea about what is going on here?

      For group VII, only the epitope of S1-46 was mapped whereas S1-31 and S1-RBD-11 were assigned to group VII based on our lower resolution binning experiments. Thus, without knowing precisely where they bind, we can make only limited conclusions at this time. In the absence of supporting structural information, we speculate that the epitopes of RBD-11 and S1-31 may be in a region that overlaps with or is in close proximity to a mutation that could affect the binding of the nanobody enough to result in loss of neutralizing ability.

      (7) Page 7, Line 195-200: Please provide PRNT50 or logPRNT50 for the five nanobodies selected for BA.4/5 PRNT assay.

      We have added this suggested information. Additionally, a supporting table (Table S1) is now provided.

      (8) Page 8, Lines 223-224: Similar to comment 3, what was the rationale here for choosing certain nanobodies over others for structural modeling and visualizing the binding heatmap in Fig 2B?

      The set of nanobodies chosen for structural modeling and visualization of neutralization data is identical to the set of anti-RBD nanobodies chosen for binding.

      (9) Page 11, Lines 326-328: Can the authors include mutation maps as part of Fig 4C to show the mutation distributions on the XBB/BQ.1/BQ/1.1 spikes?

      We have updated and added a supplemental figure to accompany Fig. 5 (called “supplement for Figure 5”) showing the mutation maps.

      (10) Page 14, Line 409-418: This paragraph is well considered. Given the large number of nanobodies assessed in this manuscript, it would be helpful if the authors could highlight some candidate nanobodies as lead candidates for further optimization.

      While our intention in this manuscript was not to provide targeted recommendations for lead candidates, but rather to reiterate the collective potential of a Nb pool originally targeted towards the 2019 Wuhan variant, the reviewer's point is interesting. We speculate that any of the Nbs we have demonstrated to show pan-VoC activity would be prime candidates for further optimization.

      We have added a statement to this effect as follows: “We propose that any of the Nbs we have demonstrated to show pan-VoC activity, would be prime candidates for further optimization.”

      Reviewer #2 (Recommendations For The Authors):

      Major concerns:

      (1) The main message of the article is the prediction that nanobodies that retain binding to the different SARS-CoV-2 variants including early Omicron strains will retain binding and neutralization against currently circulating strains such XBB and BQ. However, no evidence either via modeling or experimental testing has been provided for that prediction. The study will benefit from mapping amino acid mutations in RBD of XBB and BQ lineages compared to BA.4/5 and demonstrating via computation docking that epitopes of the five nanobodies that retain binding to BA.4/5 RBD are not affected. For example, the crystal structure of XBB.1 RBD PDB:8OIV is available. Binding/neutralization experiment with currently circulating SARS-CoV-2 strains would still be the gold standard test given the fact that only five out of 41 nanobodies retained binding and neutralization to BA.4/5 lineage. Loss of neutralization ability against BA.4/5 without a significant decrease in binding affinity for nanobodies S1-46 and S1-RBD-22 further indicates that neutralization of XBB and BQ lineage should be performed.

      The docking protocol used to predict the spike epitopes represents protein residues at C-alpha resolution and is data-driven, i.e., it assumes that binding happens in the first place and then uses experimentally obtained structural restraints. Concluding possible binding from such a docking protocol alone would therefore be noisy. In our revised manuscript we have a new Figure 3B, which shows the epitopes of 4 of the 5 pan-VoC nanobodies, i.e., S1-RBD-{9, 22, 40} and S1-46, mapped to the RBD structures of XBB.1 (8IOU) and BQ.1.1 (8FXC), and we have updated Figure 4 with a supplement showing the mutation maps.

      (2) Described nanobodies are positioned as very potent neutralizers of SARS-CoV-2. However, they are much less potent in neutralization of the ancestral strain as well as early VOCs compared to the mAbs that were approved for COVID-19 treatment. For example, IC50 for casirivimab and imdevimab are 37.4 pM and 42.1 pM, respectively. That is about 27-fold more than IC50 for the most potent nanobody reported in the article, S1-RBD-15.

      This comparison is fraught for several reasons. First, experimental differences between pseudovirus assay systems usually result in significant differences in reported IC50s, as IC50 is not an absolute measure, nor is it ultimately comparable to clinical IC50 values. For this reason, in our original publication (Mast et al., 2021) we tested other nanobodies in our experimental set-up as benchmarks. Second, a typical monoclonal antibody has two binding sites with a large structural Fc linker and is, combined, ~10 times the size of a nanobody. In a therapeutic setting where monoclonal therapy is provided in g per kg of patient body weight, there is a 5-fold excess of Nb binding capacity over antibody binding capacity. Third, we have previously shown that dimerizing our nanobodies (to produce two antigen binding sites) can dramatically increase potency, by over 100-fold (Mast et al., 2021).

      In order to make this even clearer in the manuscript, we have added the following: “We note that IC50s are not directly comparable across different experimental set-ups because measured values are highly dependent on the experimental conditions. For this reason, we included other published nanobodies as benchmarks in our original publication and have subsequently maintained standard experimental conditions (Mast, Fridy et al. 2021)”.

      (3) Figure 1A. If each dot represents an independent measurement of the same nanobody, IC50 variation seems too high. For some nanobodies it ranges over almost an order of magnitude, e.g. S1-RBD-24, S1-RBD-46, S2-3. Why is that?

      We have deliberately explored the full range of effects that could contribute to experimental variability in our pseudovirus assay, using different batches of nanobody and pseudovirus in each replicate to provide as impartial and comprehensive an analysis as possible. While the activity of some nanobodies is remarkably stable from batch to batch, others show the variation noticed by the Reviewer, which is why we performed multiple replicates to define the average IC50 value for our nanobodies.

      (4) The drop in IC50 for BA.1 neutralization is about one log for the majority of tested nanobodies. This should be outlined in the text. For example, for the most potent neutralizer, S1-RBD-15, the drop in IC50 for BA.1 is about 100-fold compared to IC50 for the Delta and Wuhan strains. It is important to note that, of the 9 nanobodies for which the drop in neutralizing capacity against the BA.1 and Delta variants is less than one log, 2 have epitopes in the S2 domain of the SARS-CoV-2 spike. Resistance of mAbs targeting the S2 part of the spike has been extensively described in the literature as being due to the highly conserved structure of this region that facilitates membrane fusion. Presented data demonstrate that >80% of the nanobody repertoire is affected by mutations on spike protein. Additionally, it can be helpful for readers if the fold-change in IC50 between Wuhan, Delta, and BA.1 is presented in the text or added to Figure 1 or a table.

      We agree with the Reviewer and to make this more explicit we have made the following change: “In comparison, groups I, I/II, I/IV, V, VII, VIII and the anti-S2 nanobodies contained the majority of omicron BA.1 neutralizers, though here the neutralization potency of many nanobodies was generally decreased tenfold compared to wild-type (emphasis added).”
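
      The "one log" and "tenfold" wording in this exchange refers to the same quantity on two scales. A minimal sketch of the conversion, using hypothetical IC50 values (in nM) rather than any measurement from the manuscript:

```python
import math


def fold_change(ic50_variant: float, ic50_reference: float) -> float:
    """Ratio of the variant IC50 to the reference IC50 (same units).
    Values above 1 mean weaker neutralization of the variant."""
    return ic50_variant / ic50_reference


def log_shift(ic50_variant: float, ic50_reference: float) -> float:
    """The same change expressed in log10 units ("logs")."""
    return math.log10(fold_change(ic50_variant, ic50_reference))


# Hypothetical values in nM: a reference IC50 of 10 nM against two variants.
print(fold_change(100.0, 10.0), log_shift(100.0, 10.0))    # tenfold, one log
print(fold_change(1000.0, 10.0), log_shift(1000.0, 10.0))  # 100-fold, two logs
```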

      (5) The authors should either present the results of the formal correlation analysis or avoid using misleading verbiage such as: "the decrease in neutralization potency largely correlates with the accumulation of omicron BA.1 specific mutations throughout the RBD" or "significant decrease in binding affinity correlated to decreases neutralization potency".

      We thank the Reviewer for this constructive feedback. To address this question, we have performed a correlation analysis using Pearson and Spearman's methods to quantitatively assess the relationship between nanobody neutralization potency (IC50) and binding affinity (KD) across SARS-CoV-2 variants, including the wildtype, delta, and omicron BA.1 variants. Our results indicate a statistically significant correlation for the delta variant (Pearson's PCC: 0.71, p-value: 0.01; Spearman's rho: 0.63, p-value: 0.07), supporting our statement regarding the correlation between decreased neutralization potency and reduced binding affinity for this variant. However, for the wildtype and omicron BA.1 variants, the correlations were not statistically significant (wildtype Pearson's: 0.10, p-value: 0.70; omicron BA.1 Pearson's: 0.27, p-value: 0.31), which we acknowledge does not fully align with the verbiage used in the manuscript. Therefore, we have revised the manuscript to present the correlation analysis data accurately and ensure the discussion is reflective of the statistical evidence as follows:

      “SPR binding assessments to the spike S1 domain or RBD of delta revealed a pattern: nanobodies maintaining binding affinity generally also neutralized the virus with a statistically significant correlation between binding affinity and neutralization efficacy (Pearson's Correlation Coefficient: 0.71, p-value: 0.01; Spearman's rho: 0.63, p-value: 0.07). However, this correlation was not statistically significant for omicron BA.1 (Pearson's Correlation Coefficient: 0.27, p-value: 0.31) (Fig. 3A, Table 1). Notably, while some nanobodies bound to the variants, they did not consistently neutralize them, suggesting additional factors influence neutralization beyond mere binding.”
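
      For readers unfamiliar with the two statistics, the sketch below shows how Pearson's coefficient and Spearman's rho are computed from paired values. It is an illustration in plain Python, not the authors' analysis: the data are hypothetical, the function names are illustrative, and p-values (which require a reference distribution) are not computed.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical paired values, e.g. -log10(KD) and -log10(IC50).
affinity = [1.0, 2.0, 3.0, 4.0, 5.0]
potency = [1.0, 4.0, 9.0, 16.0, 25.0]
print(round(pearson(affinity, potency), 4), round(spearman(affinity, potency), 4))
```

      Pearson measures linear association while Spearman measures monotonic association, which is why the two coefficients reported for the same variant can differ.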

      (6) Figure 3 shows approximated curves for live virus neutralization assay with quite a broad 90% CI. It will be helpful to present, at least, in supplementary, primary data for live-virus neutralization that were used to perform non-linear regression.

      We have added the reviewer’s suggestion.

      (7) It is not clear what are the "variant-specific nanobody groups" exactly? A definition/description of the term is not provided. If the nanobody library was generated with the Wuhan strain, how did strain-specific nanobodies that bind/neutralize only Delta, BA.1 or BA.4/5 appear in the repertoire and were isolated? This statement also contradicts data in Table 4 where all nanobodies listed bind and neutralize Wuhan strain.

      We agree with the reviewer. All nanobodies tested bind/neutralize the Wuhan strain as they were selected from our original repertoire of 116 nanobodies (Mast et al., 2021). To clarify, variant-specific nanobodies are nanobodies that bind only one variant that arose from the original Wuhan strain. They were categorized into variant-specific groups based on whether they were able to bind each variant (other than Wuhan).

      We have thus added to the manuscript, “we define variant-specific nanobodies as nanobodies that bind a single additional variant alongside the original Wuhan strain...”

      (8) Describing the categorization of nanobody epitope groups presented in Figure 4, the authors state that binding to Wuhan, Delta, BA/1, and BA.4/5 predicts that these nanobodies will be "effective binders against current circulating strains of the virus including XBB and BQ lineages"? How exactly is this conclusion corollary to the data shown?

      The epitopes of XBB and BQ.1 are not divergent enough within the regions we propose the nanobodies to bind, to suggest that nanobodies that bind in those regions will lose binding ability. We hypothesize that the region at which these nanobodies bind represents regions on spike that are vulnerable to our specified nanobodies in Fig. 4. We have generated a new Fig. 3B and added a supporting figure for Fig. 4 to address this.

      (9) Figures 4C and 6 describe how the nanobodies will retain binding to currently circulating strains of XBB lineage. However, epitopes are mapped on the same Wuhan, Delta, BA.1, and BA.4/5 virus strains. The predicted binding of nanobodies to XBB lineage RBD is not actually shown in Figure 6. It is clear from the figure that the nanobody binding footprint (red area) decreases with antigenic distance in every spike projection from Wuhan through the BA.4/5 strain. It is unclear how this indicates that nanobodies will remain active against even more distant XBB, BQ, EU, and CH strains accumulating more mutations in spike protein.

      We have added the following to the manuscript to clarify: “Strikingly, we have in our cohort 8 nanobodies able to bind delta, and the omicron lineages BA.1/BA.4/BA.5/XBB/BQ.1.1 (Fig. 5B). We further predict these 8 nanobodies will be effective binders against current circulating strains of the virus including omicron EG.5 and HV.1 as the epitope regions (or predicted epitopes) of these nanobodies do not vary significantly from omicron lineages XBB and BQ.1.1 (Fig. 5C and Supplement to Fig. 5).”

      (10) Despite major advances in the development of nanobodies as therapeutic molecules there are only a few nanobody-based drugs that have so far been approved for clinical use and all of them are nanobody fusions to immunoglobulin Fc fragment. It is dictated by the small size of the nanobody itself, 15 kDa molecule, that leads to rapid kidney clearance within hours post-injection, and also by the necessity of having antibody effector functions allowing for example killing of malignant cells. It is hard to predict how each individual nanobody will tolerate multimerization and if it will still retain binding ability as its size dramatically increases. It should be noted that IC50 for BA.4/5 is in the submicromolar range for the 5 nanobodies retaining neutralization of this strain. From a therapeutic perspective, this is quite a high IC50 that dictates a high dosage to achieve a therapeutic effect. Furthermore, it can be expected that additional mutations in the SARS-CoV-2 spike will further affect binding affinity and therefore reduce the neutralization ability of these nanobodies resulting in even higher doses required to achieve therapeutic effect. Therefore, authors should discuss the limitations of the nanobody approach as a therapeutic intervention more granularly.

      While Fc fusions are not strictly required for clinical use (for instance Caplacizumab is not an Fc fusion, being a multimer containing an albumin-binding nanobody), we agree that reformulation would indeed be required to optimize pharmacokinetics for eventual clinical use. Increased valency through multimerization is in fact one of several strategies, which also include synergistic combinations, for significantly enhancing effective IC50. Preclinical nanobody engineering is not within the scope of this paper, but we acknowledge this challenge.

      Minor points:

      (1) Table S1 is missing.

      This is an .xlsx file uploaded as Supplementary File 3. Labeled now as “Figure 6–Source data 2. Neutralization data from synergy experiment”.

      (2) Because Table 1 summarizes all neutralization and binding data, it will be helpful to refer to it while describing data presented in Figure 1.

      This has been added to the revised manuscript.

      (3) Live SARS-CoV-2 PRNT is not described in Materials and Methods.

      This has been added to the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Review:

      Reviewer #1:

      Summary:

      The Roco proteins are a family of GTPases characterized by the conserved presence of an ROC-COR tandem domain. How GTP binding alters the structure and activity of Roco proteins remains unclear. In this study, Galicia C et al. took advantage of conformation-specific nanobodies to trap CtRoco, a bacterial Roco, in an active monomeric state and determined its high-resolution structure by cryo-EM. This study, in combination with the previous inactive dimeric CtRoco, revealed the molecular basis of CtRoco activation through GTP-binding and dimer-to-monomer transition.

      Strengths:

      The reviewer is impressed by the authors' deep understanding of the CtRoco protein. Capturing Roco proteins in a GTP-bound state is a major breakthrough in the mechanistic understanding of the activation mechanism of Roco proteins and shows similarity with the activation mechanism of LRRK2, a key molecule in Parkinson's disease. Furthermore, the methodology the authors used in this manuscript - using conformation-specific nanobodies to trap the active conformation, which is otherwise flexible and resistant to single-particle average - is highly valuable and inspiring.

      Weakness:

      Though written with good clarity, the paper will benefit from some clarifications.

      (1) The angular distribution of particles for the 3D reconstructions should be provided (Figure 1 - Sup. 1 & Sup. 2).

      Figure 1 – Figure supplements 1 and 2 now contain particle distribution plots.

      (2) The B-factors for protein and ligand of the model, Map sharpening factor, and molprobity score should be provided (Table 1).

      Table 1 now contains B-factors and molprobity scores.

      The map used to interpret the model was post-processed by density modification, and therefore no data concerning sharpening factors are provided in the output.

      (3) A supplemental Figure to Figure 2B, illustrating how α0-helix interacts with COR-A&LRR before and after GTP binding in atomic details, will be helpful for the readers to understand the critical role of α0-helix during CtRoco activation.

      This is now illustrated in the new Figure 2 – Figure Supplement 1.

      (4) For the following statement, "On the other hand, only relatively small changes are observed in the orientation of the Roc α3 helix. This helix, which was previously suggested to be an important element in the activation of LRRK2 (Kalogeropulou et al., 2022), is located at the interface of the Roc and CORB domains and harbors the residues H554 and Y558, orthologous to the LRRK2 PD mutation sites N1337 and R1441, respectively." It is not surprising the α3-helix of the ROC domain only has small changes when the ROC domain is aligned (Figure 2E). However, in the study by Zhu et al (DOI: 10.1126/science.adi9926), it was shown that α3-helix has a "see-saw" motion when the COR-B domain is aligned. Is this motion conserved in CtRoco from inactive to active state?

      We indeed describe the conformational changes from the perspective of the Roc domain. When using the COR-B domain for structural alignment, a rotational movement of Roc (including a “seesaw”-like movement of the α3-helix around His554) with respect to COR-B is correspondingly observed.

      This is now added to Figure 2E. Additionally, the text was adapted to:

      “Interestingly, this rotational movement of CORB seems to use the H554-Y558-Y804 triad on the interface of Roc and CORB as a pivot point (Figure 2E). Mutation of either of the corresponding residues in LRRK2 (N1437, R1441, Y1699, respectively) is associated with PD and leads to LRRK2 activation. Residues H554 and Y558 are located on the Roc α3 helix, which was previously suggested to be an important element in the activation of LRRK2 (Kalogeropulou et al., 2022). Indeed, while the orientation of the α3 helix with respect to the rest of the Roc domain only undergoes small changes upon GTPγS binding, it can be observed that this helix undergoes a “seesaw-like” movement with respect to the CORB domain. A similar rearrangement was previously also observed for Rab29-mediated activation of human LRRK2 (Störmer et al., 2023; Zhu et al., 2022).”

      (5) A supplemental figure showing the positions of and distances between NbRoco1 K91 and Roc K443, K583, and K611 would help the following statement. "Also multiple crosslinks between the Nbs and CtRoco, as well as between both nanobodies were found. ... NbRoco1-K69 also forms crosslinks with two lysines within the Roc domain (K583 and K611), and NbRoco1-K91 is crosslinked to K583".

      A figure displaying these crosslinks is now provided as Figure 4–figure supplement 1. However, in interpreting these crosslinks it should be taken into consideration that the additive length of the DSSO spacer and the lysine side chains leads to a theoretical upper limit of ∼26 Å for the distance between the α carbon atoms of cross-linked lysines (and even a cut-off distance of 35 Å when taking into account protein dynamics).
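
      The distance criterion described here can be applied to any pair of cross-linked lysines once their Cα coordinates are known. The sketch below uses the two thresholds quoted in the response (26 Å and 35 Å); the coordinates, labels and function name are hypothetical.

```python
from math import dist

# Thresholds quoted in the response for DSSO cross-links (in angstroms).
MAX_CA_CA_STATIC = 26.0   # theoretical upper limit for the Ca-Ca distance
MAX_CA_CA_DYNAMIC = 35.0  # cut-off when protein dynamics are allowed for


def classify_crosslink(ca_a, ca_b):
    """Return the Ca-Ca distance and a label for one cross-linked pair.

    ca_a and ca_b are (x, y, z) coordinates in angstroms of the two
    lysine alpha carbons.
    """
    d = dist(ca_a, ca_b)
    if d <= MAX_CA_CA_STATIC:
        label = "within theoretical limit"
    elif d <= MAX_CA_CA_DYNAMIC:
        label = "compatible if dynamics are considered"
    else:
        label = "not explained by this model"
    return d, label


# Hypothetical coordinates, not taken from the deposited structure.
print(classify_crosslink((0.0, 0.0, 0.0), (12.0, 16.0, 0.0)))
print(classify_crosslink((0.0, 0.0, 0.0), (0.0, 0.0, 30.0)))
```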

      (6) It would be informative to show the position of CtRoco-L487 in the NF and GTP-bound state and comment on why this mutation favors GTP hydrolysis.

      L487 is located in Switch 1, which is a critical region for nucleotide binding and hydrolysis. Unfortunately, most probably due to flexibility, the Switch 1 region could not be entirely modeled (in either nucleotide state). Since L487 is located on the edge of the interpretable portion of the Switch 1 in both structures (see Author response image 1 below), any interpretation regarding the role of this residue would be highly speculative.

      Author response image 1.

      The following text was added to the Results section:

      “Also the Switch 1 loop could not be fully modeled in our structure, presumably indicating some flexibility in this region despite the presence of a GTP analogue. Interestingly, the Switch 1 loop harbors the site of the PD-analogous L487A mutation that leads to a stabilization of the CtRoco dimer with a concomitant decrease in GTPase activity (Deyaert et al., 2019). Unfortunately, an exact interpretation of this effect of the L487A mutation is hampered by the lack of a well resolved Switch 1 loop.”

      Reviewer #2:

      Summary

      The manuscript by Galicia et al. describes the structure of the bacterial GTPγS-bound CtRoco protein in the presence of nanobodies. The major relevance of this study lies in the fact that the CtRoco protein is a homolog of the human LRRK2 protein, mutations in which are associated with Parkinson's disease. The structure and activation mechanisms of these proteins are very complex and not well understood. Especially lacking is a structure of the protein in the GTP-bound state. Previously the authors have shown that two conformational nanobodies can be used to bring/stabilize the protein in a monomeric GTPγS-bound state. In this manuscript, the authors use these nanobodies to obtain the GTPγS-bound structure and importantly discuss their results in the context of the mammalian LRRK2 activation mechanism and mutations leading to Parkinson's disease. The work is well performed and clearly described. In general, the conclusions on the structure are reasonable and well-discussed in the context of the LRRK2 activation mechanism.

      Strengths:

      The strong points are the innovative use of nanobodies to stabilize the otherwise flexible protein and the new GTPγS-bound structure that helps enormously in understanding the activation cycle of these proteins.

      Weakness:

      The strong point of the use of nanobodies is also a potential weak point; these nanobodies may have induced some conformational changes in a part of the protein that will not be present in a GTPγS-bound protein in the absence of nanobodies.

      Two major points need further attention.

      (1) Several parts of the protein are very flexible during the monomer-dimer activity cycle. This flexibility is crucial for protein function, but obviously hampers structure resolution. Forced experiments to reduce flexibility may allow better structure resolution, but at the same time may impede the activation cycle. Therefore, careful experiments and interpretation are critical for this type of work. This especially relates to the influence of the nanobodies on the structure, which may not occur during the "normal" monomer-dimer activation cycle in the absence of the nanobodies (see also point 2). So what is the evidence that the nanobody-bound GTPγS-bound state is biochemically a reliable representative of the "normal" GTP-bound state in the absence of nanobodies, and that the obtained structure can therefore be confidently used to interpret the activation mechanism as done in the manuscript?

      See below for an answer to remark 1 and 2.

      (2) The obtained structure with two nanobodies reveals that the nanobodies NbRoco1 and NbRoco2 bind to parts of the protein in a way that makes a dimer impossible: respectively, to the α0 helix of the linker between Roc-COR and LRR, and to the cavity of the LRR that in the dimer binds to the dimerizing domain CORB. It is likely the open monomer GTP-bound structure is recognized by the nanobodies in the camelid, suggesting that overall the open monomer structure is a true GTP-bound state. However, it is also likely that the binding energy of the nanobody is used to stabilize the monomer structure. It is not automatically obvious that in the details the obtained nanobody-Roco-GTPγS structure will be identical to the "normal" Roco-GTPγS structure. What is the influence of nanobody binding on the conformation of the domains where they bind? The binding energy may be used to stabilize a conformation that is not present in the absence of the nanobody. For instance, NbRoco1 binds to the α0 helix of the linker; what is here the "normal" active state of the Roco protein, and is e.g. the angle between Roc-COR and LRR also rotated by 135 degrees? Furthermore, nanobody NbRoco2 in the LRR domain is expected to stabilize the LRR domain; it may allow a position of the LRR domain relative to the rest of the protein that is not present without nanobody in the LRR domain. I am convinced that the observed open structure is a correct representation of the active state, but many important details have to be supported by e.g. their CX-MS experiments, and in the end probably need confirmation by more structures of other active Roco proteins or confirmation by a more dynamic sampling of the active states by e.g. molecular dynamics or NMR.

      Recently, nanobodies have increasingly been used successfully to obtain structural insights into protein conformational states (reviewed in Uchański et al, Curr. Opin. Struc. Biol. 2020). As Reviewer #2 points out, the concern is sometimes raised that antibodies could distort a protein into non-native conformations. Here, it is important to note that the nanobodies were raised by immunizing a llama with the fully native CtRoco protein bound to a non-hydrolysable GTP analogue, after which the nanobodies were selected by phage display using the same fully native and functional form of the protein. As clearly explained in Manglik et al. Annu Rev Pharmacol Toxicol. 2017, the probability of an in vivo matured nanobody inducing a non-native conformation of the antigen is low, although it is possible that it selects a high-energy, low-population conformation of a dynamic protein. Immature B cells require engagement of displayed antibodies with antigen to proliferate and differentiate during clonal selection. Antibodies that induce non-native conformations of the antigen pay a substantial energetic penalty in this process, and B cell clones displaying such antibodies will have a significantly lower probability of proliferation and differentiation into mature antibody-secreting B lymphocytes. Hence, many recent experiments and observations give credence to the notion that nanobodies bind antigens primarily by conformational selection and not induced fit (e.g. Smirnova et al. PNAS 2015).

      Extrapolated to the case of CtRoco, which is clearly very flexible in its GTP-bound form, this means that the nanobodies are able to trap and stabilize one conformational state that is representative of the “active state” ensemble of the protein. In this respect, it is clear from our experiments (XL-MS, affinity and effect on GTPase activity) that the effects of NbRoco1 and NbRoco2 are additive (or even cooperative), meaning that both nanobodies recognize different features of the same CtRoco “active state”. Correspondingly, the monomeric, elongated “open” conformation is also observed in the structure of CtRoco bound to NbRoco1 only (Figure 1 - supplement 2), although this structure still displays more flexibility. The monomerization and conformational changes that we observe and describe in the current paper at high resolution are also in very good agreement with earlier observations for CtRoco in the GTP-bound form in the absence of any nanobodies, including negative stain EM (Deyaert et al. Nature Commun, 2017), hydrogen-deuterium exchange experiments (Deyaert et al. Biochem. J. 2019) and native MS (Leemans et al. Biochem J. 2020).

      In the revised manuscript we added the following text to the discussion:

      “To decrease this flexibility, we have now used two previously developed conformation-specific nanobodies (NbRoco1 and NbRoco2) to stabilize the protein in the GTP-state (Leemans et al., 2020), allowing us to solve its structure using cryo-EM (Figure 1). Recently, Nbs have successfully been used to obtain structural insights into the conformational states of a number of highly dynamic proteins (Uchański et al, 2020). These studies established that Nbs bind antigens primarily by conformational selection rather than by induced fit (Manglik et al., 2017; Smirnova et al., 2015). Since NbRoco1 and NbRoco2 were generated by immunization with fully native CtRoco bound to a non-hydrolysable GTP analogue, and subsequently selected by phage display using the same functional protein, it is thus safe to assume that these Nbs bind to and stabilize a relevant conformation that is present within the “active” CtRoco conformational space (Leemans et al., 2020). Moreover, our current structures are also in very good agreement with previous biochemical studies and data from HDX-MS and negative stain EM (Deyaert et al., 2019; Deyaert, Wauters, et al., 2017).”

      Recommendations for the authors:

      Reviewer #1:

      (1) Figure 2C: please label the residues with meshes (switch 2).

      Labels have been added to figure 2C.

      (2) A supplemental figure for the following statement will be helpful "A remarkable feature of the CtRoco dimer structure was the dimer-stabilized orientation of the P-loop, which would hamper direct nucleotide binding on the dimer. Correspondingly, in the current structure, the P-loop changes orientation, allowing GTPγS to bind, although the EM map does not allow unambiguous placement of the entire P-loop. Surprisingly, also the Switch 1 loop could not be fully modeled, which could indicate some flexibility in this region despite the presence of a GTP analog".

      An additional Figure 2–figure supplement 2 has been added to illustrate this.

      (3) A supplemental figure for the following statement will be helpful "A final important observation in the Roc domain concerns the very C-terminal part of Switch 2 (residues 520 to 533), which could not be modeled in our GTP bound structure due to flexibility, while in the nucleotide-free dimer structure this region is structured and located at the interface of the Roc domain with the LRR-Roc linker and CORA. In this way, the conformational changes induced by GTPγS binding could be relayed via the Switch 2 toward the LRR and CORA domains, and vice versa."

      An additional Figure 2–figure supplement 2 has been added to illustrate this.

      (4) A structural comparison of each domain (LRR, ROC, COR) between NF and GTP-bound states will be greatly useful to understand statements in the manuscript, such as "In addition to the C-terminal dimerization part of CORB that becomes unstructured, also other large conformational changes are observed in the CORA and CORB domains of CtRoco upon GTPγS binding."

      We would like to clarify that with this statement we refer to changes in the relative orientation of the domains between the nucleotide-free and GTPγS-bound states, rather than to conformational changes within each domain. These changes in relative orientation are illustrated in Figure 2 and the associated Figure supplements.

      (5) The statement "to a lesser extent, also between CDR1 and the LRR-Roc linker" is not clearly illustrated in Figure 3B.

      The reviewer is correct, and we now also show CDR1 in Figure 3B.

      (6) Extra panels can be added in Figure 1 Sup. 4 to illustrate the following statement "In the density map NbRoco2 can easily be identified and placed on the concave side of the LRR domain... N-terminal and C-terminal β-strands interacting with the very C-terminal repeat of the LRR".

      We believe the density map corresponding to NbRoco2 is clearly shown in Figure 1 – supplement 4A. A reference to this figure panel is now added to the main text.

      (7) "In the presence of both Nbs, the hydrolysis rate was increased 4-fold compared to CtRoco-L487A alone and 2-fold compared to CtRoco-L487A in the presence of NbRoco1 only, again illustrating a collaboration between the Nbs (Figure 5C)" Here, is it 6-fold instead of 4-fold?

      The reviewer is correct. We changed this accordingly in the manuscript.

      Reviewer #2:

      (1) At many places in the manuscript the lack of structural details is explained by the assumed local flexibility of the protein. This may be true for many cases (such as linker regions), but is probably not always correct; several other explanations are possible to get no local structural details.

      See our answer to point 2, below.

      (2) At several other places in the manuscript the high flexibility is used to explain the lack of structural details (so the reasoning is reversed compared to point 1); this would require that it is known a priori that the region is flexible and therefore no structure can be expected. An example is found mid-page 8: "A final important observation in the Roc domain concerns the very C-terminal part of Switch 2 (residues 520 to 533), which could not be modeled in our GTP bound structure due to flexibility, while in the nucleotide-free dimer structure this region is structured and located at the interface of the Roc domain with the LRR-Roc linker and CORA." As written, there must be a reference to experiments supporting the "due to flexibility" claim.

      The reviewer is correct that additional factors might affect the interpretability of the map, such as the small size of the regions used for the focused refinements (around 50 kDa each) or a preferential distribution of orientation of the particles in the grid. Particle distribution plots are now shown in Figure 1 – Figure supplements 1 and 2. However, due to the intrinsically flexible nature of the Switch 1 and Switch 2 regions, we assume this flexibility to be the major cause of the lack of features in the EM maps, especially since some of the neighboring regions display well-resolved maps.

      Nevertheless, in the manuscript we reworded our statements to be more careful. For example, on page 8:

      “Also the Switch 1 loop could not be fully modeled in our structure, presumably indicating some flexibility in this region despite the presence of a GTP analogue.”

      “… potentially due to flexibility of this region in the new position of the Switch 2…”

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The aim of the present work is to evaluate the role of BMP9 and BMP10 in the liver by depleting Bmp9 and Bmp10 from the main liver cell types (endothelial cells (EC), hepatic stellate cells (HSC), Kupffer cells (KC) and hepatocytes (H)) using cell-specific Cre recombinases. They show that HSCs are the main source of BMP9 and BMP10 in the liver. Using transgenic ALK1 reporter mice, they show that ALK1, the high affinity type 1 receptor for BMP9 and BMP10, is expressed on KC and EC. They have also performed bulk RNAseq analyses on whole liver, and cell-sorted EC and KC, and showed that loss of Bmp9 and Bmp10 decreased KC signature and that KC are replaced by monocyte-derived macrophages. EC derived from these Bmp9fl/flBmp10fl/flLratCre mice also lost their identity and transdifferentiated into continuous ECs. Liver iron metabolism and metabolic zonation were also affected in these mice. In conclusion, this work supports that BMP9 and BMP10 produced by HSC play a central role in mediating liver cell-cell crosstalk and liver homeostasis.

      We appreciate the comprehensive summary of reviewer 1.

      Strengths:

      This work further supports the role of BMP9 and BMP10 in liver homeostasis. Using a specific HSC-Cre recombinase, the authors show for the first time that it is the BMP9 and BMP10 produced by HSC that play a central role in mediating liver cell-cell crosstalk to maintain a healthy liver. Although the overall message of the key role of BMP9 in liver homeostasis has been described by several groups, the role of hepatic BMP10 has not been studied before. Thus, one of the novelties of this work is to have used liver cell specific Cre recombinase to delete hepatic Bmp9 and Bmp10. The second novelty is the demonstration of the role of BMP9 and BMP10 in KC differentiation/homeostasis, which has already been partly addressed by this group by knocking out ALK1, the high affinity receptor of BMP9 and BMP10 (Zhao et al. JCI, 2022).

      We appreciate the positive comment of reviewer 1.

      Weaknesses:

      This work remains rather descriptive; the molecular mechanisms are barely touched upon and could have been explored further. Some references should be added, in particular a work that has already demonstrated, using a different approach (in situ hybridization RNAscope), that in the liver BMP9 and BMP10 are expressed by HSC (Tillet et al., J Biol Chem 2018). Another publication (Bouvard et al., Cardiovasc Res, 2021) has previously shown that deletion of Bmp9 and Bmp10 leads to liver fibrosis and could thus have been cited. There is also a reference that is not correctly cited. Ref 26 (Herrera et al., 2014) does not say that "BMP10 is mostly expressed in the heart, followed by the liver" or that "BMP9 and BMP10 also bind to ALK2" as cited in the manuscript.

      We agree with the comment of reviewer 1 that the molecular mechanisms were barely investigated in our work. Indeed, it has been reported that BMP9/10 induce the expression of ID1/3 in KCs and of GATA4 and Maf in liver ECs in an in vitro culture system. These master regulators play an important role in the differentiation of the two cell types. Thus, we think that the reduced expression of these master regulators can explain the phenotype in KCs and ECs observed in Bmp9fl/flBmp10fl/flLratCre mice. In addition, according to the reviewer’s suggestion, these references will be added or corrected in our revised manuscript.

      The gating strategies for cell sorting, which are used for bulk RNAseq and FACS analyses, should be better described in order to better follow the manuscript. This point is particularly important for KC gating, as the authors show that Tim4 is very strongly decreased in Bmp9fl/flBmp10fl/flLratCre (Fig 2c), yet it seems that this marker is used for gating macrophages (Suppl fig4). The same question applies to F4/80, which is strongly decreased in Bmp9fl/flBmp10fl/flLratCre (Fig 2d) and also used for gating. It is important to show the gating strategy for both Control and Bmp9fl/flBmp10fl/flLratCre mice.

      The authors should explain how they selected the genes shown on each heatmap and add references that can justify the choice of the genes.

      Thank you for your suggestion. In our study, we used CD45+ Ly6C- F4/80+ CD64+ cells to define liver macrophages. We will delete the Tim4 FACS plot from Suppl fig4 to avoid misunderstanding. Although F4/80-positive cells were reduced in the livers of Bmp9fl/flBmp10fl/flLratCre mice, double staining with anti-F4/80 and anti-CD64 fluorescent antibodies can still clearly distinguish liver macrophages based on the above gating strategy. The gating strategy for both control and Bmp9fl/flBmp10fl/flLratCre mice will be presented in our revised manuscript.
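
      The marker combination named in this response (CD45+ Ly6C- F4/80+ CD64+) can be written as a boolean gate. The sketch below is purely illustrative: the thresholds, the intensity values, and the function name are hypothetical, and real gates are drawn per experiment from controls rather than from fixed numbers.

```python
# Hypothetical positivity thresholds in arbitrary fluorescence units.
THRESHOLDS = {"CD45": 1000, "Ly6C": 1000, "F4/80": 500, "CD64": 500}

def is_liver_macrophage(cell):
    """Apply the CD45+ Ly6C- F4/80+ CD64+ gate to one cell's marker intensities."""
    positive = {marker: cell[marker] >= cutoff for marker, cutoff in THRESHOLDS.items()}
    return (
        positive["CD45"]
        and not positive["Ly6C"]
        and positive["F4/80"]
        and positive["CD64"]
    )

# Hypothetical events; Tim4 is recorded but deliberately not used by the gate.
cells = [
    {"CD45": 5000, "Ly6C": 200, "F4/80": 900, "CD64": 1200, "Tim4": 50},
    {"CD45": 5000, "Ly6C": 4000, "F4/80": 900, "CD64": 1200, "Tim4": 10},
    {"CD45": 300, "Ly6C": 100, "F4/80": 100, "CD64": 100, "Tim4": 0},
]

gated = [cell for cell in cells if is_liver_macrophage(cell)]
print(f"{len(gated)} of {len(cells)} events pass the macrophage gate")
```

      Because Tim4 does not enter the gate, a drop in Tim4 signal by itself does not change which events are counted as macrophages in this sketch.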

      Quantification of immunostaining and FACS data should be added, as well as statistical analyses.

      Quantitative data will be added in our revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors characterized the contribution of BMP9/BMP10 expression/secretion from all different hepatic cell types and analysed their impact on the other cell types. They are able to show that HSC derived BMP9/BMP10 controls Kupffer cell and EC differentiation and functions.

      We appreciate the comprehensive summary of reviewer 2.

      Strengths:

      This is the first study to my knowledge to comprehensively analyze the contribution of BMP9/BMP10 expression in such systematic fashion in vivo. This study therefore is a significant contribution to the field and further supports previous studies that have already implied BMP9 and BMP10 in Kupffer cell and EC functions but did not unravel the intercellular cross talk in such detailed fashion.

      We appreciate the positive comment of reviewer 2.

      Weaknesses:

      Several findings, such as the impact of BMP9/10 on Kupffer cells and EC, were already known. These findings are therefore not innovative; however, I still believe that the elucidation of the cellular crosstalk makes this publication highly interesting to a broad scientific community.

      Overall, the authors achieved their aims, and the results support the conclusions and discussion well.

      We appreciate the positive comment of reviewer 2. We agree with the comment of reviewer 2 that although some findings in our paper are somewhat expected, the detailed investigation of the crosstalk between different liver cell types is still needed and beneficial to this field.

  2. Apr 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Valk and Engert et al. examined the potential relations between three different mental training modules, hippocampal structure and functional connectivity, and cortisol levels over a 9-month period. They found that among the three types of mental training, the Presence (attention and introspective awareness), Affect (socio-emotional - compassion and prosocial motivation), and Perspective (socio-cognitive - metacognition and perspective taking) modules, Affect training most consistently related to changes in hippocampal structure and function, specifically in the CA1-3 subfields of the hippocampus. Moreover, decreases in diurnal cortisol correlated with bilateral increases in volume, and decreases in diurnal and chronic cortisol correlated with left CA1-3 functional connectivity. Chronic cortisol levels also related to right CA4/DG volume and left subiculum function. The authors demonstrate that mindfulness training programs impact the hippocampus and are a potential avenue for stress interventions and, in turn, for improving health. The data contribute to the literature on plasticity of hippocampal subfields during adulthood, the impact of mental training interventions on the brain, and the link between CA1-3 and both short- and long-term stress changes. Additional clarification and extension of the methods is needed to strengthen the authors' conclusions.

      We thank the Reviewer for their positive evaluation and summary of our findings and work. We made additional changes as suggested by the Reviewer and hope this clarified any open points.

      (1) The authors thoughtfully approached the study of hippocampal subfields, utilizing a method designed for T1w images that outperformed Freesurfer 5.3 and that produced comparable results to an earlier version of ASHS. However, given the use of normalized T1-weighted images to delineate hippocampal subfield volume, some caution may be warranted (Wisse et al. 2020). While the authors note the assessment of quality control processes, the difficulty in ensuring valid measurement is an ongoing conversation in the literature. This also extends to the impact of functional co-registration using segmentations. I appreciate the inclusion of Table 5 in documenting reasons for missing data across subjects. Providing additional details on the distribution of quality ratings across subfields would help contextualize the results and ensure there is equal quality of segmentations across subfields.

      We thank the Reviewer for bringing up this point. In the current work, we assessed the overall segmentation of all six subfields per individual. Thus, unfortunately, we have no data on the segmentation quality of individual subfields beyond our holistic assessment. Indeed, registration of hippocampal subfields remains a challenge and we have further highlighted this limitation in the Discussion of the current work.

      “It is of note that the current work relies on a segmentation approach of hippocampal subfields including projection to MNI template space, an implicit correction for total brain volume through the use of a stereotaxic reference frame. Some caution for this method may be warranted, as complex hippocampal anatomy can in some cases lead to over- as well as underestimation of subfield volumes, and subfield boundaries may not always be clearly demarcated (1). Future work, studying the hippocampal surface at higher granularity, for example through unfolding the hippocampal sheet (2-5), may further help with both alignment and identification of not only subfield-specific change but also alterations as a function of the hippocampal long axis, a key dimension of hippocampal structural and functional variation that was not assessed in the current work (6, 7).”

      (2) Given the consistent pattern of finding results with CA1-3, in contrast to other subfields, it would help to know if the effects of the different training modules on subfields differed from each other statistically (i.e., not just that one is significant, and one is not) to provide an additional context of the strength of results focused on Affect training and CA1-3 (for example, those shown in Figure 3).

      Our work investigated whether the effects of the individual Training Modules differed from each other statistically. We found that the Affect Training Module showed increases in CA1-3 volume, and that these increases remained when testing effects relative to changes in this subfield following Perspective training and in retest controls. Moreover, in CA1-3 we found changes in functional connectivity when comparing the Affect to the Perspective Training Module. These changes were only present in this contrast, and were not significant in each of the Training Modules per se. To test for specificity, we additionally evaluated whether subfield-specific changes were present above and beyond changes in the other ipsilateral hippocampal subfields. Relative to other subfields, right CA1-3 showed increases in the Affect vs Perspective contrast (left: t-value: 2.298, p=0.022, Q>0.1; right: t-value: 3.045, p=0.0025, Q=0.015). No other subfield showed significant changes. We now include this statement in the revised Results and Supplementary Tables.

      “Moreover, associations between CA1-3 and Affect, relative to Perspective, seemed to go largely above and beyond changes in the other subfields (left: t-value: 2.298, p=0.022, Q>0.1; right: t-value: 3.045, p=0.0025, Q=0.015, see further Supplementary File 1h).”

      Author response table 1.

      Subfield-specific changes following the Training Modules, controlling for the other two ipsilateral subfields

      Reviewer #1 (Recommendations For The Authors):

      (1) In Figure 1, using different colors for subfields versus the modules (yellow, red, green) would help as it could lead the reader to try to draw connections between the two when it is namely a depiction of the delineations.

      As suggested, we updated Figure 1 accordingly and present the subfields in different shades of purple for clarity. Please find the updated figure below.

      Author response image 1.

      (2) In the Results, it was at times hard to follow when Affect or Perspective was the focus of the results. Perhaps the authors could restructure or add additional context for clarity.

      We are happy to clarify. For the first analysis on Module-specific changes in hippocampal subfield volume, we compared effects across Training Modules. Here, main contrasts were run between subjects (Presence vs active control) and within subjects (Affect vs Perspective). In additional secondary contrasts, we studied training effects vs retest control. After observing consistent increases in bilateral CA1-3 following Affect, in the following analysis, we evaluated 1) intrinsic functional networks in main and supplementary contrasts and 2) diurnal cortisol measures within the Training Modules only and all three Training Modules combined, and also adopted 3) a multivariate approach (PLS) (see comments Reviewer 2). We now also report effects of cortisol change on structural and functional subfield change in Presence and Perspective, for additional completeness and clarity.

      “To study whether there was any training module-specific change in hippocampal subfield volumes following mental training, we compared training effects between all three Training Modules (Presence, Affect, and Perspective). Main contrasts were: Presence vs Active control (between subjects) and Affect vs Perspective (within subjects). Supplementary comparisons were made vs retest controls and within training groups.”

      “Overall, for all hippocampal subfields, findings associated with volume increases in CA1-3 following the Affect training were most consistent across timepoints and contrasts (Supplementary File 1a-f).”

      “Subsequently, we studied whether hippocampal CA1-3 would show corresponding changes in intrinsic function following the Affect mental training.”

      “In particular, the moderately consistent CA1-3 volume increases following Affect training were complemented with differential functional connectivity alterations of this subfield when comparing Affect to Perspective training”

      “Last, we probed whether group-level changes in hippocampal subfield CA1-3 volume would correlate with individual-level changes in diurnal cortisol indices (Presence: n= 86; Affect: n=92; Perspective: n=81), given that the hippocampal formation is a nexus of the HPA-axis (8). We took a two-step approach. First, we studied associations between cortisol and subfield change, particularly focusing on the Affect module and CA1-3 volume based on increases in CA1-3 volume identified in our group-level analysis.”

      “We observed that increases in bilateral CA1-3 following Affect showed a negative association with change in total diurnal cortisol output […]”

      “We did not observe alterations in CA1-3 volume in relation to change in cortisol markers in Presence or Perspective. Yet, for Presence, we observed an association between slope and LCA4/DG change (t=-2.89, p=0.005, q=0.03) (Supplementary File 1uv).”

      “In case of intrinsic function, we also did not observe alterations in CA1-3 in relation to change in cortisol markers in Presence or Perspective, nor in other subfields (Supplementary File 1wx).”

      Author response table 2.

      Correlating change in subfield volume and diurnal cortisol indices in Presence. The main focus was on CA1-3, based on volumetric observations; these results are highlighted in bold.

      Author response table 3.

      Correlating change in subfield volume and diurnal cortisol indices in Perspective. The main focus was on CA1-3, based on volumetric observations; these results are highlighted in bold.

      Author response table 4.

      Association between stress-markers and within functional network sub-regions in Affect and Perspective.

      Author response table 5.

      Correlating change in subfield function and diurnal cortisol indices in Presence. The main focus was on CA1-3, based on volumetric observations; these results are highlighted in bold. For multiple comparisons, FDR q values (corrected for two subfields) are reported if uncorrected p values are below .05.

      Author response table 6.

      Correlating change in subfield function and diurnal cortisol indices in Perspective. The main focus was on CA1-3, based on volumetric observations; these results are highlighted in bold. For multiple comparisons, FDR q values (corrected for two subfields) are reported if uncorrected p values are below .05.

      (3) In the Methods, the authors note that corrections for multiple comparisons were used where needed; however, throughout the manuscript there is some switching between corrected and uncorrected p-values. At times, this made it difficult to follow when these corrections were needed.

      For clarity, we added explicit multiple comparisons information a) in main and supplementary results, and b) wherever extra information was needed. Also, we only included main contrasts in Table 1-3 to avoid confusion and moved the information on changes in SUB and CA4/DG to the Supplementary tables.
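
      To illustrate how corrected and uncorrected values relate, the sketch below converts a family of p values into FDR q values. It assumes the Benjamini-Hochberg procedure, which the response does not name explicitly, and the p values are hypothetical placeholders rather than results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q values in the same order as the input p values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so each q value is the minimum of
    # p * m / rank over its own rank and all higher ranks (keeps q monotone).
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values

# Hypothetical uncorrected p values for a family of four tests.
p_uncorrected = [0.01, 0.04, 0.03, 0.5]
for p, q in zip(p_uncorrected, benjamini_hochberg(p_uncorrected)):
    print(f"p = {p:.3f} -> q = {q:.3f}")
```

      The size of the family matters: the same uncorrected p value yields a different q value when corrected over two subfields than over six, which is why the family is worth stating alongside each corrected value.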

      (4) Typically, when correcting for intracranial volume the purpose is to ensure that sexual dimorphism in the size of the brain is accounted for. I would recommend the authors assess whether sex differences are accounted for by the MNI normalization approach taken. On my reading of the original Methods paper for the patch-based algorithm used, ICV was used to transform to MNI152 space. It would help to have additional information on how the normalization was done in the current study in order to draw comparisons to other findings in the literature.

      We are happy to further clarify. In the current work, we used the same approach as in the original paper. Volumes were linearly registered to the MNI template using FSL flirt. We have now provided this additional information in the revised Methods.

      “Hippocampal volumes were estimated based on T1w data that were linearly registered to MNI152 using FSL flirt (http://www.fmrib.ox.ac.uk/fsl/), such that intracranial volume was implicitly controlled for.”

      We agree with the Reviewer that sex differences may still be present, and investigated this. At baseline, sex differences were found in all subfields in the left hemisphere, and right CA4/DG (FDRq<0.05). Regressing out ICV resolved remaining sex differences. We then evaluated whether main results of volumetric subfield change were impacted by ICV differences. Differences between Affect and Perspective remained stable. We have now added this additional analysis in the Supplementary Materials.

      “Although stereotaxic normalization to MNI space would in theory account for global sex differences in intra-cranial volume, we still observed sex differences in various subfield volumes at baseline. Yet, accounting for ICV did not impact our main results, suggesting that changes in CA1-3 following Affect were robust to sex differences in overall brain volume (Supplementary File 1j).”
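
      As an illustration of what "regressing out ICV" can mean in practice, here is a minimal sketch that removes the linear effect of a single covariate from a measure by ordinary least squares. The function and variable names are hypothetical, and the authors' actual model (which may include further covariates and random effects) is not specified here.

```python
import numpy as np

def residualize(y, covariate):
    """Remove the linear effect of one covariate (plus intercept) from y.

    Illustrative only: a single-covariate OLS fit, not the authors' full model.
    """
    y = np.asarray(y, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    design = np.column_stack([np.ones_like(covariate), covariate])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)   # [intercept, slope]
    return y - design @ beta

# Hypothetical data: subfield volume partly driven by intracranial volume (ICV)
rng = np.random.default_rng(0)
icv = rng.normal(1500.0, 120.0, size=100)
volume = 0.5 * icv + rng.normal(0.0, 10.0, size=100)
volume_adjusted = residualize(volume, icv)
```

      Group comparisons (for example, female versus male) can then be run on the adjusted values, which by construction are uncorrelated with the covariate.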

      Author response table 7.

      Sex differences (female versus male) in hippocampal subfield volumes.

      Reviewer #2 (Public Review):

      In this study, Valk, Engert et al. investigated effects of stress-reducing behavioral intervention on hippocampal structure and function across different conditions of mental training and in relation to diurnal and chronic cortisol levels. The authors provide convincing multimodal evidence of a link between hippocampal integrity and stress regulation, showing changes in both volume and intrinsic functional connectivity, as measured by resting-state fMRI, in hippocampal subfield CA1-3 after socio-affective training as compared to training in a socio-cognitive module. In particular, increased CA1-3 volume following socio-affective training overlapped with increased functional connectivity to medial prefrontal cortex, and reductions in cortisol. The conclusions of this paper are well supported by the data, although some aspects of the data analysis would benefit from being clarified and extended.

      A main strength of the study is the rigorous design of the behavioral intervention, including test-retest cohorts, an active control group, and a previously established training paradigm, contributing to an overall high quality of included data. Similarly, systematic quality checking of hippocampal subfield segmentations contributes to a reliable foundation for structural and functional investigations.

      We thank the Reviewer for the thoughtful summary and appreciation of our work, as well as for the requests for further clarification and analyses. We address each of them point by point below.

      Another strength of the study is the multimodal data, including both structural and functional markers of hippocampal integrity as well as both diurnal and chronic estimates of cortisol levels.

      (1) However, the included analyses are not optimally suited for elucidating multivariate interrelationships between these measures. Instead, effects of training on structure and function, and their links to cortisol, are largely characterized separately from each other. This results in the overall interpretation of results, and conclusions, being dependent on a large number of separate associations. Adopting multivariate approaches would better target the question of whether there is cortisol-related structural and functional plasticity in the hippocampus after mental training aimed at reducing stress.

      We thank the Reviewer for this suggestion. Indeed, our project combined different univariate analyses to uncover the association between hippocampal subfield structure, function, and cortisol markers. While systematic, a downside of this approach is that the interpretation of our results depends on a large number of analyses. To further explore whether there is cortisol-related structural and functional plasticity in the hippocampus, we followed the Reviewer’s suggestion and additionally adopted a multivariate partial least squares (PLS) model. We ran two complementary models: one focusing on the bilateral CA1-3, as this region showed increases in volume following Affect training and differential change between Affect and Perspective training in our resting-state analyses, and one including all subfields. Both models included all stress markers. We found that both models could significantly relate stress markers to brain measures, and that Affect in particular showed strong associations with the significant latent markers. Both analyses showed inverse effects of structure and function in relation to stress markers, and both slope and AUC changes showed the strongest loadings. We now include these analyses in the revised manuscript.

      Abstract

      “Of note, using a multivariate approach we found that other subfields, showing no group-level changes, also contributed to alterations in cortisol levels, suggesting circuit-level alterations within the hippocampal formation.”

      Methods

      “Partial least squares analysis

      To assess potential relationships between cortisol change and hippocampal subfield volume and functional change, we performed a partial least squares analysis (PLS) (9, 10). PLS is a multivariate associative model that optimizes the covariance between two matrices by generating latent components (LCs), which are optimal linear combinations of the original matrices (9, 10). In our study, we utilized PLS to analyze the relationships between change in volume and intrinsic function of hippocampal subfields and diurnal cortisol measures. Here we included all Training Modules and regressed out effects of age, sex, and random effects of subject on the brain measures before conducting the PLS analysis. The PLS process involves data normalization within training groups, cross-covariance, and singular value decomposition. Subsequently, subfield and behavioral scores are computed, and permutation testing (1000 iterations) is conducted to evaluate the significance of each latent factor solution (FDR corrected). We then report the correlation of the individual hippocampal and cortisol markers with the latent factors. To estimate confidence intervals for these correlations, we applied a bootstrapping procedure that generated 100 samples with replacement from subjects’ RSFC and behavioral data.”
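
      The core steps described above (normalization, cross-covariance, singular value decomposition, score computation, and permutation testing) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: it z-scores across all subjects rather than within training groups, omits covariate regression, FDR correction, and bootstrapping, and uses hypothetical function names and simulated data.

```python
import numpy as np

def zscore(a):
    """Column-wise z-scoring (assumption: applied across all subjects)."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

def pls_svd(X, Y):
    """PLS via SVD of the cross-covariance of z-scored X (brain) and Y (cortisol)."""
    Xz, Yz = zscore(X), zscore(Y)
    cross_cov = Xz.T @ Yz / (Xz.shape[0] - 1)
    U, s, Vt = np.linalg.svd(cross_cov, full_matrices=False)
    x_scores = Xz @ U        # subject scores on each latent component (brain side)
    y_scores = Yz @ Vt.T     # subject scores on each latent component (cortisol side)
    return U, s, Vt.T, x_scores, y_scores

def permutation_pvalues(X, Y, n_perm=1000, seed=0):
    """P value per latent component: how often a permuted singular value
    reaches the observed one when rows of X are shuffled."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    s_obs = pls_svd(X, Y)[1]
    exceed = np.zeros_like(s_obs)
    for _ in range(n_perm):
        perm = rng.permutation(X.shape[0])
        exceed += pls_svd(X[perm], Y)[1] >= s_obs
    return (exceed + 1) / (n_perm + 1)

# Simulated example: one shared latent factor drives both data blocks
rng = np.random.default_rng(1)
latent = rng.normal(size=60)
X_demo = np.outer(latent, [1.0, -1.0]) + 0.3 * rng.normal(size=(60, 2))
Y_demo = np.outer(latent, [1.0, 1.0, 1.0]) + 0.3 * rng.normal(size=(60, 3))
U, s, V, x_scores, y_scores = pls_svd(X_demo, Y_demo)
p_demo = permutation_pvalues(X_demo, Y_demo, n_perm=200)
```

      In the simulated data the two brain columns load with opposite signs on the shared factor, loosely mirroring the inverse contribution of volume and function reported in the response.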

      Results

      “Last, to further explore the question whether there is concordant cortisol-related structural and functional plasticity in the hippocampus, we adopted a multivariate partial least squares approach, with 1000 permutations to account for stability (9, 10) and bootstrapping (100 times) with replacement. We ran two complementary models including all Training Modules whilst regressing out age, sex and random effects of subject. First, we focused on the bilateral CA1-3, as this region showed increases in volume following Affect training and differential change between Affect and Perspective training in our resting state analyses. The second model included structural and functional data of all subfields. Both models included all stress markers. We found that both models could identify significant associations between cortisol stress markers and hippocampal plasticity (FDRq<0.05), and that in particular Affect showed strongest associations with the latent markers for CA1-3 (Table 5). Both analyses showed inverse effects of subfield structure and function in relation to stress markers and both slope and AUC changes showed strongest associations with the latent factor.”

      Author response table 8.

      Multivariate PLS analyses linking cortisol markers to hippocampal subfield volume and function.

      Discussion

      “Last, performing multivariate analysis, we again observed associations between CA1-3 volume and function plasticity and stress change, strongest in Affect. Yet combining all subfields in a single model indicated that other subfields also link to stress alterations, indicating that ultimately circuit-level alterations within the hippocampal formation relate to latent changes in diurnal stress markers across Training Modules.”

      “This interpretation is also supported by our multivariate observations.”

      “In line with our observations in univariate analysis, we found multivariate associations between hippocampal subfield volume, intrinsic function and cortisol markers. Again, the contribution of volume and intrinsic function was inverse. This may possibly relate to the averaging procedure of the functional networks. Combined, outcomes of our univariate and multivariate analyses point to an association between change in hippocampal subfields and stress markers, and suggest that these changes, at the level of the individual, ultimately reflect complex interactions within and across hippocampal subfields and may capture different aspects of diurnal stress. Future work may more comprehensively study the plasticity of the hippocampal structure, and link this to intrinsic functional change and cortisol to gain full insights into the specificity and system-level interplay across subfields, for example using more detailed hippocampal models (3). Incorporating further multivariate, computational models is needed to further unpack and investigate the complex and nuanced association between hippocampal structure and function, in particular in relation to subfield plasticity and short and long-term stress markers.”

      “…based on univariate analysis. Our multivariate analysis further nuanced this observation, but again pointed to an overall association between hippocampal subfield changes and cortisol changes, but this time more at a systems level.”

      “Lastly, our multivariate analyses also point to a circuit level understanding of latent diurnal stress scores.”

      Author response image 2.

      Multivariate associations between changes in structure and function of hippocampal subfield volume and markers of stress change in Affect. A) Multivariate associations between bilateral CA1-3 volume and intrinsic function and stress markers. Left: Scatter of loadings, colored by Training Module; Right upper: individual correlations of stress markers; Right lower: individual correlation of subfields; B). Multivariate associations between all subfields’ volume and intrinsic function and stress markers. Left: Scatter of loadings, colored by Training Module; Right upper: individual correlations of stress markers; Right lower: individual correlation of subfields.

      (2) The authors emphasize a link between hippocampal subfield CA1-3 and stress regulation, and indeed, multiple lines of evidence converge to highlight a most consistent role of CA1-3. There are, however, some aspects of the results that limit the robustness of this conclusion. First, formal comparisons between subfields are incomplete, making it difficult to judge whether the CA1-3, to a greater degree than other subfields, display effects of training.

      We thank the Reviewer for this comment. To further test for specificity, we additionally evaluated subfield-specific changes relative to other subfields for our main contrasts (Presence versus Active Control and Affect versus Perspective). Relative to other subfields, right CA1-3 showed increases in the Affect vs Perspective contrast (left: t-value: 2.298, p=0.022, Q>0.1; right: t-value: 3.045, p=0.0025, Q=0.015); no other subfield showed significant changes. We now include this statement in Results and Supplementary Tables.

      “Moreover, associations between CA1-3 and Affect, relative to Perspective, seemed to go largely above and beyond changes in the other subfields (left: t-value: 2.298, p=0.022, Q>0.1; right: t-value: 3.045, p=0.0025, Q=0.015, see further Supplementary File 1h).”

      Author response table 9.

      Subfield-specific changes following the Training Modules, controlling for the other two ipsilateral subfields

      (3) Relatedly, it would be of interest to assess whether changes in CA1-3 make a significant contribution to explaining the link between hippocampal integrity and cortisol, as compared to structure and functional connectivity of the whole hippocampus.

      We thank the Reviewer for this comment. Please see the PLS analysis performed above (R2Q1). Indeed, not only CA1-3 but also other subfields seem to show a relationship with cortisol, in line with circuit level accounts on stress regulation and hippocampal circuit alterations (8, 11-15).

      (4) Second, both structural and functional effects (although functional to a greater degree), were most pronounced in the specific comparison of "Affect" and "Perspective" training conditions, possibly limiting the study's ability to inform general principles of hippocampal stress-regulation.

      We agree with the Reviewer that examining the association between stress and hippocampal plasticity on the one hand, and between mental training and hippocampal plasticity on the other, makes it difficult to derive general principles of hippocampal stress regulation. However, as underscored in the Discussion, our previous work also linked mental training to stress reductions (16-18). We hope that the additional analyses and explanations clarify the multilevel insights of the current work: group-level analyses investigate and illustrate the association between mental training and hippocampal subfield volume and intrinsic function, whereas individual-level analyses unpack the association between cortisol change and hippocampal subfield change.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the Results, the description of how the hippocampal subfields' functional networks were defined would benefit from some clarification. It is also somewhat unclear what is meant by (on page 10): "Evaluating functional connectivity changes, we found that connectivity of the right CA1-3 functional network showed differential changes when comparing Affect training to Perspective training (2.420, p=0.016, FDRq=0.032, Cohens D =0.289), but not versus retest control (Table 1 and Supplementary Table 8-14)." Were there significant changes in CA1-3 FC following both training conditions (but these differed from each other)? A description of what this difference reflected would increase the reader's understanding.

      We are happy to clarify. We included information on change within individual Modules in the Supplementary materials (Supplementary Tables 1, 2, 9, and 10). Changes in functional connectivity were largely due to the differences between Modules, and did not show strong effects in one Module alone. We now include information on un-contrasted change in Affect and Perspective in the main results text:

      “… which could be attributed to decreases in right CA1-3 mean FC following Perspective (t=-2.012, p=0.045, M:-0.024, std: 0.081, CI [-0.041 -0.006]), but not Affect (t=1.691, p=0.092, M: 0.010, std: 0.098, CI [-0.01 0.031]); changes were not present when comparing Affect training versus retest control (Table 1 and Supplementary File 1k-q).”

      (2) As described in the Public Review, the lack of multivariate assessments may risk selling the data short. Including analyses of concomitant functional and structural changes, in relation to cortisol, seems like an approach better adapted to characterize meaningful interrelationships between these measures.

      We thank the Reviewer for suggesting multivariate assessments. To understand the interrelation between behavioral intervention, hippocampal plasticity, and cortisol changes, the current work first evaluates a simpler operationalization of the relationship between hippocampal subfield structure and volume, and cortisol as a function of mental training. Thus, given the complex nature of the study, we initially opted for a model where we assess structural and functional changes independently, with structural changes as the basis of our investigations. We have now also included a multivariate approach (PLS) to further test the association between hippocampal subfields and cortisol markers; please see our additions to the manuscript above. We have also highlighted multivariate associations in the Discussion, and suggest this as an important next step for more detailed, future investigations.

      “Incorporating further multivariate, computational, models is needed to further unpack and investigate the complex and nuanced association between hippocampal structure and function, in particular in relation to subfield plasticity and short and long-term stress markers.”

      (3) A minor comment regards the Figures. Some main effects should be visualized in a clearer manner. For instance, the scatterplots in Figure 1, panel D. Also, some of the current headings within the figures could be made more intuitive to the reader.

      We thank the Reviewer for this comment. To improve clarity, we updated the figure headings. For Figure 1D, the challenge is that the data are quite scattered and we aimed to visualize our observations in a naturalistic way. Therefore, we added additional y-axis information to further clarify the figures. Creating more overlap or differentiation would make other elements of the figure less clear; hence we retained the current set-up detailing the intra- and inter-individual alterations of the current model.

      (1) Wisse LEM, Chetelat G, Daugherty AM, de Flores R, la Joie R, Mueller SG, et al. (2021): Hippocampal subfield volumetry from structural isotropic 1 mm(3) MRI scans: A note of caution. Hum Brain Mapp. 42:539-550.

      (2) DeKraker J, Kohler S, Khan AR (2021): Surface-based hippocampal subfield segmentation. Trends Neurosci. 44:856-863.

      (3) DeKraker J, Haast RAM, Yousif MD, Karat B, Lau JC, Kohler S, et al. (2022): Automated hippocampal unfolding for morphometry and subfield segmentation with HippUnfold. Elife. 11.

      (4) Vos de Wael R, Lariviere S, Caldairou B, Hong SJ, Margulies DS, Jefferies E, et al. (2018): Anatomical and microstructural determinants of hippocampal subfield functional connectome embedding. Proc Natl Acad Sci U S A. 115:10154-10159.

      (5) Bernhardt BC, Bernasconi A, Liu M, Hong SJ, Caldairou B, Goubran M, et al. (2016): The spectrum of structural and functional imaging abnormalities in temporal lobe epilepsy. Ann Neurol. 80:142-153.

      (6) Vogel JW, La Joie R, Grothe MJ, Diaz-Papkovich A, Doyle A, Vachon-Presseau E, et al. (2020): A molecular gradient along the longitudinal axis of the human hippocampus informs large-scale behavioral systems. Nat Commun. 11:960.

      (7) Genon S, Bernhardt BC, La Joie R, Amunts K, Eickhoff SB (2021): The many dimensions of human hippocampal organization and (dys)function. Trends Neurosci. 44:977-989.

      (8) McEwen BS (1999): Stress and hippocampal plasticity. Annu Rev Neurosci. 22:105-122.

      (9) Kebets V, Holmes AJ, Orban C, Tang S, Li J, Sun N, et al. (2019): Somatosensory-Motor Dysconnectivity Spans Multiple Transdiagnostic Dimensions of Psychopathology. Biol Psychiatry. 86:779-791.

      (10) McIntosh AR, Lobaugh NJ (2004): Partial least squares analysis of neuroimaging data: applications and advances. Neuroimage. 23 Suppl 1:S250-263.

      (11) Paquola C, Benkarim O, DeKraker J, Lariviere S, Frassle S, Royer J, et al. (2020): Convergence of cortical types and functional motifs in the human mesiotemporal lobe. Elife. 9.

      (12) DeKraker J, Ferko KM, Lau JC, Kohler S, Khan AR (2018): Unfolding the hippocampus: An intrinsic coordinate system for subfield segmentations and quantitative mapping. Neuroimage. 167:408-418.

      (13) McEwen BS, Nasca C, Gray JD (2016): Stress Effects on Neuronal Structure: Hippocampus, Amygdala, and Prefrontal Cortex. Neuropsychopharmacology. 41:3-23.

      (14) Sapolsky RM (2000): Glucocorticoids and hippocampal atrophy in neuropsychiatric disorders. Arch Gen Psychiatry. 57:925-935.

      (15) Jacobson L, Sapolsky R (1991): The role of the hippocampus in feedback regulation of the hypothalamic-pituitary-adrenocortical axis. Endocr Rev. 12:118-134.

      (16) Engert V, Hoehne K, Singer T (2023): Specific reduction in the cortisol awakening response after socio-affective mental training. Mindfulness.

      (17) Puhlmann LMC, Vrticka P, Linz R, Stalder T, Kirschbaum C, Engert V, et al. (2021): Contemplative Mental Training Reduces Hair Glucocorticoid Levels in a Randomized Clinical Trial. Psychosom Med. 83:894-905.

      (18) Engert V, Kok BE, Papassotiriou I, Chrousos GP, Singer T (2017): Specific reduction in cortisol stress reactivity after social but not attention-based mental training. Sci Adv. 3:e1700495.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Responses to Reviewer 1:

      It wouldn't be very surprising to identify the association between PhenoAgeAccel and cancer risk, since PhenoAgeAccel was constructed as a predictor of mortality, to which cancer contributes substantially. Although cancer is an essential mediator for the association, sensitivity analyses using cancer-free mortality may provide an additional angle.

      As suggested, we retrained PhenoAge in cancer-free participants based on mortality and recalculated PhenoAgeAccel in the UK Biobank. As expected, the recalculated PhenoAgeAccel was still significantly associated with an increased risk of overall cancer in both men and women. The relevant results have been added to Appendix 1-table 6.

      It would be interesting to see, to what extent, PhenoAgeAccel could be reversed by environmental or lifestyle factors. G by E for PhenoAgeAccel might be worth a try.

      As suggested, we performed interaction analysis between genetic and lifestyle factors on PhenoAgeAccel, and added the methods and results in the revision as follows:

      “55 independent PhenoAgeAccel-associated SNPs (P < 5 × 10⁻⁸) and corresponding effect sizes were derived from a large-scale PhenoAgeAccel GWAS including 107,460 individuals of European ancestry (Kuo, Pilling, Liu, Atkins, & Levine, 2021). A PhenoAgeAccel PRS was created using an additive model as previously described (Dai et al., 2019). In short, the genotype dosage of each risk allele for each individual was summed after multiplying by its respective effect size of PhenoAgeAccel.” (Page 6)
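
      The additive PRS described in this passage is a weighted sum of risk-allele dosages. A minimal sketch, with hypothetical dosages and effect sizes (three SNPs instead of 55) and an illustrative function name:

```python
import numpy as np

def polygenic_risk_score(dosages, effect_sizes):
    """Additive PRS: sum over SNPs of (risk-allele dosage x effect size).

    dosages: array of shape (n_individuals, n_snps), values between 0 and 2.
    effect_sizes: array of shape (n_snps,), e.g. GWAS beta coefficients.
    """
    dosages = np.asarray(dosages, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    return dosages @ effect_sizes

# Hypothetical values for two individuals and three SNPs
example_dosages = [[0, 1, 2],
                   [2, 0, 1]]
example_betas = [0.10, 0.20, -0.05]
prs = polygenic_risk_score(example_dosages, example_betas)
```

      Any standardization or grouping of the resulting score (for example, into low, intermediate, and high genetic risk) would be a separate step not shown here.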

      “We performed additive interaction analysis between genetic risk (defined by CPRS) and PhenoAgeAccel on overall cancer risk, as well as genetic risk (defined by PhenoAgeAccel PRS) and lifestyle on PhenoAgeAccel using two indexes: the relative excess risk due to interaction (RERI) and the attributable proportion due to interaction (AP).” (Page 9)

      “However, we did not observe any interaction between genetic risk and lifestyle on PhenoAgeAccel in either men or women (Appendix 1-table 11).” (Page 13)
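
      For readers unfamiliar with the two additive interaction indexes named above, the following sketch uses their standard definitions: RERI = RR11 - RR10 - RR01 + 1 and AP = RERI / RR11, where RR11 is the relative risk with both exposures present and RR10 and RR01 are the relative risks with only one exposure, all relative to the doubly unexposed group. The function name and input values are hypothetical, and confidence interval estimation is omitted.

```python
def additive_interaction(rr_10, rr_01, rr_11):
    """Return (RERI, AP) from relative risks versus the doubly unexposed group.

    rr_10: exposure A only; rr_01: exposure B only; rr_11: both exposures.
    RERI = 0 (and AP = 0) indicates no interaction on the additive scale.
    """
    reri = rr_11 - rr_10 - rr_01 + 1.0
    ap = reri / rr_11
    return reri, ap

# Hypothetical relative risks
reri_none, ap_none = additive_interaction(1.5, 2.0, 2.5)   # exactly additive
reri_pos, ap_pos = additive_interaction(1.5, 2.0, 4.0)     # more than additive
```

      In the second hypothetical case, the joint effect exceeds the sum of the separate effects, giving a positive RERI and the corresponding share of joint risk attributable to interaction.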

      Responses to Reviewer 2:

      Since the UK biobank has a large sample size, it should have enough power to split the dataset into discovery and validation sets. Why did the authors use 10-fold cross-validation instead of splitting the dataset?

      There may have been a misunderstanding arising from our Methods: the description of 10-fold cross-validation in our previous manuscript referred to the selection of biomarkers when PhenoAge was calculated (Levine et al., 2018). In this study, we analyzed the association between PhenoAgeAccel and incident cancer risk by dividing participants into ten groups based on the deciles of PhenoAgeAccel and assessing the association for each group relative to the lowest decile. To avoid any confusion, we have removed the description of 10-fold cross-validation from the Methods section (Page 5).

      Recommendations for the authors:

      In addition, there is extant literature on the role of Phenotypic Age Acceleration in cancer risk and mortality that should be reviewed. Please also address possible overlap with previous work that used the UK Biobank cohort study (PMCID: PMC9958377).

      As suggested, we have reviewed the association of Phenotypic Age Acceleration with cancer risk, and added it into the Discussion section as follows:

      “Recently, several studies have confirmed the associations between PhenoAgeAccel and cancer risk. Mak et al. explored three measures of biological age, including PhenoAge, and assessed their associations with the incidence of overall cancer and five common cancers (breast, prostate, lung, colorectal, and melanoma) (Mak et al., 2023). In our previous study, we investigated the association between PhenoAgeAccel and lung cancer risk and analyzed the joint and interactive effects of PhenoAgeAccel and genetic factors on the risk of lung cancer (Ma et al., 2023). In comparison to these studies, our analysis expanded the range of cancers to 20 types and further explored the associations in different genetic and lifestyle contexts. Moreover, we also evaluated the potential implications of PhenoAge in population-level cancer screening.” (Page 15).

      Other minor comments:

      Line 216, "-4.35 to -1.25" or "-4.35, -1.25" may be better.

      As suggested, we have adjusted text accordingly.

      Line 260, please clarify the PRS used for G by E interaction testing. It could be site-specific PRS or CPRS.

      We used CPRS for G by E interaction testing, and we have changed the description of our methods as follows:

      “We performed additive interaction analysis between genetic risk (defined by CPRS) and PhenoAgeAccel on overall cancer risk, as well as genetic risk (defined by PhenoAgeAccel PRS) and lifestyle on PhenoAgeAccel using two indexes: the relative excess risk due to interaction (RERI) and the attributable proportion due to interaction (AP).” (Page 9)

      Line 223, The discussion/interpretation for "while negatively associated with risk of prostate cancer" is lacking.

      As suggested, we have discussed this as follows:

      “In addition, we observed a negative association between PhenoAgeAccel and prostate cancer risk. The unexpected association may have been confounded by diabetes and altered glucose metabolism, both of which are closely linked to aging. When we removed HbA1c and serum glucose from the biological age algorithms, the association became non-statistically significant. Similar findings were also reported by Mak et al. (Mak et al., 2023) and Dugue et al. (Dugue et al., 2021).” (Page 15).

      It is not clear how to define "biologically older" and "biologically younger". Whether the individuals fall in the "middle area" will impact the results.

      We defined "biologically older" and "biologically younger" based on Phenotypic Age Acceleration (PhenoAgeAccel), which was defined as the residual obtained from a linear model when regressing Phenotypic Age on chronological age. We categorized individuals with PhenoAgeAccel > 0 as biologically older and those with PhenoAgeAccel < 0 as biologically younger.

      Compared with individuals at low accelerated aging (the bottom quintile of PhenoAgeAccel), we found that those in the "middle area" (quintiles 2 to 4) and those at high accelerated aging (the top quintile) had a significantly higher risk of overall cancer (Table 2). Individuals falling in the "middle area" also had a moderate risk of overall cancer when accelerated aging levels were reclassified according to quartiles or tertiles of PhenoAgeAccel (Appendix 1-table 2).
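
      The two definitions used in this response, PhenoAgeAccel as a regression residual and grouping by quantiles, can be sketched as follows. The data are simulated, the function names are hypothetical, and the computation of Phenotypic Age itself from clinical biomarkers is not shown.

```python
import numpy as np

def pheno_age_accel(pheno_age, chrono_age):
    """Residual from a linear regression of Phenotypic Age on chronological age."""
    pheno_age = np.asarray(pheno_age, dtype=float)
    chrono_age = np.asarray(chrono_age, dtype=float)
    slope, intercept = np.polyfit(chrono_age, pheno_age, deg=1)
    return pheno_age - (slope * chrono_age + intercept)

def quantile_group(values, n_groups):
    """Assign each value to a quantile group numbered 1..n_groups
    (n_groups=5 for quintiles, 10 for deciles)."""
    values = np.asarray(values, dtype=float)
    cuts = np.quantile(values, np.linspace(0.0, 1.0, n_groups + 1)[1:-1])
    return np.searchsorted(cuts, values, side="left") + 1

# Simulated cohort
rng = np.random.default_rng(0)
age = rng.uniform(40.0, 70.0, size=500)
pheno = 1.1 * age - 5.0 + rng.normal(0.0, 4.0, size=500)

accel = pheno_age_accel(pheno, age)
biologically_older = accel > 0           # PhenoAgeAccel > 0
quintile = quantile_group(accel, 5)      # 1 = lowest accelerated aging
```

      Because the residuals average to zero, the "older" and "younger" labels are defined relative to the cohort's own age trend rather than to an absolute threshold.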

      Do men and women have distinct biological ages, so they were analyzed separately?

      We found that men (median PhenoAgeAccel: 0.34, IQR: -2.42 to 3.53) have higher biological ages than women (median PhenoAgeAccel: -1.38, IQR: -4.26 to 1.96) (P < 0.0001). In addition, men and women have different cancer incidence patterns (Rubin, 2022). Therefore, we conducted separate analyses to investigate the associations of PhenoAgeAccel with cancer risk in men and women.

      Dai, J., Lv, J., Zhu, M., Wang, Y., Qin, N., Ma, H., . . . Shen, H. (2019). Identification of risk loci and a polygenic risk score for lung cancer: a large-scale prospective cohort study in Chinese populations. Lancet Respir Med, 7(10), 881-891. doi: 10.1016/S2213-2600(19)30144-4

      Dugue, P. A., Bassett, J. K., Wong, E. M., Joo, J. E., Li, S., Yu, C., . . . Milne, R. L. (2021). Biological Aging Measures Based on Blood DNA Methylation and Risk of Cancer: A Prospective Study. JNCI Cancer Spectr, 5(1). doi: 10.1093/jncics/pkaa109

      Kuo, C. L., Pilling, L. C., Liu, Z., Atkins, J. L., & Levine, M. E. (2021). Genetic associations for two biological age measures point to distinct aging phenotypes. Aging Cell, 20(6), e13376. doi: 10.1111/acel.13376

      Levine, M. E., Lu, A. T., Quach, A., Chen, B. H., Assimes, T. L., Bandinelli, S., . . . Horvath, S. (2018). An epigenetic biomarker of aging for lifespan and healthspan. Aging (Albany NY), 10(4), 573-591. doi: 10.18632/aging.101414

      Ma, Z., Zhu, C., Wang, H., Ji, M., Huang, Y., Wei, X., . . . Shen, H. (2023). Association between biological aging and lung cancer risk: Cohort study and Mendelian randomization analysis. iScience, 26(3), 106018. doi: 10.1016/j.isci.2023.106018

      Mak, J. K. L., McMurran, C. E., Kuja-Halkola, R., Hall, P., Czene, K., Jylhava, J., & Hagg, S. (2023). Clinical biomarker-based biological aging and risk of cancer in the UK Biobank. Br J Cancer, 129(1), 94-103. doi: 10.1038/s41416-023-02288-w

      Rubin, J. B. (2022). The spectrum of sex differences in cancer. Trends Cancer, 8(4), 303-315. doi: 10.1016/j.trecan.2022.01.013

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      We wish to thank the Reviewers for their critical analysis of the article and for their suggestions and comments.

      In addition to the point-by-point answers to the Reviewers, we wish to emphasize three essential points that have been raised. First, we never intended (nor claimed) to address the impact of the two EHT cell emergence processes on downstream fate, after release from the aortic floor (see for example the last paragraph of our initially submitted manuscript). We only wished to bring evidence on the cell biological heterogeneity of the HE, particularly relying on cell polarity control and polarity reestablishment/reinforcement in the case of EHT pol+ cells, thus leading to emergence morphodynamic complexity. In the general context of cell extrusion, in which all polarity features are generally downregulated, these are remarkable features.

      Second, we inform the Reviewers that we have performed a major revision of the work on the Pard3 proteins, the outcome of which, we hope, significantly substantiates the idea of a tuning of cell polarity features in the HE and all along the EHT time-window, supporting the EHT pol- and EHT pol+ types of emergence. To achieve this, we entirely revised the experimental strategy to increase the specificity and sensitivity of detection of Pard3 protein isoforms expressed in the vascular system, based on endothelial FACS-sorting, qRT-PCR and single-molecule whole mount in situ hybridization using RNAscope. Importantly, we wish to stress that, by addressing Pard3 proteins, we initially aimed at substantiating our observations on the localization of our podxl2 construct (del-podxl2) used to label apical membranes. Hence, we sought to bring correlative evidence on the variation of expression of polarity proteins at early and later time points of the EHT time-window (suggesting tightly regulated expression control of polarity determinants, possibly at the mRNA level). This was clearly written and justified in the text, lines 227 or 303 of the initial manuscript. Also, this could have led to the identification of one or more specific isoforms, including splicing variants as initially addressed.

      As the Reviewers will see, while performing the revision of our work, we have now been able to point at a specific isoform of Pard3, namely Pard3ba, whose mRNA expression level, in aortic cells and at single-cell resolution, is uniquely and specifically enhanced in cells contacting emergence ‘hot spots’. Using our Runx1 mutant fish line (dt-Runx1), we also show that expression of Pard3ba mRNAs, in these specific aortic regions, is sensitive to interference with Runx1 activity (i.e., dt-Runx1 increases Pard3ba expression). Altogether, our new results strongly support our initially proposed idea on the regulation of polarity features during EHT; they indicate intercellular coordination, through cooperative cross-talk between aortic and HE/EHT cells. This is compatible with the idea of a ‘tuning’ of apico-basal polarity during the entire EHT time-window (including maturation of the HE to become competent for emergence and the emergence process per se whose morphodynamic complexity relies on regulating apico-basal polarity associated functions (ex: for controlling the specific junctional recycling modes of EHT pol+ and EHT pol- cells, as we suggest using JAM proteins that we have chosen owing to their function in the recruitment of Pard3 proteins for apico-basal polarity establishment)). This complements our work nicely and highlights the relevance of studying the interplay between aortic and HE/EHT cells (which we have started to dissect in the second part of our manuscript). Further work is obviously required to address local, dynamic variations of mRNAs encoding this specific isoform of Pard3 as well as specific interference with its functions at the spatial and temporal levels (hence on live tissues), which is far beyond the scope of our currently submitted work.

      Finally, this emphasizes the importance of the aortic context, at the mesoscopic level, in the regulation of the EHT.

      Third, based on these major points and the Reviewers' suggestions, we acknowledge that the heterogeneity in emergence morphodynamics was not highlighted and propose the following title:

      ‘Tuning apicobasal polarity and junctional recycling in the hemogenic endothelium orchestrates the morphodynamic complexity of emerging pre-hematopoietic stem cells’

      Regarding Results and Figures, the previous Figures 3 and 4 have been entirely revised, with the support of Supplement Figures (3 and 4 supplement figures, respectively as well as a supplement video to Figure 3). Supplement Figures have also been included to the revised version, for nearly all results that appeared as data not shown (Figure 1 – figure supplement 2: illustrating the maintenance of EHT pol+ and EHT pol- cells after division; Figure 1 – figure supplement 3: illustrating the expression of the hematopoietic marker CD41 by EHT pol+ and EHT pol- cells). Also, a new supplemental figure, Figure 7 – figure supplement 7, has been added to substantiate the impact of interfering with ArhGEF11/PDZ-RhoGEF alternative splicing on hematopoiesis. Finally, a Figure for the Reviewers is added at the end of this file that shows that virtually 100% of aortic floor cells that we consider as hemogenic cells are positive for the hematopoietic marker Gata2b which is upstream of Runx1 (using RNAscope which allows achieving cellular resolution unambiguously).

      Reviewer #1 (Public Review):

      Summary:

      In this research article, the authors utilized the zebrafish embryo to explore the idea that two different cell types emerge with different morphodynamics from the floor of the dorsal aorta based on their apicobasal polarity establishment. The hypothesis that the apical-luminal polarity of the membrane could be maintained after EHT and confer different functionality to the cell is exciting, however, this could not be established. There is a general lack of data supporting several of the main statements and conclusions. In addition, the manuscript is difficult to follow and needs refinement. We present below some questions and suggestions with the goal of guiding the authors to improve the manuscript and solidify their findings.

      Here, we wish to emphasize that we do not make the hypothesis that ‘…the apical-luminal polarity of the membrane could be maintained after EHT …’ but that the apico-basal polarity establishment/maintenance controls the type of emergence and their associated cell biological features (EHT pol+ and EHT pol- cellular morphodynamics, establishment of membrane domains). Hence, our work suggests that these emergence modes, as a consequence of their intrinsic characteristics and differences, might have an impact on cellular behavior after the release (to place the work in the broader context of hematopoietic cell fate and differentiation). More specifically, the difference in the biological features of the luminal versus abluminal membrane for the two EHT types (ex: membrane signaling territories, membrane pools devoted to specific functions), might endow the cells with specific functional properties, after the release. What happens to those cells thereafter, except for illustrating the evolution of the luminal membrane for pol+ EHT cells, is beyond the scope of this paper. Here, we analyze and characterize some of the cell biological features of the EHT process per se (the emergence from the aortic floor), including the dynamic interface with adjoining endothelial cells.

      Strengths:

      New transgenic zebrafish lines developed. Challenging imaging.

      Weaknesses:

      (1) The authors conclude that the truncated version of Podxl2 fused to a fluorophore is enriched within the apical site of the cell. However, based on the images provided, an alternative interpretation is that the portion of the membrane within the apical side is less stretched than in the luminal side, and therefore the fluorophore is more concentrated and easier to identify by confocal. This alternative interpretation is also supported by data presented later in the paper where the authors demonstrate that the early HE is not polarized (membranes are not under tension and stretched yet). Could the authors confirm their interpretation with a different technique/marker like TEM?

      The argument of the apparent enrichment, or exclusion, of a marker depending on membrane stretching (and hence molecular packing) would be valid for any type of molecule embedded in these membranes, including of course endogenous ones (this is one of the general biophysical principles leading to the establishment of membrane domains, structurally and functionally speaking); hence, using another marker would not solve the issue because it would depend on its behavior with regard to packing (in particular lipid packing), which is difficult to anticipate and is a topic in its own right (especially in this system that has been poorly investigated with regard to its biophysical and biochemical properties in vivo (including its exposure to the hemodynamics)).

      The Reviewer's reasoning does not appear to be consistent with our results on the maturing HE. Indeed, in our dt-Runx1 mutants, mKate2-podxl2 is enriched at the luminal membrane of HE cells (HE cells are elongated, and the two membrane domains have relatively equal surface and bending); in comparison, HE cells have the same morphology in control animals as in mutants but, in controls, eGFP-podxl2 and mKate2-podxl2 are equally partitioned between the luminal and abluminal membranes (see Figure 3 – figure supplement 2 (for mKate2-podxl2) and Figure 2 – figure supplement 1 and 2 (for eGFP-podxl2)). In addition, we took care while designing the eGFP and mKate2 fusions to keep the natural podxl2 sequence containing critical cysteine residues to maintain assembly properties and distance from the transmembrane segment (hence the fluorescent protein per se is not directly exposed to membrane stretching).

      Finally, electron microscopy is not the appropriate approach for this issue because it requires tissue fixation, which always carries the risk of significantly modifying membrane properties. Along this line, when we fix embryos (and hence membranes, see our new Figure 4 and its Supplemental Figures), we do not appear to maintain obvious EHT pol+ and pol- cell shapes. In addition, to be conclusive, the work would require not TEM but immuno-EM to be able to visualize the marker(s), which is another challenge with this system.

      (2) Could the authors confirm that the engulfed membranes are vacuoles as they claimed, using, for example, TEM? Why is it concluded that "these vacuoles appear to emanate from the abluminal membrane (facing the sub-aortic space) and not from the lumen?" This is not clear from the data presented.

      The same argument regarding electron microscopy made in the previous point is valid here (in addition, if technically feasible at all, it would require serial sectioning to make sure not to miss the very tiny connection that may only suggest ultimate narrowing down of the facing adjacent bilayers, which is quite challenging). The term vacuole, which we use with caution (in fact, more often, we use the term pseudo-vacuoles in the initial manuscript, lines 140, 146, 1467 (legend to Figure 1 – figure supplement 1), or apparent vacuole-like in the same legend, lines 1465 and 1476), is legitimate here because we cannot say that they are portions of the invaginated luminal membrane, as we could be accused of not showing that these membranes are still connected to the luminal surface; we are here at the limit of the resolution that in vivo imaging currently allows with this system, and we draw the attention of the Reviewer to the fact that we are reaching here a sub-cellular level, which is already a challenge in itself.

      In addition, if vacuoles (or pseudo-vacuoles) were not formed at some point in this system (membrane-bounded organelles), it would be difficult to conceive how, after release of the cell, the fluid inherited from the aortic lumen would efficiently be chased from these membranes/organelles (see also our model Figure 1 – figure Supplement 1B).

      Why is it concluded that "these vacuoles appear to emanate from the abluminal membrane (facing the sub-aortic space) and not from the lumen?" This is not clear from the data presented.

      This is not referring to our data but to the Sato et al 2023 work. For EHT undergoing cells leading to aortic clusters in mammals and avians, vacuolar structures indeed appear to emanate from the ab-luminal side facing the sub-aortic space (we cannot call it basal because we do not know the polarity status of these cells). In the Revised version of the manuscript, we have moved this paragraph referring to the Sato et al work to the Discussion, which gives the possibility to expand a bit on this issue, for more clarity (see the second paragraph of our new Discussion).

      (3) It is unclear why the authors conclude that "their dynamics appears to depend on the activity of aquaporins and it is very possible that aquaporins are active in zebrafish too, although rather in EHT cells late in their emergence and/or in post-EHT cells, for water chase and vacuolar regression as proposed in our model (Figure 1 - figure supplement 1B)." In our opinion, these figures do not confirm this statement.

      This part of the text has been upgraded and moved to the Discussion (see our answer to point 2), to address the Reviewers' concern about the clarity of the Results section and to allow us to elaborate a bit more on this issue. We only wished to draw attention to the presence of intracellular vacuolar structures recently addressed in the Sato et al. 2023 paper showing EHT cell vacuoles that are proposed to contribute to cellular deformation during the emergence. We take this example to rationalize the regression of the vacuolar structures described in Figure 1 - figure supplement 1B, which is why we have written ‘… it is very possible that aquaporins are active in zebrafish too’; the first part of the sentence refers to the Sato et al. 2023 paper.

      (4) Could the authors prove and show data for their conclusions "We observed that both EHT pol+ and EHT pol- cells divide during the emergence"; "both EHT pol+ and EHT pol- cells express reporters driven by the hematopoietic marker CD41 (data not shown), which indicates that they are both endowed with hematopoietic potential"; and "the full recovery of their respective morphodynamic characteristics (not shown)?".

      To the new version of our manuscript, we have added new Supplemental information to Figure 1 (two new Supplemental Figures):

      • Figure 1 - figure Supplement 2 that illustrates that both EHT pol+ and EHT pol- cells divide during the emergence as well as the maintenance of morphology for both EHT cell types. We wish also to add here that the maintenance of the EHT pol+ morphology is the most critical point, showing that dividing cells in this system do not necessarily lead to EHT pol- cells.

      • Figure 1 - figure Supplement 3 that shows that both EHT cell types express CD41.

      (5) The authors do not demonstrate the conclusion traced from Fig. 2B. Is there a fusion of the vacuoles to the apical side in the EHT pol+ cells? Do the cells inheriting less vacuoles result in pol- EHT? It looks like the legend for Fig. 2-fig supp is missing.

      As said previously, showing fusion here is not technically possible, but indeed, this is the idea, which fits with the images corresponding to time points 0-90 minutes (Figure 2A), showing (in particular for the right cell) a large pseudo-vacuole whose membrane is heavily enriched with the polarity marker podxl2 (based on fluorescence signal in a membrane-bounded organelle that, based on its curvature radius, should be more under tension than the more convoluted EHT pol+ cell luminal membrane). Also, EHT pol- cells may be born from HE cells that either inherit fewer intracellular vesicles after division or are derived from HE cells that are less (or not) exposed to polarity-dependent signaling (see our data presented in the new Figure 4 and the new version of the Discussion, paragraphs ‘Characteristics of the HE and complexity of pre-hematopoietic stem cell emergence’ and ‘Spatially restricted control of Pard3ba mRNAs by Runx1’).

      Finally, the cartoon Figure 2B is a hypothetical model, consistent with our data, and that is meant to help the reader to understand the idea extrapolated from images that may not be so easy to interpret for people not working on this system. In legend of Figure 2 that describes this issue in the first version of our manuscript (lines 1241-1243), we were cautious and wrote, in parentheses: ‘note that exocytosis of the large vacuolar structure may have contributed to increase the surface of the apical/luminal membrane (the green asterisk labels the lumen of the EHT pol + cell’.

      The legend to Figure 2 – figure supplement 1 is not missing (see lines 1492 – 1499 of the first manuscript). The images of this supplement are not extracted from a time-lapse sequence and show that as early as 30hpf (shortly after the beginning of the EHT time-window – around 28hpf), cells on the aortic floor already exhibit podxl2-containing pseudo-vacuolar structures (which we propose is a prerequisite for HE cell maturation into EHT competent cells; see also Figure 2 – figure supplement 2).

      (6) The title of the paper "Tuning apico-basal polarity and junctional recycling in the hemogenic endothelium orchestrates pre-hematopoietic stem cell emergence complexity" could be interpreted as functional heterogeneity within the HSCs, which is not demonstrated in this work. A more conservative title denoting that there are two types of EHT from the DA could avoid misinterpretations and be more appropriate.

      There was no ambiguity, throughout our initial manuscript, on what we meant when using the word ‘emergence’; it refers only to the extrusion process from the aortic floor.

      Reducing our title only to the 2 types of EHT cells would be very reductionist with regard to our work, which also addresses essential aspects of the interplay between hemogenic cells, cells undergoing extrusion (EHT pol+ and pol- cells), and their endothelial neighbors (not to mention what we show in terms of the cell biology of the maturing HE and the regulation of its interface with endothelial cells (evidence for vesicular trafficking, specific regulation of HE-endothelial cell intercalation required for EHT progression, etc.)). However, and to take this specific comment into account, we propose a slightly changed title saying that there are emergences distinguished by their morphodynamic characteristics:

      ‘Tuning apicobasal polarity and junctional recycling in the hemogenic endothelium orchestrates the morphodynamic complexity of emerging pre-hematopoietic stem cells’

      (7) There are several conclusions not supported by data: "Finally, we have estimated that the ratio between EHT pol+ and EHT pol- cells is of approximately 2/1". "We observed that both EHT pol+ and EHT pol- cells divide during the emergence and remain with their respective morphological characteristics". "We also observed that both EHT pol+ and EHT pol- cells express reporters driven by the hematopoietic marker CD41 (data not shown), which indicates that they are both endowed with hematopoietic potential." These conclusions are key in the paper, and therefore they should be supported by data.

      Most of the requests of the Reviewer in this point have already been asked in point 4 and were added to the revised version.

      Regarding the EHT pol+/pol- ratio, we will keep the ratio at approximately 2/1. The Reviewer should be aware that quantification of EHT cells is a tricky issue and a source of important variability, as can be assessed by the quantifications that we have been performing (see for example figures in which we compare the dt-Runx1 phenotype with Ctrl). This is inherent to this system, more specifically because the EHT process is asynchronous, ranging from approx. 28 hpf to 3 days post fertilization (we have even observed EHT at 5 dpf). We systematically observed heterogeneity in EHT numbers and EHT types between animals and also between experiments (some days we observe EHTs at 48 hpf, others more around 55 hpf or even later). In addition, emergence also proceeds on the lateral side of the aorta and, while it is relatively easy to identify EHT pol+ cells because of their highly characteristic morphology, it is more difficult for EHT pol- cells, which can be mistaken for round HE cells preparing for division. In the current revision of our work, we provide additional facts and potential explanations on the mechanisms that control this asynchrony and the apparent stochasticity of the EHT process (see results of new Figures 3 and 4).

      Reviewer #2 (Public Review):

      In this study, Torcq and colleagues make careful observations of the cellular morphology of haemogenic endothelium undergoing endothelial to haematopoietic transition (EHT) to become stem cells, using the zebrafish model. To achieve this, they used an extensive array of transgenic lines driving fluorescent markers, markers of apico-basal polarity (podocalixin-FP fusions), or tight junction markers (jamb-FP fusions). The use of the runx truncation to block native Runx1 only in endothelial cells is an elegant tool to achieve something akin to tissue-specific deletion of Runx1. Overall, the imaging data is of excellent quality. They demonstrate that differences in apico-basal polarity are strongly associated with different cellular morphologies of cells undergoing EHT from HE (EHT pol- and EHT pol+) which raises the exciting possibility that these morphological differences reflect the heterogeneity of HE (and therefore HSCs) at a very early stage. They then overexpress a truncated form of Runx1 (just the runt domain) to block Runx1 function and show that more HE cells abort EHT and remain associated with the embryonic dorsal aorta. They identify pard3aa and pard3ab as potential regulators of cell polarity. However, despite showing that loss of runx1 function leads to (late) decreases in the expression of these genes, no evidence for their role in EHT is presented. The FRAP experiments and the 2d-cartography, albeit very elegant, are difficult to interpret and not very clearly described throughout the text, making interpretation difficult for someone less familiar with the techniques. Finally, while it is clear that ArhGEF11 is playing an important role in defining cell shapes and junctions between cells during EHT, there is very little statistical evidence to support the limited data presented in the (very beautiful) images.

      As mentioned in the response to reviewer 1, we revised our whole strategy for the analysis of the role of Pard3 proteins in regulating the emergence of hematopoietic precursors. Our new data, obtained using refined gene expression analysis by qRT-PCR on FACS sorted populations and by in situ gene expression analysis at the single-cell resolution using RNAscope, show first that a unique Pard3 isoform (Pard3ba) is sensitive to runx1 activity, and that its expression is specifically localized in aortic cells contacting hemogenic(HE)/EHT cells. We show a clear correlation between the densification of Pard3ba mRNAs and the presence of contacting HE/EHT cells, suggesting a key role for Pard3ba in a cross talk between aortic and hemogenic cells. Furthermore, we show that our dt-runx1 mutant impacts on the maturation of HE cells; when this mutant is expressed, we observe, in comparison to control, an accumulation of HE cells that are abnormally polarized as well as unusually high numbers of EHT pol+ cells. This strongly suggests that the polarity status of HE cells controls the mode of emergence. Overall, our work shows that regulation of apico-basal polarity features is essential for the maturation of the HE and the proper proceeding of the EHT.

      We made efforts to explain more clearly the FRAP experiments as well as the analysis of 2D-cartography throughout the text to facilitate readers' comprehension. 2D-cartographies are an invaluable tool to precisely discriminate between endothelial and hemogenic cells, and their usage was essential during the FRAP sessions, to point at specific junctional complexes accurately. Performing FRAP at cellular junctions during aortic development was technically extremely challenging and the outcome was subject to quite significant variability (which often leads to quantitative results at the limit of statistical significance, which is why we speak of tendencies in our results section reporting on this type of experiment). Apart from constant movement and drifting of the embryos, which are sources of variability, the EHT process per se evolves over time and does so at a heterogeneous pace (for example, the apical closure of EHT pol+ cells is characterized by a succession of contraction and stabilization phases, see Lancino et al. 2018), which is an additional source of variability in the measurements. Despite all this, our data collectively and consistently suggest a differential regime of junctional dynamics between EHT cell types and support the critical function of ArhGEF11/PDZ-RhoGEF in the control of junctional turnover at the interface between HE and aortic cells as well as between HE cells to regulate cell-cell intercalation.

      There is a sense that this work is both overwhelming in terms of the sheer amount of imaging data, and the work behind it to generate all the lines they required, and at the same time that there is very little evidence supporting the assertion that pard3 (and even ArhGEF11) are important mediators of cell morphology and cell fate in the context of EHT. For instance, the pard3 expression data, and levels after blocking runx1 (part of Figure 3 and Figure 4) don't particularly add to the manuscript beyond indicating that the pard3 genes are regulated by Runx1.

      We thank the reviewer for the comment on the Pard3 data particularly because it led us to reconsider our strategy to address with more precision and at the cellular resolution the potential function of this protein family during the time-window of the EHT. As summarized in the header of the Public Review, we identified one specific isoform of Pard3 in the zebrafish - Pard3ba – whose sensitivity to runx1 interference and spatial restriction in expression reinforce the idea of a fine control of apico-basal polarity features and associated functions while EHT is proceeding. Our new data also reinforce the interplay between HE/EHT cells and their direct endothelial neighbors.

      Weaknesses

      The writing style is quite convoluted and could be simplified for clarity. For example, there is plenty of discussion and speculation throughout the presentation of the results. A clearer separation of the results from this speculation/discussion would help with understanding. Figures are frequently presented out of order in the text; modifying the figures to accommodate the flow of the text (or the other way around) - would make it much easier to follow the narrative. While the evidence for the different cellular morphologies of cells undergoing EHT is strong, the main claim (or at least the title of the manuscript) that tuning apico-basal polarity and junctional recycling orchestrate stem cell emergence complexity is not well supported by the data.

      We refined our text when necessary, in particular taking care of transferring and substantiating the arguments that appeared in the Results section, to the Discussion. We also made efforts, on several occasions and for clarity, to describe more precisely the results presented in the different panels of the Figures.

      As mentioned in the header of the text of the Public Review and the response to the 6th point of the Public Review of Reviewer 1, we modified slightly the title to avoid ambiguity. In addition, we added a new paragraph to the beginning of our discussion that summarizes the impact of our findings and, we believe, legitimates our title.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Embryonic stages should be indicated in all images presented for clarification.

      We thank the reviewer for this point, we added stages when missing on the figures (Figure 1, Figure 1 - Figure supplement 1, Figure 2, Figure 2 - Figure supplement 1, Figure 5, Figure 6, Figure 6 - Figure supplement 1, Figure 7 - Figure supplement 3, Figure 7 - Figure supplement 5, Figure 7 - Figure supplement 6)

      (2) In which anatomical site/s were images from Fig 1C and D taken? The surrounding environment looks different, for example, cells in Fig1D seem to be surrounded by other cells, resembling the endothelial plexus at the CHT, while the cells in Fig. 1C seem to be in the dorsal aorta. Is there a spatial difference depending on where cells are budding off? The authors state that there are no differences, but no quantification or data demonstrating that statement is provided.

      As mentioned in the figure legend (lines 1206-1209 of the original manuscript), images for Figure 1C and 1D were both taken at the boundary between the end of the AGM and the entry into the caudal hematopoietic tissue. As the images were acquired from different embryos, the labelling of the underlying vein differs between the two panels, with venous tissues being more sparsely labelled in panel C than in panel D. These images were chosen to illustrate the clearly opposite morphology between the two EHT types that we describe. However, for the rest of the paper, all images and all analyses were exclusively acquired / performed in the dorsal aorta in the AGM, in a region spanning approximately 10-12 inter-segmentary vessels, starting from the end of the elongated yolk up to the start of the balled yolk. In light of the work from the lab of Zilong Wen showing that only cells emerging anteriorly exhibit long-term replenishment potential (Tian et al. 2017), we specifically chose to limit our comparative analysis to the AGM region and did not quantitatively investigate emergences occurring in the caudal region of the aorta. Additionally, although we routinely observe both types of emergences occurring in the caudal region of the dorsal aorta, we did not quantify the frequency of either type of EHT event in this region.

      Finally, the images of the EHT pol+ cells that we show in Figure 1C are of the highest quality we have ever obtained; one reason is that these two cells emerge at the entry of the CHT, which is a region much easier to image at high resolution than the trunk because the sample is less thick and because we are less perturbed by heart beats.

      (3) Which figure shows "EHT pol- cells were observed in all other Tg fish lines that we are routinely imaging, including the Tg(Kdrl:Gal4;UAS:RFP) parental line that was used for transgenesis, thus excluding the possibility that these cells result from an artefact due to the expression of a deleted form of Podxl2 and/or to its overexpression."? It would be informative to include this figure.

      Other examples of EHT pol- cells were shown in Figure 5C as well as Figure 6B using the Tg(kdrl:Jam3b-eGFP; kdrl:nls-mKate2) fish line, which was routinely used for junctional dynamic analyses by FRAP. Furthermore, we now add a new figure (New Figure 1 – figure supplement 3), to illustrate the presence of EHT pol- cells using the Tg(CD41:eGFP) transgenic background, additionally illustrating that EHT pol- cells are CD41 positive.

      (4) Are the spinning disk confocal images a single plane? Or maximum projections? Sometimes this is not specified.

      We made sure to take this remark into account and went through all figure legends to specify the type of images presented (Figure 1 – figure supplement 1, Figure 2, Figure 2 – figure supplement 1, Figure 2 – figure supplement 2, Figure 7 – figure supplement 3) and also, when relevant, we added this information directly to the figure panels (Figure 6A – 6B).

      (5) Could the expression data by RT-qPCR for the Pard3 isoforms be shown? Additionally, it would be appreciated if this expression data could be complemented using Daniocell (https://daniocell.nichd.nih.gov/).

      As mentioned in the first paragraph of our response to Public Reviews, and based on reviewers’ comments, we revised our strategy for the investigation of pard3 proteins expression in the vascular system, for their potential role in EHT and sensitivity to runx1. First, we used FACS sorting as well as tissue dissection to enrich in aortic endothelial cells and perform our qPCR analyses (see the new Figure 4 – figure supplement 1A and Figure 4 – figure supplement 3A for the strategy). As asked by the reviewers and for more transparency, we show the expression relative to the housekeeping gene ef1a in our different control samples (new Figure 4 – figure supplement 1C). Furthermore, we used single-molecule FISH to precisely characterise in situ the expression of several of the Pard3 isoforms (Pard3aa, Pard3ab and Pard3ba, which, based on qPCR, were the most relevant for our investigation in the vascular system) (see lines 386 to 412 in text relative to Figure 4 – figure supplement 2). This new addition nicely shows the different pattern of expression of 3 of the Pard3 zebrafish isoforms in the trunk of 2dpf embryos, outlining interesting specificities of each isoform expression in different tissues.

      We thank the reviewer for this suggestion to complement our data with the published Daniocell dataset. However, and potentially due to the poor annotation of the different pard3 genes in public databases, gene expression information was absent for two of our isoforms of interest (pard3aa and pard3ba), which we ultimately show to be the most enriched in the vascular system in the trunk. Daniocell gene expression data for the Pard3ab isoform show expression in the pronephric duct at 48-58hpf, as well as in intestine progenitors and neuronal progenitors, which is consistent with our in situ observations using RNAscope. However, pard3ab is poorly detected within the hematopoietic and vascular clusters. This observation is coherent with our data, which do not show any enrichment of this isoform in vascular tissues compared to other structures. On the other hand, pard3bb does not seem to be particularly enriched in vascular/hematopoietic clusters at 48-58hpf in the Daniocell dataset, in accordance with what we observe with our qPCR. Finally, in the Daniocell dataset, all of the pard3 variants (pard3ab, pard3bb, PARD3 and PARD3 (1 of many)) seem to be either scarcely or not detected in the hematopoietic/vascular system. In our case, for all the isoforms we studied in control condition (pard3aa, pard3ab and pard3ba), and although the technique is only semi-quantitative due to the presence of an amplification step, RNAscope assays seem to indicate a very low expression in aortic cells (with sometimes as little as one mRNA copy per cell); this explains the low detection in single-cell RNAseq datasets and is coherent with the Daniocell dataset.

      (6) It would be informative to add in the introduction some information on apico-basal polarity, tight junctions, JAMs (ArhGEF11/PDZ-RhoGEF).

      We modified the introduction to add relevant information on Pard3 proteins, their link with our JAM reporters in the context of polarity establishment, as well as the role of ArhGEF11/PDZ-RhoGEF and its alternative splicing variants in regulating junctional integrity in the context of epithelial-to-mesenchymal transition (lines 99 to 127). This modification of the introduction also allowed us to lighten some parts of the Results section (lines 222 to 224, 345 to 349 and 454 to 456 of the original manuscript).

      Reviewer #2 (Recommendations For The Authors):

      (1) There is lots of data (and lots of work) in this paper; I feel that the pard3 data doesn't substantially add to the paper, and at the same time there is data missing (see point 10, point 11 below for an example).

      To improve clarity and substantiate our findings on Pard3, we entirely revised our investigation strategy, as mentioned in previous paragraphs. We refined the characterization of Pard3 isoform expression in the vascular tissue, using both cell enrichment by FACS for gene expression analysis and single-molecule FISH (RNAscope) to access spatial information on the expression of pard3 isoforms, reaching sub-cellular resolution.

      This new strategy allowed us to show the unexpected localization of Pard3ba mRNAs in mRNA-enriched regions in the vicinity of HE/EHT cells (new Figure 4, and paragraph Interfering with Runx1 activity unravels its function in the control of Pard3ba expression and highlights heterogeneous spatial distribution of Pard3ba mRNAs along the aortic axis, see the new manuscript). Overall, the new spatial analysis allowed us to substantiate our findings on Pard3ba and suggests a direct interplay between hemogenic cells and their endothelial aortic neighbors; this interplay presumably relies on apico-basal polarity features that are at least in part regulated by runx1 in the context of HE maturation and EHT.

      (2) Labelling of the figures could be substantially improved. In many instances, the text refers to a figure (e.g. Fig 6A), but it has several panels that are not well annotated (in the case of Fig 6A, four panels) or labelled sparsely in a way that makes it easy to follow the text and identify the correct panel in the figure. Even supplementary figures are sparsely labelled. Labelling to include embryonic stages, which transgenic is being used, etc should be added to the panels to improve clarity for the reader.

      We revised the figures to add relevant information, including stages, types of images and annotations to facilitate comprehension, including Figure 6A – 6B, Figure 5B – 5C (see response to Reviewer 1, first comment, for a more complete list of all revised figures, transgenic fish lines and embryonic stage annotations). Furthermore, we revised the entire manuscript to match the figures as closely as possible and added some annotations to link the text more easily to the figures and panels.

      (3) The current numbering of supplementary figures is quite confusing to follow.

      We revised the manuscript to make sure that all principal and supplementary figures are cited in the right order and that the order of the supplementary figures is consistent with the unfolding of the text. For Figure 7 only, the majority of the supplemental figures are cited before the principal figure, as they relate to our experimental strategy, which we comment on before describing the results.

      (4) Graphs in Fig 4, Fig 7 supplement 1 and some of the supplementary figures miss statistical info for some comparison (I assume when non-significant), and sometimes present a p-value of a statistical test being done between samples across stages - but these are not dealt with in the text. Throughout all graphs, the font size used in graphs for annotation (labelling of samples, x-axis, and in some cases the p values) is very small and difficult to read.

      For Figure 7 - figure supplement 1, non-significant p-values of statistical tests were not displayed (as mentioned in the Figure legend, line 1614 of the original manuscript). For the new Figure 4, all p-values are displayed. For the new Figure 4 - figure supplement 1, statistical tests were only performed to compare RFP+ and RFP- cells in the trunk condition (3 biological replicates) and not in the whole embryo condition, for which we did not perform enough replicates for statistical analysis (biological duplicates).

      (5) The results are generally very difficult to follow, with a fair amount of discussion included but then very little detail of the experiments per se.

      We thank the reviewers for these comments that helped us improve the clarity of the manuscript.

      The Results section was revised to move some of the paragraphs to the introduction (see response to Reviewer 1, 6th comment), and some of them to the Discussion (such as lines 149 to 156 or 410 to 416 in the first version of the manuscript referring to vacuolar structures or to the recycling modes of JAMs in EHT pol+ and EHT pol- cells).

      (6) The truncated version of runx1 is introduced but its expected effect is not explained until the discussion. Related to this, is it expected that blocking runx1 with this construct (leading to accumulation of cells in the aorta before they undergo EHT) then leads to increased numbers of T-cell progenitors in the thymus? Abe et al (2005, J Immunol) have used the same strategy to overexpress the runt domain in thymocytes and found a decrease in these cells, rather than an increase. Can you explain this apparent discrepancy?

      We thank the reviewer for this interesting point on the effect of runx1 interference. This phenotype (increased number of thymic cells) seems to be in agreement with the phenotype that was described in zebrafish using homozygous runx1 mutants (Sood et al. 2010 PMID: 20154212), in which the authors show an increase of lymphoid progenitors in the kidney marrow of adult runx1W84X/W84X mutants compared to controls as well as a similar number of intra-thymic lck:eGFP cells in mutants and controls. Notably, the T-lymphoid lineage seems to be the only lineage spared by the mutation of runx1. This could suggest that in this case either the T-lymphoid lineage can develop independently of runx1 or that a compensation phenomenon (for example by another protein of the runx family) occurs to rescue the generation of T-lymphocytes.

      Although our data show an impact on T-lymphopoiesis, we do not elucidate the exact mechanism leading to an increased number of thymic cells. In our case, we do not know the half-life of our dt-runx1 protein in newly generated hematopoietic cells once our transgene, expressed under the control of the kdrl vascular promoter, ceases to be produced after emergence. The effect we observe could be direct, due to the presence of our mutant protein after 3 days in thymic cells, or indirect, due to the impact of our mutant on the HE, which could lead to the preferential generation of lymphoid-biased progenitors. Similarly, we do not know whether the cells we observe at this stage in the thymus are generated from long-term HSCs or short-term progenitors. Indeed, cell tracing analysis from the lab of Zilong Wen (Tian et al. 2017, see our Ref list) shows the simultaneous presence of short-term PBI-derived and long-term AGM-derived thymic cells at 5dpf. Based on this, we can imagine, for example, that the supernumerary cells we observe in the thymus are transient populations that could multiply faster in the absence of definitive populations. Conversely, based on our observation of an accumulation of EHT pol+ events, we can imagine that EHT pol+ and EHT pol- cells are indeed differentially fated and that EHT pol+ cells may be biased toward a lymphoid lineage. We also know that, at the stage we observe (5dpf), RNAscope assays of runx1 show that a vast majority of thymic cells do not express runx1 (our preliminary data), suggesting that the effect we observe is an indirect one caused by upstream events rather than by direct interference with the endogenous expression of runx1 in thymic cells.

      The article referred to by the reviewer (Sato et al. 2005, PMID: 16177090) investigates the role of runx1 during TCR selection for thymic cell maturation and shows that runx1 signaling lowers the apoptotic sensitivity of double-positive thymocytes when artificially activated, leading to a reduced number of single-positive thymic cells. Furthermore, this paper references another study from the same lab (Hayashi et al. 2000, PMID: 11120804) that used the same strategy to study the role of runx1 in the positive and negative selection steps of T lymphocyte maturation. This paper, although showing that runx1 is important for later stages of T lymphocyte differentiation (the maturation from the double-positive to the single-positive stage), also shows a relative increase in the number of double-negative and double-positive thymocytes, which could be consistent with our observations. Indeed, in our case, although we show an increased number of thymic cells, we do not know the relative proportion of the different thymocyte subsets. We could explain the increased number of thymic cells by an increased number of DN/DP thymocytes, which would not preclude a decrease in single-positive thymocytes. Finally, the cells we observe in the thymus of our dt-runx1 mutants may also be different lymphoid populations, namely ILCs, that would react differently to runx1 interference.

      (7) Lines 154-155 refer to aquaporins but are missing a reference. This is a bit of speculation right in the results section and I struggled to understand what the point of it was.

      To clarify the argument and ease the flow of the text, as suggested by the reviewers, we transferred this paragraph (lines 149 to 156 of the initial manuscript) to the Discussion section (lines 763-789). We also added the missing reference (Sato et al. 2023, see our Ref list).

      (8) Lines 173-175, indicating that both EHTpol+ and pol- express the CD41 transgenic marker - would be useful to show this data.

      We provide a new supplementary figure (Figure 1 – figure supplement 3), where, using an outcross of the CD41:eGFP and kdrl:mKate2-podxl2 transgenic lines, we show unambiguously and for multiple cells that both polarized EHT pol+ cells and non-polarized EHT pol- cells are CD41 positive. In addition, although not commented on in the main text, we can also see that an HE cell, characterized by its elongated morphology (in the middle of the field), its thickened nucleus and its position on the aortic floor, is also CD41 positive.

      (9) Lines 181-201 - it's not clear how HE cells were identified in the first place - was it just morphology? Or were they identified retrospectively?

      HE cells were identified solely on morphological and spatial criteria (as mentioned in the Methods section, lines 1073-1082 and 1108-1111 of the first manuscript). Furthermore, a recent investigation by the lab of Zilong Wen (Zhao et al. 2022, see our Ref list), questioning the common origin of HE cells and of endothelial cells as well as their respective capacity to extrude from the aorta to generate hematopoietic cells, showed by single-cell tracing that 96% of floor cells are indeed hemogenic endothelial cells. In addition, as mentioned in the response to the 8th point, we show in Figure 1 – figure supplement 3 that all floor cells express CD41. Finally, we also used an alternative method to validate the true hemogenic identity of aortic floor cells and show, using RNAscope, that virtually 100% of the floor cells that we consider as typical HE cells do express a hematopoietic transcription factor upstream of Runx1, namely Gata2b (see Author response image 1).

      Author response image 1.

      All cells from the aortic floor, at 48hpf, express the hematopoietic marker Gata2b. 48hpf Tg(Kdrl:eGFP) fixed embryos were used for RNAscope with a probe designed to detect Gata2b mRNAs. Subsequently, images were taken using spinning disk confocal microscopy. The image in the top panel is a z-projection of the entire aortic volume of one embryo and shows the full portion of the dorsal aorta from the anterior part (left side, at the limit of the balled yolk) down to the urogenital orifice (UGO, right side). The 4 boxes (1 - 4) delineate regions that have been magnified beneath (2X). The 2X images corresponding to each box are z-projections (top views) or z-sections (bottom views). The bottom views make it possible to visualize the aortic floor and to mark its position on the top views. Pink arrows point at HE cells (elongated in the anteroposterior direction) and at EHT cells (ovoid/round cells; EHT pol+ cell morphology is not preserved after fixation and RNAscope and thus cannot be distinguished from ovoid/round EHT pol- cells). Pink dots = RNAscope spots of various sizes. The green cells in the subaortic space that are marked by RNAscope spots are newly born hematopoietic stem and progenitor cells (see for example box 1). This embryo is representative of n = 5 embryos treated and imaged.

      (10) Line 276 - the difference between the egfp-podxl2 and mKate-podxl2 - could that be due to the fluorophore used? Also, it would be good to label Fig 3 supplement 2 better and to see a control alongside the runt overexpression.

      Line 276 does not point to a difference in control conditions between eGFP-podxl2 and mKate-podxl2 (see in new Figure 1 – figure supplement 3, Figure 2 or in new Figure 3 - figure supplement 2 several examples of non-polarized HE cells in control conditions using both fluorophores) but between control and dt-runx1 conditions, both expressing the mKate2-podxl2 transgene. Similarly, the new example that we now provide in the CD41 figure (Figure 1 – figure supplement 3) clearly shows that mKate-podxl2 is enriched at the apical/luminal membrane of EHT pol+ cells, while no such enrichment is observed for EHT pol- cells. We note that EHT cells are not always typical in shape, in particular because cells can be squeezed by underlying tissues, for example the vein, or from the luminal side by flow and tension on the aortic wall caused by the heartbeat (the further up the trunk we image, the more difficult the imaging and the less stable the cell shape during long time-lapse sequences). To take the reviewer’s comments into account, we also added a control condition next to the dt-runx1 condition in the new Figure 3 – figure supplement 2A.

      (11) There is no quantitation data on the number of excess EHT pol+ cells in the DA, or in the thymus data (Figs 3 Supp1 and Fig 3 Supp 3). Can you quantify this data? This would better support the claim that tuning apico-basal polarity alters the morphology of the emerging HE cells.

      We added quantifications relating both to the emergence process itself, showing the accumulation of HE and EHT pol+ cells (new Figure 3B), and to hematopoiesis per se (new Figure 3 – figure supplement 1). Indeed, we show a decrease in the number of newly generated cmyb+ cells in the sub-aortic space. Furthermore, we improved our quantification of the later phenotype in the thymus (new Figure 3 – figure supplement 3), using improved segmentation methods, which indeed validates the increased number of thymic cells that we described.

      (12) The observed changes in pard3 isoforms are just reading out changes in their expression in the runt1 transgenics, rather than demonstrating a role in apico-basal polarity.

      We entirely revised our strategy regarding Pard3 expression analyses (see also the text at the beginning of this file, for the Public Review). We wish to stress, however, that we did not initially intend to show directly a role of Pard3 proteins in controlling apico-basal polarity in the system; we only intended to provide correlative evidence supporting our observations with the polarity marker podxl2 (had we interfered with their function, as written in the text, apico-basal polarity, which is essential for aortic lumenization and maintenance, would have been impaired, blurring interpretations).

      During the revision, we obtained the unexpected finding, using RNAscope, that one Pard3 isoform, namely Pard3ba, is the one Pard3 that is expressed non-homogeneously along the aortic axis and, in the vast majority of cases, by aortic cells in the direct vicinity of emergence domains of the aortic floor (see the new Figure 4 and Figure 4 – figure supplements 2, 3).

      This correlative relation between expression of Pard3ba in aortic endothelial cells neighbouring HE/EHT cells suggests, as we propose, that a cross talk occurs between hemogenic and aortic cells, and that this cross talk relies, at least in part, on the expression of key components of apico-basal polarity and their associated functional features. In addition, we show that junctional recycling differs between both EHT types, based on our observations on the different dynamics in the turnover of JAM molecules, in the two EHT types. As JAM molecules are also required for the recruitment of Pard3, which initiates the establishment of apico-basal polarity, these different dynamics suggest that the control of apico-basal polarity is involved in supporting the morphodynamic complexity of EHT cell types.

      (13) There is a Fig 5, Supp 2 that is neither mentioned nor described anywhere in the manuscript.

      Figure 5 - figure supplement 2 is mentioned in lines 366-370 of the original manuscript, to describe the initial validation that was performed for our eGFP-JAM constructs in multiple cell types using a ubiquitous heat-shock promoter. We expanded our description of this supplemental figure in the new manuscript (lines 504 to 514).

      (14) Lines 445-456 - these read like a bit of discussion, not results. There are other similar parts of the results section that also read like a discussion (e.g. 526-533).

      Although we decided to keep this paragraph in the Results section, as it justifies the rationale behind the choice of ArhGEF11/PDZ-RhoGEF, we took the reviewer’s comment into account and, as mentioned in the response to Reviewer 1, 6th comment, lightened the Results section by transferring some of the paragraphs to the Introduction or Discussion sections.

      (15) The description of Fig 7A (from line 505) is missing the stages at which the experiments were performed (also not labelled on the figure).

      The stages at which the experiments were performed are stated in the figure legend (line 1366) as well as in the Methods section of the original manuscript (line 1033). We added this information above panels A and B for more clarity.

      (16) Some figures have multiple panels (e.g. Fig 7Aa'), so when referred to in the text, it remains unclear which panel is being referred to.

      We modified the text to refer more clearly to the individual panels when they are mentioned, particularly with regard to Figures 7 and 8 but also for all the other figures.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This study presents valuable data on the antigenic properties of neuraminidase proteins of human A/H3N2 influenza viruses sampled between 2009 and 2017. The antigenic properties are found to be generally concordant with genetic groups. Additional analyses have strengthened the revised manuscript, and the evidence supporting the claims is solid.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      The authors investigated the antigenic diversity of recent (2009-2017) A/H3N2 influenza neuraminidases (NAs), the second major antigenic protein after haemagglutinin. They used 27 viruses and 43 ferret sera and performed NA inhibition. This work was supported by a subset of mouse sera. Clustering analysis determined 4 antigenic clusters, mostly in concordance with the genetic groupings. Association analysis was used to estimate important amino acid positions, which were shown to be more likely close to the catalytic site. Antigenic distances were calculated and a random forest model used to determine potential important sites.

      This revision has addressed many of my concerns of inconsistencies in the methods, results and presentation. There are still some remaining weaknesses in the computational work.

      Strengths

      (1) The data cover recent NA evolution and a substantial number (43) of ferret (and mouse) sera were generated and titrated against 27 viruses. This is laborious experimental work and is the largest publicly available neuraminidase inhibition dataset that I am aware of. As such, it will prove a useful resource for the influenza community.

      (2) A variety of computational methods were used to analyse the data, which give a rounded picture of the antigenic and genetic relationships and link between sequence, structure and phenotype.

      (3) Issues raised in the previous review have been thoroughly addressed.

      Weaknesses

      (1) Some inconsistencies and missing data in experimental methods

      Two ferret sera were boosted with H1N2, while the others were boosted with recombinant NA protein. This, and the underlying reason, are clearly explained in the manuscript. The authors note that boosting with live virus did not increase titres. Additionally, one homologous serum (A/Kansas/14/2017) was not generated, although this would not necessarily have impacted the results.

      We agree with the reviewer and this point was addressed in the previous rebuttal.

      (2) Inconsistency in experimental results

      Clustering of the NA inhibition results identifies three viruses which do not cluster with their phylogenetic group. Again, this is clearly pointed out in the paper and is consistent with the two replicate ferret sera. Additionally, A/Kansas/14/2017 is in a different cluster based on the antigenic cartography vs the clustering of the titres.

      We agree with the reviewer and this point was addressed in the previous rebuttal.

      (3) Antigenic cartography plot would benefit from documentation of the parameters and supporting analyses

      a. The number of optimisations used

      We used 500 optimizations. This information is now included in the Methods section.

      b. The final stress and the difference between the stress of the lowest few (e.g. 5) optimisations, or alternatively a graph of the stress of all the optimisations. Information on the stress per titre and per point, and whether any of these were outliers

      The stress was obtained from 1, 5, 500, or even 5000 optimizations (resulting in stress values of 1366.47, 1366.47, 2908.60, and 3031.41, respectively). Apart from limited variation or non-convergence of the stress values after optimization, the maps obtained were consistent across multiple runs. The final map was obtained by keeping the best optimization (stress value 1366.47, selected using the keepBestOptimization() function).

      Author response image 1.

      The stress per point is presented in the heat map below.

      The heat map indicates stress per serum (x-axis) and strain (y-axis) in blue to red scale.

      c. A measure of uncertainty in position (e.g. from bootstrapping)

      Bootstrap was performed using 1000 repeats and 100 optimizations per repeat. The uncertainty is represented in the blob plot below.

      Author response image 2.

      (4) Random forest

      The full dataset was used for the random forest model, including tuning the hyperparameters. It is more robust to have a training and test set to be able to evaluate overfitting (there are 25 features to classify 43 sera).

      Explicit cross-validation is not necessary for random forests, because the out-of-bag (OOB) procedure across multiple trees implicitly covers cross-validation. In the randomForest function in R, each tree is grown on a bootstrap sample of the training data (observations are drawn with replacement, so the same observation can be drawn multiple times), and the mtry argument sets the number of variables randomly sampled as candidates at each split. RF then automatically uses the observations that were not drawn for a given tree as a test set for that tree. Overfitting may occur when all data are used for training, but the RF method implicitly uses a test set and does not use all data to train any single tree.

      Code:

      library(randomForest)
      rf <- randomForest(X, y = Y, ntree = 1500, mtry = 25, keep.forest = TRUE, importance = TRUE)
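
      As a hedged illustration of the out-of-bag principle invoked above, the following base-R sketch bags a simple learner and scores each observation only with the models that never saw it. It is not the analysis code used for the manuscript: the simulated data, the use of lm() as the base learner in place of decision trees, and all variable names are assumptions made for the example.

```r
# Out-of-bag (OOB) error for a bagged ensemble, written in base R.
# NOTE: lm() is swapped in as the base learner only to keep the sketch
# dependency-free; the simulated data and all names are hypothetical.
set.seed(42)
n <- 60
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 2 * dat$x1 - dat$x2 + rnorm(n, sd = 0.5)

n_models  <- 200
oob_sum   <- numeric(n)  # running sum of OOB predictions per observation
oob_count <- integer(n)  # number of models for which each observation was OOB

for (b in seq_len(n_models)) {
  in_bag <- sample.int(n, size = n, replace = TRUE)  # bootstrap sample of rows
  oob    <- setdiff(seq_len(n), in_bag)              # rows never drawn
  fit    <- lm(y ~ x1 + x2, data = dat[in_bag, ])
  oob_sum[oob]   <- oob_sum[oob] + predict(fit, newdata = dat[oob, ])
  oob_count[oob] <- oob_count[oob] + 1L
}

# Each observation is predicted only by models that never saw it.
oob_pred <- oob_sum / oob_count
oob_mse  <- mean((dat$y - oob_pred)^2)

# In-sample error of a single model fitted to all rows, for comparison.
full_fit  <- lm(y ~ x1 + x2, data = dat)
train_mse <- mean(residuals(full_fit)^2)

cat(sprintf("OOB MSE: %.3f, in-sample MSE: %.3f\n", oob_mse, train_mse))
```

      In the randomForest package the equivalent bookkeeping is done during fitting, and the fitted object returns OOB-based predictions in its predicted component, so no separate loop of this kind is needed there.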

      Reviewer #2 (Public Review):

      Summary:

      The authors characterized the antigenicity of N2 protein of 43 selected A(H3N2) influenza A viruses isolated from 2009-2017 using ferret and mice immune sera. Four antigenic groups were identified, which the authors claimed to be correlated with their respective phylogenic/ genetic groups. Among 102 amino acids differed by the 44 selected N2 proteins, the authors identified residues that differentiate the antigenicity of the four groups and constructed a machine-learning model that provides antigenic distance estimation. Three recent A(H3N2) vaccine strains were tested in the model but there was no experimental data to confirm the model prediction results.

      Strengths:

      This study used N2 protein of 44 selected A(H3N2) influenza A viruses isolated from 2009-2017 and generated corresponding panels of ferret and mouse sera to react with the selected strains. The amount of experimental data for N2 antigenicity characterization is large enough for model building.

      Weaknesses:

      The main weakness is that the strategy of selecting 43 A(H3N2) viruses from 2009-2017 was not explained. It is not clear if they represent the overall genetic diversity of human A(H3N2) viruses circulating during this time. In response to the reviewer's comment, the authors have provided a N2 phylogenetic tree using 180 randomly selected N2 sequences from human A(H3N2) viruses from 2009-2017. While the 43 strains seem to scatter across the N2 tree, the four antigenic groups described by the authors did not correlate with their respective phylogenic/ genetic groups as shown in Fig. 2. The authors should show the N2 phylogenic tree together with Fig. 2 and discuss the discrepancy observed.

      The discrepancies between Figure 2 and the N2 phylogenetic tree provided using 180 selected N2 sequences were primarily due to visualization. In the tree presented in Figure 2, the phylogeny was ordered by decreasing branch length. Further, the tree presented in the rebuttal was built with PhyML 3.0 using the JTT substitution model, while the tree in Figure 2 was built in CLC Workbench 21.0.5 using the Bishop-Friday substitution model. The tree below was built using the same methodology as Figure 2, including branch length ordering. No discrepancies are observed.

      Phylogenetic tree representing relatedness of N2 head domain. N2 NA sequences were ordered according to the branch length and phylogenetic clusters are colored as follows: G1: orange, G2: green, G3: blue, and G4: purple. NA sequences that were retained in the breadth panel are named according to the corresponding H3N2 influenza viruses. The other NA sequences are coded.

      Author response image 3.

      The second weakness is the use of double-immune ferret sera (post-infection plus immunization with recombinant NA protein) or mouse sera (immunized twice with recombinant NA protein) to characterize the antigenicity of the selected A(H3N2) viruses. Conventionally, NA antigenicity is characterized using ferret sera after a single infection. Repeated influenza exposure in ferrets has been shown to enhance antibody binding affinity and may affect the cross-reactivity to heterologous strains (PMID: 29672713). The increased cross-reactivity is supported by the NAI titers shown in Table S3, as many of the double immune ferret sera showed the highest reactivity not against its own homologous virus but to heterologous strains. In response to the reviewer's comment, the authors agreed the use of double-immune ferret sera may be a limitation of the study. It would be helpful if the authors can discuss the potential effect on the use of double-immune ferret sera in antigenicity characterization in the manuscript.

      Our study was designed to understand the breadth of the anti-NA response after the incorporation of NA as a vaccine antigen. Our data do not allow us to conclude whether the increased breadth of protection is merely due to increased antibody titers or whether an NA boost immunization was able to induce antibody responses against epitopes that were not previously recognized by the primary response to infection. However, we now mention this possibility in the discussion and cite Kosikova et al. CID 2018 in this context.

      Another weakness is that the authors used the newly constructed model to predict the antigenic distance of three recent A(H3N2) viruses but there is no experimental data to validate their prediction (eg. if these viruses are indeed antigenically deviating from group 2 strains as concluded by the authors). In response to the comment, the authors have taken two strains out of the dataset and used them for validation. The results are shown as Fig. R7. However, it may be useful to include this in the main manuscript to support the validity of the model.

      The removal of 2 strains was performed to illustrate the predictive performance of the RF modeling. However, Random Forest does not require cross-validation. The reason is that RF modeling already uses an out-of-bag evaluation which, in short, consists of using only a fraction of the data for the creation of each decision tree (about 2/3 of the data), obviating the need to set aside a test set:

      “…In each bootstrap training set, about one-third of the instances are left out. Therefore, the out-of-bag estimates are based on combining only about one-third as many classifiers as in the ongoing main combination. Since the error rate decreases as the number of combinations increases, the out-of-bag estimates will tend to overestimate the current error rate. To get unbiased out-of-bag estimates, it is necessary to run past the point where the test set error converges. But unlike cross-validation, where bias is present but its extent unknown, the out-of-bag estimates are unbiased…” from https://www.stat.berkeley.edu/%7Ebreiman/randomforest2001.pdf
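
      The "about one-third" figure in the quotation follows from the arithmetic of sampling with replacement: a given instance is missed by all n draws with probability (1 - 1/n)^n, which tends to exp(-1), roughly 0.368. The base-R sketch below checks this by simulation; the sample size of 43 and the function names are illustrative assumptions, not values taken from the analysis code.

```r
# Expected share of instances left out of one bootstrap sample of size n.
# Each draw misses a given instance with probability (1 - 1/n); the n draws
# are independent, so the instance is out-of-bag with probability
# (1 - 1/n)^n, which approaches exp(-1) as n grows.
oob_fraction_exact <- function(n) (1 - 1 / n)^n

# Monte Carlo check: draw bootstrap samples and count instances never drawn.
oob_fraction_simulated <- function(n, n_boot = 2000, seed = 1) {
  set.seed(seed)
  left_out <- replicate(n_boot, {
    in_bag <- sample.int(n, size = n, replace = TRUE)
    n - length(unique(in_bag))
  })
  mean(left_out) / n
}

n <- 43  # illustrative sample size only (hypothetical choice)
exact     <- oob_fraction_exact(n)
simulated <- oob_fraction_simulated(n)
cat(sprintf("exact: %.3f, simulated: %.3f, limit: %.3f\n",
            exact, simulated, exp(-1)))
```

      Even for a few dozen instances, the left-out share is therefore already close to the limiting value, which is why each tree is grown on roughly two-thirds of the distinct instances.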

      Reviewer #3 (Public Review):

      Summary:

      This paper by Portela Catani et al examines the antigenic relationships (measured using monotypic ferret and mouse sera) across a panel of N2 genes from the past 14 years, along with the underlying sequence differences and phylogenetic relationships. This is a highly significant topic given the recent increased appreciation of the importance of NA as a vaccine target, and the relative lack of information about NA antigenic evolution compared with what is known about HA. Thus, these data will be of interest to those studying the antigenic evolution of influenza viruses. The methods used are generally quite sound, though there are a few addressable concerns that limit the confidence with which conclusions can be drawn from the data/analyses.

      Strengths:

      • The significance of the work, and the (general) soundness of the methods.

      • Explicit comparison of results obtained with mouse and ferret sera.

      Weaknesses:

      • Approach for assessing influence of individual polymorphisms on antigenicity does not account for potential effects of epistasis (this point is acknowledged by the authors).

      We agree with the reviewer and this point was addressed in the previous rebuttal.

      • Machine learning analyses neither experimentally validated nor shown to be better than simple, phylogenetic-based inference.

      We respectfully disagree with the reviewer. This point was addressed in the previous rebuttal as follows.

      “This is a valid remark, and indeed we have found a clear correlation between NAI cross-reactivity and phylogenetic relatedness. However, besides achieving good prediction of the experimental data (as shown in Figure 5 and in Figure R7), machine learning analysis has the potential to rank or indicate major antigenic divergences based on available sequences before they have consolidated as a new clade. ML can also support the selection and design of more broadly reactive antigens.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) Discuss the discrepancy between Fig. 2 and the newly constructed N2 phylogenetic tree with 180 randomly selected N2 sequences of A(H3N2) viruses from 2009-2017. Specifically, please explain why the antigenic vs. phylogenetic relationship observed in Fig. 2 was not observed in the large N2 phylogenetic tree.

      The discrepancies were due to differences in method and visualization. A new tree was provided.

      (2) Include a sentence to discuss the potential effect on the use of double-immune ferret sera in antigenic characterization.

      We prefer not to speculate on this.

      (3) Include the results of the exercise run (with the use of Swe17 and HK17) in the manuscript as a way to validate the model.

      The exercise was performed to illustrate the predictive potential of the RF modeling to the reviewer. However, cross-validation is not a usual requirement for random forests, since they use out-of-bag calculations. We prefer not to include the exercise runs in the main manuscript.
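
      The out-of-bag argument can be illustrated with a short simulation. The sketch below is not the random forest pipeline used in the study; it only checks the "about one-third of the instances are left out" statement from the Breiman quote by drawing bootstrap samples with Python's standard library. The function name `oob_fraction` and the sample sizes are assumptions chosen for illustration.

```python
import random


def oob_fraction(n_instances, n_bootstraps, seed=0):
    """Average fraction of instances left out of a bootstrap sample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_bootstraps):
        # A bootstrap sample draws n_instances indices with replacement.
        in_bag = {rng.randrange(n_instances) for _ in range(n_instances)}
        total += (n_instances - len(in_bag)) / n_instances
    return total / n_bootstraps


simulated = oob_fraction(n_instances=200, n_bootstraps=500)
# Probability that a given instance is never drawn in 200 draws.
expected = (1 - 1 / 200) ** 200
print(f"simulated: {simulated:.3f}, analytic: {expected:.3f}")
```

      Each instance is out-of-bag for roughly 37% of the trees, so its out-of-bag prediction aggregates about one-third of the ensemble. This is why such predictions can serve as held-out estimates without a separate cross-validation loop.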

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript titled "Disease modeling and pharmacological rescue of autosomal dominant Retinitis Pigmentosa associated with RHO copy number variation" the authors describe the use of patient iPSC-derived retinal organoids to evaluate the pathobiology of a RHO-CNV in a family with dominant retinitis pigmentosa (RP). They find significantly increased expression of rhodopsin, especially within the photoreceptor cell body, and defects in photoreceptor cell outer segment formation/maturation. In addition, they demonstrate how an inhibitor of NR2E3 (a rod transcription factor required for inducing rhodopsin expression), can be used to rescue the disease phenotype.

      Strengths:

      The manuscript is very well written, the illustrations and data presented are compelling, and the authors' interpretation/discussion of their findings is logical.

      Weaknesses:

      A weakness, which the authors have addressed in the discussion section, is the lack of an isogenic control, which would allow for direct analysis of the RHO-CNV in the absence of the other genetic sequence contained within the duplicated region. As the authors suggest, CRISPR correction of a large CNV in the absence of inducing unwanted on-target editing events in patient iPSCs is often very challenging. Given that they have used a no-disease iPSC line obtained from a family member, controlled for organoid differentiation kinetics/maturation state, and that no other complete disease-causing gene is contained within the duplicated region, it is unlikely that the addition of an isogenic control would yield significantly different results.

      Aims and conclusions:

      This reviewer is of the opinion that the authors have achieved their aims and that their results support their conclusions.

      Discussion:

      The authors have provided adequate discussion on the utility of the methods and data as well as the impact of their work on the field.

      We thank the reviewer for their insightful and encouraging review of our work, which has taken several years to reach its current stage.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Kandoi et al. describes a new 3D retinal organoid model of a mono-allelic copy number variant of the rhodopsin gene that was previously shown to induce autosomal dominant retinitis pigmentosa via a dominant negative mechanism in patients. With advancements in the low-cost genomics application to detect copy number variations, this is a timely article that highlights a potential disease mechanism that goes beyond the retina field. The evidence is relatively strong that the rod photoreceptor phenotype observed in an adult patient with RP in vivo is similar to that phenotype observed in human stem cell-derived retinal organoids. Increases in RHO expression detected by qPCR, RNA-seq, and IHC support this phenotype. Importantly, the amelioration of photoreceptor rhodopsin mislocalization and related defects using the small molecule drug photoregulin demonstrates an important potential clinical application.

      Overall, the authors succeeded in providing solid evidence that copy number variation via a genomic RHO duplication leads to abnormalities in rod photoreceptors that can be partially blocked by photoregulin. However, there are several points that should be addressed that will enhance this paper.

      Strengths:

      • The use of patient-derived organoids from patients that have visual defects is a major strength of this work and adds relevance to the disease phenotype.

      • The rod phenotype assessed by qPCR, RNA-seq, and IHC supports a phenotype that shares similarities with the patient.

      • The use of a small molecule drug that selectively targets rod photoreceptors, as opposed to cones, is a noteworthy strength.

      We thank the reviewers for highlighting the key strengths of the paper.

      Weaknesses:

      (1) The chromosomal segment that was duplicated had 3 copies of RHO in addition to three copies of each of the flanking genes (IFT122, HIF100, PLXND1). Discussion of the involvement of these genes would be helpful. Would duplication of any of these genes alone cause or contribute to adRP? As an example, a missense mutation in IFT122 was previously implicated in photoreceptor loss (PMID: 33606121 PMCID: PMC8519925).

      Thank you for your comment. The contribution of the other duplicated genes is an interesting question. Of these, IFT122 is particularly interesting, as pointed out. We thoroughly surveyed the literature and the database of our genetic testing partner, BluePrint Genetics, and did not find any human retinal degeneration cases with variants in IFT122. IFT122 has been shown to cause a recessive phenotype in dogs and in a complete-knockout zebrafish model, but dominant mutations or overexpression have not been shown to produce a phenotype. Interestingly, recessive biallelic IFT122 mutations can cause Cranioectodermal Dysplasia (Sensenbrenner syndrome, PMID: 24689072), and none of these patients exhibited retinal dystrophy. HIF100 is an epigenetic modifier gene, while PLXND1 is expressed in endothelial cells. We will include a discussion of this in the revised manuscript.

      (2) Related to #1, have the authors considered inserting extra copies of RHO (and/or the flanking genes) at a genomic safe harbor site? Although not required, this would allow one to study cells with isogenic-matched genetic backgrounds and would partially address the technical challenge of repairing a 188kb duplication, which as the authors note would be difficult to do. Demonstrating the effect of excess copy numbers in different genetic backgrounds would be a huge contribution to the field. At a minimum, a discussion of the role of the nearby genes should be included.


      Thank you for your suggestion. In our future mechanistic studies, we plan to test the relative role of 1-3 extra copies of RHO expressed under an NRL promoter, so that expression is restricted to rods. We will include a discussion of the potential role of the other genes in the revised manuscript.

      (3) In the patient, the central foveal region was spared suggesting that cones were normal. Was there a similar assessment that cones are unaffected in retinal organoids? 


      We will include these data in the revised manuscript; overall, we did not see a cone defect in RHO-CNV organoids. Additionally, although it is true that the central foveal region was relatively spared in this patient, the cones are definitely not normal. The macular cones that remain have been damaged by chronic edema, and photoreceptor and RPE atrophy has progressed into the macula, sparing only the foveal cones.

      (4) Pathway analysis indicated that glycosylation was perturbed and this was proposed as an explanation as to why rhodopsin was mislocalized. Have the authors verified that there is an actual decrease in glycosylation? 


      These studies are ongoing. We are currently examining the cellular pathophysiology in detail, focusing on RHO trafficking in RHO-CNV organoids, including the role of glycosylation and other post-translational modification defects.

      (5) Line 182: By what criteria are the authors able to state that "there were no clear visible anatomical changes in apical-basal retinal cell type distribution during the early differentiation timeframe (data not shown)"? Was this based on histological staining with antibodies, nuclear counter-staining, or some other evaluation?


      This was based on both IHC for various cell type markers and nuclear (DAPI) staining.

      (6) Figure 2C - the appearance of the inner segments in RC and RM looks very different from one another. Have the authors ruled out the possibility that the RC organoid cell isn't a cone? In addition, the RM structure has what appears to be a well-defined OLM which would suggest well-formed Muller glia. Do these structures also exist in RC organoids? Typically the OLM does form in older organoids. In addition, was this representative in numerous EM preparations?


      For clarification on EM data, we will include additional images in the revision as supplementary data. We have not carefully compared OLM between the patient and control organoids but do observe them in both conditions in the older organoids. The EM preparations were made from multiple organoids from two different batches with consistent results.

      (7) What criteria were used to assess cell loss? Has any TUNEL labeling been performed to confirm cell loss? From the existing data, it seems that rod outer segments appear to be affected in organoids. However, it's not clear if the photoreceptors themselves actually die in this model.

      TUNEL was used to assess cell loss and it was not significantly different between the control and patient organoids at the timepoints examined. We did not expect a change as the disease in the patient developed over decades.

      (8) Figure 5B. The RHO staining in the vehicle-treated sample is perturbed relative to the PR3 treatments as indicated in the text. In the vehicle-treated sample, the number of DAPI-positive cells that are completely negative proximal to the inner segments suggests that there might be non-rod cells there. Have the authors confirmed whether these are cones? Labels would be helpful in the left vehicle panel as the morphology looks very different than the treated samples.


      Thank you very much for the suggestions, which we will incorporate in the revised manuscript. A number of the cells in the negative regions are OTX2+/NRL- and are likely to be cones (Figure 4A and B). Unfortunately, we do not have a very good cone nuclear marker, as RXRγ does not consistently stain mature cones.

      (9) It is interesting that in addition to increases in RHO, and photo-transduction, there are also increases in PTPRT which is related to synaptic adhesion. Is there evidence of ectopic neurites that result from PTPRT over-expression?

      You are absolutely correct that the PTPRT data are very interesting. PTPRT requires PTMs similar to those of RHO in photoreceptors for its synaptic localization. We did not specifically look at ectopic neurites and will test that in the revision. It will be interesting to follow up on its expression pattern to see whether it is processed and localized normally, if we can find a working antibody. It is also possible that the gene-expression increase is due to feedback upregulation secondary to improper protein processing.

      Reviewer #3 (Public Review):

      This manuscript reports a novel pedigree with four intact copies of RHO on a single chromosome which appears to lead to overexpression of rhodopsin and a corresponding autosomal dominant form of RP. The authors generate retinal organoids from patient- and control-derived cells, characterize the phenotypes of the organoids, and then attempt to 'treat' aberrant rhodopsin expression/mislocalization in the patient organoids using a small molecule called photoregulin 3 (PR3). While this novel genetic mechanism for adRP is interesting, the organoid work is not compelling. There are multiple problems related to the technical approaches, the presentation of the results, and the interpretations of the data. I will present my concerns roughly in the order in which they appear in the manuscript.

      Major concerns:

      (1) Individual human retinal organoids in culture can show a wide range of differentiation phenotypes with respect to the expression of specific markers, percentages of given cell types, etc. For this reason, it can be very difficult to make rigorous, quantitative comparisons between 'wild-type' and 'mutant' organoids. Despite this difficulty, the authors of the present manuscript frequently present results in an impressionistic manner without quantitation. Furthermore, there is no indication that the investigator who performed the phenotypic analyses was blind with respect to the genotype. In my opinion, such blinding is essential for the analysis of phenotypes in retinal organoids. To give an example, in lines 193-194 the authors write "we observed that while the patient organoids developing connecting cilium and the inner segments similar to control organoids, they failed to extend outer segments". Outer segments almost never form normally in human retinal organoids, even when derived from 'wild-type' cells. Thus, I consider it wholly inadequate to simply state that outer segment formation 'failed' without a rigorous, quantitative, and blinded comparison of patient and control organoids.

      We agree that it is challenging to generate outer segments in retinal organoids, but we are not the first to show this. It has been demonstrated by multiple independent labs (Mayerl et al., PMID: 36206764; Wahlin et al., PMID: 28396597; West et al., PMID: 35334217), including ours (Chirco et al., PMID: 34653402). To clarify, we did not observe any OS-like tissue in the patient organoids across multiple EM preps of a number of organoids from two independent 300+ day experiments, which matched the phase microscopy data presented in Fig. 2B.

      (2) The presentation of qPCR results in Figure 3A is very confusing. First, the authors normalize expression to that of CRX, but they don't really explain why. In lines 210-211, they write "CRX, a ubiquitously expressing photoreceptor gene maintained from development to adulthood." Several parts of this sentence are misleading or incomplete. First, CRX is not 'ubiquitously expressed' (which usually means 'in all cell types') nor is it photoreceptor-specific: CRX is expressed in rods, cones, and bipolar cells. Furthermore, CRX expression levels are not constant in photoreceptors throughout development/adulthood. So, for these reasons alone, CRX is a poor choice for the normalization of photoreceptor gene expression.

      As you are aware, all housekeeping genes have shortcomings when used for normalizing PCR data. We chose CRX because, within the timepoints chosen, it is not expected to change much and thus represents a good equalizer for relative photoreceptor numbers between the organoids and conditions. While we agree that CRX is weakly expressed in bipolar cells (Yamamoto et al 2020), this is not expected to bias the data much, as neither we nor others have reported a large relative difference in bipolar cell number in organoids. We also confirm this by showing equivalent expression of OTX2, RCVRN and NRL between all conditions.

      Second, the authors' interpretation of the qPCR results (lines 216-218) is very confusing. The authors appear to be saying that there is a statistically significant increase in RHO levels between D120 and D300. However, the same change is observed in both control and patient organoids and is not unexpected, since the organoids are more mature at D300. The key comparison is between control and patient organoids at D300. At this time point, there appears to be no difference between control and patient. The authors don't even point this out in the main text.

      Thank you for the comment; we apologize for the confusion. However, as can be seen in the graph in Figure 3A, we do compare the expression of genes, including RHO, between control and patient organoids at two different time points. There are four conditions: D120-RC, D120-RM, D300-RC and D300-RM, with individual data points and error bars for each condition. For RHO, comparing control and patient organoids shows a statistically significant increase at both time points. We also compared RHO expression between patient organoids at the two time points, and it was not statistically different.

      Third, the variability in the number of photoreceptor cells in individual organoids makes a whole-organoid comparison by qPCR fraught with difficulty. It seems to me that what is needed here is a comparison of RHO transcript levels in isolated rod photoreceptors.

      We agree that this makes it challenging. This was the exact reasoning for using CRX for normalization since it is predominantly present in photoreceptors. This was validated by the data showing no difference in expression of photoreceptor markers OTX2, RCVRN or NRL between the organoids.
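
      The normalization argument in point (2) can be made concrete with a small calculation. The response does not state which quantification formula was used, so the sketch below assumes the widely used 2^-ΔΔCt method with roughly 100% amplification efficiency. The function name and all Ct values are hypothetical and are not data from the manuscript.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change of a target gene by the 2^-ddCt method.

    Assumes about 100% amplification efficiency for both primer pairs.
    """
    # Normalize the sample to its reference gene (e.g. CRX).
    d_ct_sample = ct_target - ct_reference
    # Do the same for the calibrator (e.g. a control organoid).
    d_ct_calibrator = ct_target_cal - ct_reference_cal
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values, NOT data from the manuscript.
# Patient: RHO Ct 20, CRX Ct 24. Control: RHO Ct 22, CRX Ct 24.
fold = relative_expression(20.0, 24.0, 22.0, 24.0)
print(fold)  # 4.0

# Same RHO Ct values, but the reference gene is twice as abundant
# in the patient sample (Ct 23 instead of 24).
fold_shifted = relative_expression(20.0, 23.0, 22.0, 24.0)
print(fold_shifted)  # 2.0
```

      The second call shows why the choice of reference gene matters to both sides of this exchange: a one-cycle shift in the reference alone halves the reported fold change, so the estimate is only as stable as the reference gene is across conditions.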

      (3) I cannot understand what the authors are comparing in the bulk RNA-seq analysis presented in the paragraph starting with line 222 and in the paragraph starting with line 306. They write "we performed bulk-RNA sequencing on 300-days-old retinal organoids (n=3 independent biological replicates). Patient retinal organoids demonstrated upregulated transcriptomic levels of RHO... comparable to the qRT-PCR data." From the wording, it suggests that they are comparing bulk RNA-seq of patients and control organoids at D300. However, this is not stated anywhere in the main text, the figure legend, or the Methods. Yet, the subsequent line "comparable to the qRT-PCR data" makes no sense, because the qPCR comparison was between patient samples at two different time points, D120 and D300, not between patient and control. Thus, the reader is left with no clear idea of what is even being compared by RNA-seq analysis.

      We apologize if the conditions were not obvious and will clarify this in the revised version. The conditions compared are control and patient organoids at D300. Regarding comparison to RT-PCR, as stated above, the comparison shown is between patient and control organoids at two different timepoints.

      Remarkably, the exact same lack of clarity as to what is being compared is found in the second RNA-seq analysis presented in the paragraph starting with line 306. Here the authors write "We further carried out bulk RNA-sequencing analysis to comprehensively characterize three different groups of organoids, 0.25 μM PR3-treated and vehicle-treated patient organoids and control (RC) organoids from three independent differentiation experiments. Consistent with the qRT-PCR gene expression analysis, the results showed a significant downregulation in RHO and other rod phototransduction genes." Here, the authors make it clear that they have performed RNA-seq on three types of samples: PR3-treated patient organoids, vehicle-treated patient organoids, and control organoids (presumably not treated). Yet, in the next sentence, they state "the results showed a significant downregulation in RHO", but they don't state what two of the three conditions are being compared! Although I can assume that the comparison presented in Fig. 6A is between patient vehicle-treated and PR3-treated organoids, this is nowhere explicitly stated in the manuscript.

      Thank you for the comment and we will explicitly state various comparisons in the revised version.

      (4) There are multiple flaws in the analysis and interpretation of the PR3 treatment results. The authors wrote (lines 289-294) "We treated long-term cultured 300-days-old, RHO-CNV patient retinal organoids with varying concentrations of PR3 (0.1, 0.25 and 0.5 μM) for one week and assessed the effects on RHO mRNA expression and protein localization. Immunofluorescence staining of PR3-treated organoids displayed a partial rescue of RHO localization with optimal trafficking observed in the 0.25 μM PR3-treated organoids (Figure 5B). None of the organoids showed any evidence of toxicity post-treatment."

      There are multiple problems here. First, the results are impressionistic and not quantitative. Second, it's not clear that the investigator was blinded with respect to the treatment condition. Third, in the sections presented, the organoids look much more disorganized in the PR3-treated conditions than in the control. In particular, the ONL looks much more poorly formed. Overall, I'd say the organoids looked considerably worse in the 0.25 and 0.5 microM conditions than in the control, but I don't know whether or not the images are representative. Without rigorously quantitative and blinded analysis, it is impossible to draw solid conclusions here. Lastly, the authors state that "none of the organoids showed any evidence of toxicity post-treatment," but do not explain what criteria were used to determine that there was no toxicity.

      Thank you for your critical insight. The RHO localization data are qualitative, as it is very difficult to accurately quantify rhodopsin trafficking within the cell in the organoid. Thus, for quantitative comparison, we have provided expression level changes. Regarding toxicity, we analyzed the organoids by morphology and TUNEL and did not observe a significant difference between the conditions. This closely mirrors mouse data, in which PR3 suppressed rod function following IP injection without any obvious toxicity.

      (5) qPCR-based quantitation of rod gene expression changes in response to PR3 treatment is not well-designed. In lines 294-297 the authors wrote "PR3 drove a significant downregulation of RHO in a dose-dependent manner. Following qRT-PCR analysis, we observed a 2-to-5 log2FC decrease in RHO expression, along with smaller decreases in other rod-specific genes including NR2E3, GNAT1 and PDE6B." I assume these analyses were performed on cDNA derived from whole organoids. There are two problems with this analysis/interpretation. First, a decrease in rod gene expression can be caused by a decrease in the number of rods in the treated organoids (e.g., by cell death) or by a decrease in the expression of rod genes within individual rods. The authors do not distinguish between these two possibilities. Second, as stated above, the percentage of cells that are rods in a given organoid can vary from organoid to organoid. So, to determine whether there is downregulation of rod gene expression, one should ideally perform the qPCR analysis on purified rods.

      The reviewer is correct in pointing out the potential reasons for the reduction in RHO levels following PR3 treatment. Thus, we have provided NRL expression levels in the graph to show that this key rod-specific gene does not change, suggesting an equivalent number of rod photoreceptor cells. The suggestion of using purified rods is not practical here, as we do not have any way to sort human rods due to the lack of a rod-specific cell surface marker.
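
      To put the quoted "2-to-5 log2FC decrease" on a linear scale, and to show how a rod-number marker can separate the two explanations raised in point (5), a brief worked example follows. The function names are illustrative, and the NRL values are hypothetical inputs; the subtraction rests on the simplifying assumption that the NRL transcript level scales with the number of rods.

```python
def fold_from_log2fc(log2fc):
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2fc


for log2fc in (-2, -5):
    fold = fold_from_log2fc(log2fc)
    print(f"log2FC {log2fc}: {fold} of vehicle level ({1 / fold:.0f}-fold decrease)")


def per_rod_log2fc(rho_log2fc, nrl_log2fc):
    """RHO change after subtracting the change in a rod-number marker.

    Assumes the marker (here NRL) tracks rod number, which is a simplification.
    """
    return rho_log2fc - nrl_log2fc


# Hypothetical: NRL unchanged, so the whole RHO decrease is per-rod.
print(per_rod_log2fc(-3.0, 0.0))   # -3.0
# Hypothetical: half the rods lost (NRL log2FC of -1).
print(per_rod_log2fc(-3.0, -1.0))  # -2.0
```

      A log2FC of -2 to -5 therefore corresponds to a 4-fold to 32-fold reduction. If the rod-number marker is unchanged, as the authors report for NRL, the reduction is attributed to expression per rod rather than to rod loss.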

      (6) In Figure 4B 'RM' panels, the authors show RHO staining around the somata of 'rods' but the inset images suggest that several of these cells lack both NRL and OTX2 staining in their nuclei. All rods should be positive for NRL. Conversely, the same image shows a layer of cells scleral to the cells with putative RHO somal staining which do not show somal staining, and yet they do appear to be positive for NRL and OTX2. What is going on here? The authors need to provide interpretations for these findings.

      Since RHO is a cytoplasmic marker and photoreceptors are tightly packed, it is difficult to make a 1:1 comparison between the NRL/OTX2 nuclear markers and RHO. Additionally, as the RHO+ cytoplasm extends towards the scleral surface, it is expected to pass adjacent to other nuclei. A few of the rods still have normal rhodopsin trafficking, and it is likely that these will not have somal RHO, similar to control conditions. We do rarely observe these cells, as highlighted by the occasional RHO in the IS/OS of RM organoids in the figure. We agree that the NRL staining in Figure 4B (>D250) is not extremely crisp, and we will include an updated figure in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a new and valuable theoretical account of spatial representational drift in the hippocampus. The evidence supporting the claims is convincing, with a clear and accessible explanation of the phenomenon. Overall, this study will likely attract researchers exploring learning and representation in both biological and artificial neural networks.

      We would like to ask the reviewers to consider elevating the assessment in light of the following arguments. As noted in the original review, the study bridges two different fields (machine learning and neuroscience), and does not touch only a single subfield (representational drift in neuroscience). In the revision, we also analysed data from four different labs, strengthening the evidence and the generality of the conclusions.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors start from the premise that neural circuits exhibit "representational drift" -- i.e., slow and spontaneous changes in neural tuning despite constant network performance. While the extent to which biological systems exhibit drift is an active area of study and debate (as the authors acknowledge), there is enough interest in this topic to justify the development of theoretical models of drift.

      The contribution of this paper is to claim that drift can reflect a mixture of "directed random motion" as well as "steady state null drift." Thus far, most work within the computational neuroscience literature has focused on the latter. That is, drift is often viewed to be a harmless byproduct of continual learning under noise. In this view, drift does not affect the performance of the circuit nor does it change the nature of the network's solution or representation of the environment. The authors aim to challenge the latter viewpoint by showing that the statistics of neural representations can change (e.g. increase in sparsity) during early stages of drift. Further, they interpret this directed form of drift as "implicit regularization" on the network.

      The evidence presented in favor of these claims is concise. Nevertheless, on balance, I find their evidence persuasive on a theoretical level -- i.e., I am convinced that implicit regularization of noisy learning rules is a feature of most artificial network models. This paper does not seem to make strong claims about real biological systems. The authors do cite circumstantial experimental evidence in line with the expectations of their model (Khatib et al. 2022), but those experimental data are not carefully and quantitatively related to the authors' model.

      We thank the reviewer for pushing us to present stronger experimental evidence. We have now analysed data from four different labs. Two of those are novel analyses of existing data (Karlsson et al, Jercog et al). All datasets show the same trend: increasing sparsity and increasing information per cell. We think that the results, presented in the new Figure 3, allow us to make a stronger claim about real biological systems.

      To establish the possibility of implicit regularization in artificial networks, the authors cite convincing work from the machine-learning community (Blanc et al. 2020, Li et al., 2021). Here the authors make an important contribution by translating these findings into more biologically plausible models and showing that their core assumptions remain plausible. The authors also develop helpful intuition in Figure 4 by showing a minimal model that captures the essence of their result.

      We are glad that these translation efforts are appreciated.

      In Figure 2, the authors show a convincing example of the gradual sparsification of tuning curves during the early stages of drift in a model of 1D navigation. However, the evidence presented in Figure 3 could be improved. In particular, 3A shows a histogram displaying the fraction of active units over 1117 simulations. Although there is a spike near zero, a sizeable portion of simulations have greater than 60% active units at the end of the training, and critically the authors do not characterize the time course of the active fraction for every network, so it is difficult to evaluate their claim that "all [networks] demonstrated... [a] phase of directed random motion with the low-loss space." It would be useful to revise the manuscript to unpack these results more carefully. For example, a histogram of log(tau) computed in panel B on a subset of simulations may be more informative than the current histogram in panel A.

      The previous Figure 3A was indeed confusing. In particular, it lumped together many simulations without proper curation. We redid this figure (now Figure 4), and added supplementary figures (Figures S1, S2) to better explain our results. It is now clear that the simulations with a large number of active units either had not converged, had a slow timescale of sparsification, or featured label noise, in which case the fraction of active units is less affected. Regarding the log(tau) calculation, while it could indeed be an informative plot, it could not be calculated in a simple manner for all simulations. This is because learning curves are not always exponential, but sometimes feature initial plateaus (see also Saxe et al 2013, Schuessler et al 2020). We added a more detailed explanation of this limitation in the methods section, and we believe the current figure exemplifies the effect in a satisfactory manner.
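
      The two quantities discussed here, the fraction of active units and a sparsification timescale tau, can be sketched on synthetic data. This is a minimal illustration and not the analysis code of the study: the activity threshold, the known plateau value `floor`, and the single-exponential form are all assumptions.

```python
import math


def active_fraction(activity, threshold=0.0):
    """Fraction of units whose peak activity exceeds the threshold.

    `activity` is a list of per-unit tuning curves (lists of floats).
    """
    active = sum(1 for unit in activity if max(unit) > threshold)
    return active / len(activity)


def fit_log_tau(times, values, floor):
    """Log-linear least-squares fit of values ~ floor + A * exp(-t / tau).

    Returns log(tau). Only meaningful for a near single-exponential curve.
    """
    ys = [math.log(v - floor) for v in values]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
             / sum((t - mean_t) ** 2 for t in times))
    return math.log(-1.0 / slope)


# Four toy units, two of which are silent everywhere.
toy = [[0.0, 0.0, 0.0], [0.1, 0.5, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
print(active_fraction(toy))  # 0.5

# Synthetic active fraction decaying from 0.9 towards 0.2 with tau = 50.
times = list(range(0, 200, 10))
curve = [0.2 + 0.7 * math.exp(-t / 50.0) for t in times]
print(round(math.exp(fit_log_tau(times, curve, floor=0.2)), 3))  # 50.0
```

      The fit recovers tau only because the synthetic curve is a pure exponential with a known floor. A curve with an initial plateau breaks that assumption, and the fitted slope then depends on the fitting window, which is consistent with the limitation the authors describe.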

      Reviewer #2 (Public Review):

      Summary:

      In the manuscript "Representational drift as a result of implicit regularization" the authors study the phenomenon of representational drift (RD) in the context of an artificial network that is trained in a predictive coding framework. When trained on a task for spatial navigation on a linear track, they found that a stochastic gradient descent algorithm led to a fast initial convergence to spatially tuned units, but then to a second very slow, yet directed drift which sparsified the representation while increasing the spatial information. They finally show that this separation of timescales is a robust phenomenon and occurs for a number of distinct learning rules.

      Strengths:

      This is a very clearly written and insightful paper, and I think people in the community will benefit from understanding how RD can emerge in such artificial networks. The mechanism underlying RD in these models is clearly laid out and the explanation given is convincing.

      We thank the reviewer for the support.

      Weaknesses:

      It is unclear how this mechanism may account for the learning of multiple environments.

      There are two facets to the topic of multiple environments. First, are the results of the current paper relevant when there are multiple environments? Second, what is the interaction between brain mechanisms of dealing with multiple environments and the results of the current paper?

      We believe the answer to the first question is positive. The near-orthogonality of representations between environments implies that changes in one can happen without changes in the other. This is evident, for instance, in Khatib et al and Geva et al - in both cases, drift seems to happen independently in two environments, even though they are visited intermittently and are visually similar.

      The second question is a fascinating one, and we are planning to pursue it in future work. While the exact way in which the brain achieves this near-independence is an open question, remapping is one possible window into this process.

      We extended the discussion to make these points clear.

      The process of RD through this mechanism also appears highly non-stationary, in contrast to what is seen in familiar environments in the hippocampus, for example.

      The non-stationarity noted by the reviewer is a major feature of our observations, and is indeed linked to familiarity. We divide learning into three phases (now more clearly stated in Table 1 and Figure 4C). The first, rapid phase consists of improvement of performance - corresponding to initial familiarity with the environment. The third phase, often reported in the literature of representational drift, is indeed stationary and obtained after prolonged familiarity. Our work focuses on the second phase, which is not as immediate as the first one, and can take several days. We note in the discussion that experiments which include a long familiarization process can miss this phase (see also Table 3). Furthermore, we speculate that real life is less stationary than a lab environment, and this second phase might actually be more relevant there.

      Reviewer #3 (Public Review):

      Summary:

      Single-unit neural activity tuned to environmental or behavioral variables gradually changes over time. This phenomenon, called representational drift, occurs even when all external variables remain constant, and challenges the idea that stable neural activity supports the performance of well-learned behaviors. While a number of studies have described representational drift across multiple brain regions, our understanding of the underlying mechanism driving drift is limited. Ratzon et al. propose that implicit regularization - which occurs when machine learning networks continue to reconfigure after reaching an optimal solution - could provide insights into why and how drift occurs in neurons. To test this theory, Ratzon et al. trained a Feedforward Network to perform the oft-utilized linear track behavioral paradigm and compared the changes in hidden layer units to those observed in hippocampal place cells recorded in awake, behaving animals.

      Ratzon et al. clearly demonstrate that hidden layer units in their model undergo consistent changes even after the task is well-learned, mirroring representational drift observed in real hippocampal neurons. They show that the drift occurs across three separate measures: the active proportion of units (referred to as sparsification), spatial information of units, and correlation of spatial activity. They continue to address the conditions and parameters under which drift occurs in their model to assess the generalizability of their findings.

      However, the generalizability results are presented primarily in written form: additional figures are warranted to aid in reproducibility.

      We added figures, and a GitHub repository with all the code to allow full reproducibility.

      Last, they investigate the mechanism through which sparsification occurs, showing that the flatness of the manifold near the solution can influence how the network reconfigures. The authors suggest that their findings indicate a three-stage learning process: 1) fast initial learning followed by 2) directed motion along a manifold which transitions to 3) undirected motion along a manifold.

      Overall, the authors' results support the main conclusion that implicit regularization in machine learning networks mirrors representational drift observed in hippocampal place cells.

      We thank the reviewer for this summary.

      However, additional figures/analyses are needed to clearly demonstrate how different parameters used in their model qualitatively and quantitatively influence drift.

      We now provide additional figures regarding parameters (Figures S1, S2).

      Finally, the authors need to clearly identify how their data supports the three-stage learning model they suggest.

      Their findings promise to open new fields of inquiry into the connection between machine learning and representational drift and generate testable predictions for neural data.

      Strengths:

      (1) Ratzon et al. make an insightful connection between well-known phenomena in two separate fields: implicit regularization in machine learning and representational drift in the brain. They demonstrate that changes in a recurrent neural network mirror those observed in the brain, which opens a number of interesting questions for future investigation.

      (2) The authors do an admirable job of writing to a large audience and make efforts to provide examples to make machine learning ideas accessible to a neuroscience audience and vice versa. This is no small feat and aids in broadening the impact of their work.

      (3) This paper promises to generate testable hypotheses to examine in real neural data, e.g., that drift rate should plateau over long timescales (now testable with the ability to track single-unit neural activity across long time scales with calcium imaging and flexible silicon probes). Additionally, it provides another set of tools for the neuroscience community at large to use when analyzing the increasingly high-dimensional data sets collected today.

      We thank the reviewer for these comments. Regarding the hypotheses, these are partially confirmed in the new analyses we provide of data from multiple labs (new Figure 3 and Table 3) - indicating that prolonged exposure to the environment leads to more stationarity.

      Weaknesses:

      (1) Neural representational drift and directed/undirected random walks along a manifold in ML are well described. However, outside of the first section of the main text, the analysis focuses primarily on the connection between manifold exploration and sparsification without addressing the other two drift metrics: spatial information and place field correlations. It is therefore unclear if the results from Figures 3 and 4 are specific to sparseness or extend to the other two metrics. For example, are these other metrics of drift also insensitive to most of the Feedforward Network parameters as shown in Figure 3 and the related text? These concerns could be addressed with panels analogous to Figures 3a-c and 4b for the other metrics and will increase the reproducibility of this work.

      We note that the results from figures 3 and 4 (original manuscript) are based on abstract tasks, while in figure 2 there is a contextual notion of spatial position. Spatial position metrics are not applicable to the abstract tasks as they are simple random mappings of inputs, and there isn’t necessarily an underlying latent variable such as position. This transition between task types is better explained in the text now. In essence, the spatial information and place field correlation changes are simply signatures of the movements in parameter space. In the abstract tasks their change becomes trivial, as the spatial information becomes strongly correlated with sparsity and place fields are simply the activity vectors of units. These are guaranteed to change as long as there are changes in the activity statistics. We present here the calculation of these metrics averaged over simulations for completeness.

      Author response image 1.

      (A) PV correlation between training time points averaged over 362 simulations. (B) Mean SI of units normalized to first time step, averaged over 362 simulations. Red line shows the average time point of loss convergence, the shaded area represents one standard deviation.
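
For readers who wish to recompute the SI curve in panel B, a minimal sketch follows. It assumes the commonly used Skaggs-style definition of spatial information in bits per spike, which this response does not spell out; the function name `spatial_information` and the uniform default occupancy are illustrative assumptions rather than our released code.

```python
import math

def spatial_information(rate_map, occupancy=None):
    """Skaggs-style spatial information (bits per spike) of one unit.

    rate_map:  mean activity of the unit in each position bin.
    occupancy: probability of visiting each bin (assumed uniform if None).
    """
    n = len(rate_map)
    if occupancy is None:
        occupancy = [1.0 / n] * n
    mean_rate = sum(p * r for p, r in zip(occupancy, rate_map))
    if mean_rate <= 0:
        return 0.0  # a silent unit carries no spatial information
    info = 0.0
    for p, r in zip(occupancy, rate_map):
        if r > 0:  # bins with zero rate contribute nothing
            ratio = r / mean_rate
            info += p * ratio * math.log2(ratio)
    return info

# Hypothetical rate maps over four position bins.
flat_unit = [1.0, 1.0, 1.0, 1.0]    # active everywhere
sharp_unit = [1.0, 0.0, 0.0, 0.0]   # active in a single bin
```

Under this definition a unit that is equally active everywhere carries zero information, while a unit active in one of four equally visited bins carries two bits, which illustrates why SI rises as activity becomes sparser.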

      (2) Many caveats/exceptions to the generality of findings are mentioned only in the main text without any supporting figures, e.g., "For label noise, the dynamics were qualitatively different, the fraction of active units did not reduce, but the activity of the units did sparsify" (lines 116-117). Supporting figures are warranted to illustrate which findings are "qualitatively different" from the main model, which are not different from the main model, and which of the many parameters mentioned are important for reproducing the findings.

      We have now added figures (S1, S2) that show exactly this. We also added a GitHub repository to allow full reproduction.

      (3) Key details of the model used by the authors are not listed in the methods. While they are mentioned in reference 30 (Recanatesi et al., 2021), they need to be explicitly defined in the methods section to ensure future reproducibility.

      The details of the simulation are given in the Methods section. We also added a GitHub repository to allow full reproducibility.

      (4) How different states of drift correspond to the three learning stages outlined by the authors is unclear. Specifically, it is not clear where the second stage ends, and the third stage begins, either in real neural data or in the figures. This is compounded by the fact that the third stage - of undirected, random manifold exploration - is only discussed in relation to the introductory Figure 1 and is never connected to the neural network data or actual brain data presented by the authors. Are both stages meant to represent drift? Or is only the second stage meant to mirror drift, while undirected random motion along a manifold is a prediction that could be tested in real neural data? Identifying where each stage occurs in Figures 2C and E, for example, would clearly illustrate which attributes of drift in hidden layer neurons and real hippocampal neurons correspond to each stage.

      Thanks for this comment, which urged us to better explain these concepts.

      The different processes (reduction in loss, reduction in Hessian) happen in parallel with different timescales. Thus, there are no sharp transitions between the phases. This is now explained in the text in relation to figure 4C, where the approximate boundaries are depicted.

      The term drift is often used to denote a change in representation without a change in behavior. In this sense, both the second and third phases correspond to drift. Only the third stage is stationary. This is now emphasized in the text and in the new Table 1. Regarding experimental data, apart from the new figure 3 with four datasets, we also summarize in Table 3 the relation between duration of familiarity and stationarity of the data.

      Recommendations for the authors:

      The reviewers have raised several concerns. They concur that the authors should address the specific points below to enhance the manuscript.

      (1) The three different phases of learning should be clearly delineated, along with how they are determined. It remains unclear in which exact phase the drift is observed.

      This is now clearly explained in the new Table 1 and Figure 4C. Note that the different processes (reduction in loss, reduction in Hessian) happen in parallel with different timescales. Thus, there are no sharp transitions between the phases. This is now explained in the text in relation to figure 4C, where the approximate boundaries are depicted.

      The term drift is often used to denote a change in representation without a change in behavior. In this sense, both the second and third phases correspond to drift. Only the third stage is stationary. This is now emphasized in the text and in the new Table 1. Regarding experimental data, apart from the new figure 3 with four datasets, we also summarize in Table 3 the relation between duration of familiarity and stationarity of the data.

      (2) The term "sparsification" of unit activity is not fully clear. Its meaning should be more explicitly explained, especially since, in the simulations, a significant number of units appear to remain active (Fig. 3A).

      We now define precisely the two measures we use - Active Fraction, and Fraction Active Units. There is a new section with an accompanying figure in the Methods section. As Figure S2 shows, the noise statistics (label noise vs. update noise) differentially affect these two measures.
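
The distinction between the two measures can be sketched as follows. The exact definitions are given in the Methods section and are not reproduced here, so the code below rests on an assumption: Fraction Active Units counts units that respond to at least one input, while Active Fraction counts (input, unit) pairs with non-zero activity. The function names and the toy activity matrix are hypothetical.

```python
def fraction_active_units(activity, threshold=0.0):
    """Fraction of units that are active for at least one input.

    activity: list of rows, one row per input, one column per hidden unit.
    """
    n_units = len(activity[0])
    active = [any(row[j] > threshold for row in activity)
              for j in range(n_units)]
    return sum(active) / n_units

def active_fraction(activity, threshold=0.0):
    """Fraction of (input, unit) pairs whose activity exceeds the threshold."""
    total = sum(len(row) for row in activity)
    active = sum(1 for row in activity for a in row if a > threshold)
    return active / total

# Toy example: 3 inputs (rows) by 3 units (columns).
toy = [
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 2.0],
    [0.0, 0.0, 3.0],
]
```

In this toy matrix two of three units are active for some input, yet only a third of all entries are non-zero, so one measure can stay high while the other falls.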

      (3) While the study primarily focuses on one aspect of representational drift-the proportion of active units-it should also explore other features traditionally associated with representational drift, such as spatial information and the correlation between place fields.

      This absence of features is related to the abstract nature of some of the tasks simulated in our paper. In our original submission the transition from a predictive coding task to more abstract tasks was not clearly explained, creating some confusion regarding the measured metrics. We have now clarified the motivation for this transition.

      Both the initial simulation and the new experimental data analysis include spatial information (Figures 2,3). The following simulations (Figure 4) with many parameter choices use more abstract tasks, for which the notion of correlation between place cells and spatial information loses its meaning as there is no spatial ordering of the inputs, and every input is encountered only once. Spatial information becomes strongly correlated with the inverse of the active fraction metric. The correlation between place cells is also directly linked to increase in sparseness for these tasks.

      (4) There should be a clearer illustration of how labeling noise influences learning dynamics and sparsification.

      This was indeed confusing in the original submission. We removed the simulations with label noise from Figure 4, and added a supplementary figure (S2) illustrating the different effects of label noise.

      (5) The representational drift observed in this study's simulations appears to be nonstationary, which differs from in vivo reports. The reasons for this discrepancy should be clarified.

      We added experimental results from three additional labs demonstrating a change in activity statistics (i.e. increase in spatial information and increase in sparseness) over a long period of time. We suggest that such a change long after the environment is already familiar is an indication for the second phase, and stress that this change seems to saturate at some point, and that most drift papers start collecting data after this saturation, hence this effect was missed in previous in vivo reports. Furthermore, these effects are becoming more abundant with the advent of new calcium imaging methods, as the older electrophysiological recording methods did not usually allow recording of large numbers of cells for long periods of time. The new Table 3 surveys several experimental papers, emphasizing the degree of familiarity with the environment.

      (6) A distinctive feature of the hippocampus is its ability to learn different spatial representations for various environments. The study does not test representational drift in this context, a topic of significant interest to the community. Whether the authors choose to delve into this is up to them, but it should at least be discussed more comprehensively, as it's only briefly touched upon in the current manuscript version.

      There are two facets to the topic of multiple environments. First, are the results of the current paper relevant when there are multiple environments? Second, what is the interaction between brain mechanisms of dealing with multiple environments and the results of the current paper?

      We believe the answer to the first question is positive. The near-orthogonality of representations between environments implies that changes in one can happen without changes in the other. This is evident, for instance, in Khatib et al and Geva et al - in both cases, drift seems to happen independently in two environments, even though they are visited intermittently and are visually similar.

      The second question is a fascinating one, and we are planning to pursue it in future work. While the exact way in which the brain achieves this near-independence is an open question, remapping is one possible window into this process.

      We extended the discussion to make these points clear.

      (7) The methods section should offer more details about the neural nets employed in the study. The manuscript should be explicit about the terms "hidden layer", "units", and "neurons", ensuring they are defined clearly and not used interchangeably.

      We changed the usage of these terms to be more coherent and made our code publicly available. Specifically, “units” refer to artificial networks and “neurons” to biological ones.

      In addition, each reviewer has raised both major and minor concerns. These are listed below and should be addressed where possible.

      Reviewer #1 (Recommendations For The Authors):

      I recommend that the authors edit the text to soften their claims. For example:

      In the abstract "To uncover the underlying mechanism, we..." could be changed to "To investigate, we..."

      Agree. Done

      On line 21, "Specifically, recent studies showed that..." could be changed to "Specifically, recent studies suggest that..."

      Agree. Done

      On line 100, "All cases" should probably be softened to "Most cases" or more details should be added to Figure 3 to support the claim that every simulation truly had a phase of directed random motion.

      The text was changed in accordance with the reviewer’s suggestion. In addition, the figure was changed and only includes simulations in which we expected unit sparsity to arise (without label noise). We also added explanations and supplementary figures for label noise.

      Unless I missed something obvious, there is no new experimental data analysis reported in the paper. Thus, line 159 of the discussion, "a phenomenon we also observed in experimental data" should be changed to "a phenomenon that was recently reported in experimental data."

      We thank the reviewer for drawing our attention to this. We now analyzed data from three other labs, two of which are novel analyses on existing data. All four datasets show the same trends of sparseness with increasing spatial information. The new Figure 3 and text now describe this.

      On line 179 of the Discussion, "a family of network configurations that have identical performance..." could be softened to "nearly identical performance." It would be possible for networks to have minuscule differences in performance that are not detected due to stochastic batch effects or limits on machine precision.

      The text was changed in accordance with the reviewer’s suggestion.

      Other minor comments:

      Citation 44 is missing the conference venue, please check all citations are formatted properly.

      Corrected.

      In the discussion on line 184, the connection to remapping was confusing to me, particularly because the cited reference (Sanders et al. 2020) is more of a conceptual model than an artificial network model that could be adapted to the setting of noisy learning considered in this paper. How would an RNN model of remapping (e.g. Low et al. 2023; Remapping in a recurrent neural network model of navigation and context inference) be expected to behave during the sparsifying portion of drift?

      We have now clarified this section. The conceptual model of Sanders et al includes a specific prediction (Figure 7 there) which is very similar to ours - a systematic change in robustness depending on duration of training. Regarding the Low et al model, using such mechanistic models is an exciting avenue for future research.

      Reviewer #2 (Recommendations For The Authors):

      I only have two major questions.

      (1) Learning multiple representations: Memory systems in the brain typically must store many distinct memories. Certainly, the hippocampus, where RD is prominent, is involved in the ongoing storage of episodic memories. But even in the idealized case of just two spatial memories, for example, two distinct linear tracks, how would this learning process look? Would there be any interference between the two learning processes or would they be largely independent? Is the separation of time scales robust to the number of representations stored? I understand that to answer this question fully probably requires a research effort that goes well beyond the current study, but perhaps an example could be shown with two environments. At the very least the authors could express their thoughts on the matter.

      There are two facets to the topic of multiple environments. First, are the results of the current paper relevant when there are multiple environments? Second, what is the interaction between brain mechanisms of dealing with multiple environments and the results of the current paper?

      We believe the answer to the first question is positive. The near-orthogonality of representations between environments implies that changes in one can happen without changes in the other. This is evident, for instance, in Khatib et al and Geva et al - in both cases, drift seems to happen independently in two environments, even though they are visited intermittently and are visually similar.

      The second question is a fascinating one, and we are planning to pursue it in future work. While the exact way in which the brain achieves this near-independence is an open question, remapping is one possible window into this process.

      We extended the discussion to make these points clear.

      (2) Directed drift versus stationarity: I could not help but notice that the RD illustrated in Fig.2D is not stationary in nature, i.e. the upper right and lower left panels are quite different. This appears to contrast with findings in the hippocampus, for example, Fig.3e-g in (Ziv et al, 2013). Perhaps it is obvious that a directed process will not be stationary, but the authors note that there is a third phase of steady-state null drift. Is the RD seen there stationary? Basically, I wonder if the process the authors are studying is relevant only as a novel environment becomes familiar, or if it is also applicable to RD in an already familiar environment. Please discuss the issue of stationarity in this context.

      The non-stationarity noted by the reviewer is indeed a major feature of our observations, and is indeed linked to familiarity. We divide learning into three phases (now more clearly stated in Table 1 and Figure 4C). The first, rapid, phase consists of improvement of performance - corresponding to initial familiarity with the environment. The third phase, often reported in the literature of representational drift, is indeed stationary and obtained after prolonged familiarity. Our work focuses on the second phase, which is not as immediate as the first one, and can take several days. We note in the discussion that experiments which include a long familiarization process can miss this phase (see also Table 3). Furthermore, we speculate that real life is less stationary than a lab environment, and this second phase might actually be more relevant there.

      Reviewer #3 (Recommendations For The Authors):

      Most of my general recommendations are outlined in the public review. A large portion of my comments regards increasing clarity and explicitly defining many of the terms used which may require generating more figures (to better illustrate the generality of findings) or modifying existing figures (e.g., to show how/where the three stages of learning map onto the authors' data).

      Sparsification is not clearly defined in the main text. As I read it, sparsification is meant to refer to the activity of neurons, but this needs to be clearly defined. For example, lines 262-263 in the methods define "sparseness" by the number of active units, but lines 116-117 state: "For label noise, the dynamics were qualitatively different, the fraction of active units did not reduce, but the activity of the units did sparsify." If the fraction of active units (defined as "sparseness") did not change, what does it mean that the activity of the units "sparsified"? If the authors mean that the spatial activity patterns of hidden units became more sharply tuned, this should be clearly stated.

      We now define precisely the two measures we use - Active Fraction, and Fraction Active Units. There is a new section with an accompanying figure in the Methods section. As Figure S2 shows, the noise statistics (label noise vs. update noise) differentially affect these two measures.

      Likewise, it is unclear which of the features the authors outlined - spatial information, active proportion of units, and spatial correlation - are meant to represent drift. The authors should clearly delineate which of these three metrics they mean to delineate drift in the main text rather than leave it to the reader to infer. While all three are mentioned early on in the text (Figure 2), the authors focus more on sparseness in the last half of the text, making it unclear if it is just sparseness that the authors mean to represent drift or the other metrics as well.

      The main focus of our paper is on the non-stationarity of drift, namely that features (such as these three) systematically change in a directed manner as part of the drift process. The new analyses of experimental data show sparseness and spatial information.

      The focus on sparseness in the second half of the paper is because we move to more abstract tasks, in which sparseness is also easy to study. In our original submission the transition from a predictive coding task to more abstract tasks was not clearly explained, creating some confusion regarding the measured metrics. We have now clarified the motivation for this transition.

      It is not clear if a change in the number of active units alone constitutes "drift", especially since Geva et al. (2023) recently showed that both changes in firing rate AND place field location drive drift, and that the passage of time drives changes in activity rate (or # cells active).

      Our work did not deal with purely time-dependent drift, but rather focused on experience-dependence. Furthermore, Geva et al study the stationary phase of drift, where we do not expect a systematic change in the total number of cells active. They report changes in the average firing rate of active cells in this phase, as a function of time - which does not contradict our findings.

      "hidden layer", "units", and "neurons" seem to be used interchangeably in the text (e.g., line 81-85). However, this is confusing in several places, in particular in lines 83-85 where "neurons" is used twice. The first usage appears to refer to the rate maps of the hidden layer units simulated by the authors, while the second "neurons" appears to refer to real data from Ziv 2013 (ref 5). The authors should make it explicit whether they are referring to hidden layer units or actual neurons to avoid reader confusion.

      We changed the usage of these terms to be more coherent. Specifically, “units” refer to artificial networks and “neurons” to biological ones.

      The authors should clearly illustrate which parts of their findings support their three-phase learning theory. For example, does 2E illustrate these phases, with the first tenth of training time points illustrating the early phase, time 0.1-0.4 illustrating the intermediate phase, and 0.4-1 illustrating the last phase? Additionally, they should clarify whether the second and third stages are meant to represent drift, or is it only the second stage of directed manifold exploration that is considered to represent drift? This is unclear from the main text.

      The different processes (reduction in loss, reduction in Hessian) happen in parallel with different timescales. Thus, there are no sharp transitions between the phases. This is now explained in the text in relation to figure 4C, where the approximate boundaries are depicted.

      The term drift is often used to denote a change in representation without a change in behavior. In this sense, both the second and third phases correspond to drift. Only the third stage is stationary. This is now emphasized in the text and in the new Table 1. Regarding experimental data, apart from the new figure 3 with four datasets, we also summarize in Table 3 the relation between duration of familiarity and stationarity of the data.

      Line 45 - It appears that the acronym ML is not defined above here anywhere.

      Added.

      Line 71: the ReLU function should be defined in the text, e.g., sigma(x) = x if x > 0 else 0.

      Added.
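
The definition suggested by the reviewer can be written as a one-line function. This is only an illustration of the standard ReLU non-linearity; the function name `relu` is an assumption and not a reference to our code.

```python
def relu(x):
    """Rectified linear unit: returns x for positive inputs and 0 otherwise."""
    return x if x > 0 else 0.0

# A hidden unit with negative pre-activation is silent for that input.
outputs = [relu(v) for v in (-1.5, 0.0, 2.5)]
```

Because the output is exactly zero for all non-positive pre-activations, a unit can become inactive for every input, which is the sense of sparseness used in the response.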

      106-107: Figures (or supplemental figures) to demonstrate how most parameters do not influence sparsification dynamics are warranted. As written, it is unclear what "most parameters" mean - all but noise scale. What about the learning rule? Are there any interactions between parameters?

      We now removed the label noise from Figure 4, and added two supplementary figures to clearly explain the effect of parameters. Figure 4 itself was also redone to clarify this issue.

      2F middle: should "change" be omitted for SI?

      The panel was replaced by a new one in Figure 3.

      116-119: A figure showing how results differ for label noise is warranted.

      This is now done in Figures S1 and S2.

      124: typo, The -> the

      Corrected.

      127-129: This conclusion statement is the first place in the text where the three stages are explicitly outlined. There does not appear to be any support or further explanation of these stages in the text above.

      We now explain this earlier at the end of the Introduction section, along with the new Table 1 and marking on Figure 4C.

      132-133 seems to be more of a statement and less of a prediction or conclusion - do the authors mean "the flatness of the loss landscape in the vicinity of the solution predicts the rate of sparsification?"

      We thank the reviewer for this observation. The sentence was rephrased:

      Old: As illustrated in Fig. 1, different solutions in the zero-loss manifold might vary in some of their properties. The specific property suggested from theory is the flatness of the loss landscape in the vicinity of the solution.

      New: As illustrated in Fig. 1, solutions in the zero-loss manifold have identical loss, but might vary in some of their properties. The authors of [26] suggest that noisy learning will slowly increase the flatness of the loss landscape in the vicinity of the solution.

      135: typo, it's -> its

      Corrected.

      Line 135-136 "Crucially, the loss on the entire manifold is exactly zero..." This appears to contradict the Figure 4A legend - the loss appears to be very high near the top and bottom edges of the manifold in 4A. Do the authors mean that the loss along the horizontal axis of the manifold is zero?

      The reviewer is correct. The manifold mentioned in the sentence is indeed the horizontal axis. We changed the text and the figure to make it clearer.

      Equation 6: This does not appear to agree with equation 2 - should there be an E_t term for an expectation function?

      Corrected.

      Line 262-263: "Sparseness means that a unit has become inactive for all inputs." This should also be stated explicitly as the definition of sparseness/sparsification in the main text.

      We now define precisely the two measures we use: Active Fraction and Fraction Active Units. There is a new section with an accompanying figure in the Methods section. As Figure S2 shows, the noise statistics (label noise vs. update noise) differentially affect these two measures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      General comments

      All three experts have raised excellent ideas and made important suggestions to extend the scope of our study and provide additional information. While we fully acknowledge that these points are valid and would provide exciting new knowledge, we also should not lose track of the fact that a single study cannot cover all bases. Sulfated steroids, for example, are clearly essential components of mouse urine. Unfortunately, however, all chemical analysis approaches are limited and the one we opted for is not suitable for analysis of such signaling molecules. Future studies should certainly focus on these aspects. The same holds true for the fact that we do not know which of the identified compounds are actually VSN ligands. These are inherent limitations of the approach, and we are not claiming otherwise.

      Reviewer #1 (Public Review):

      (1) In this manuscript, Nagel et al. sought to comprehensively characterize the composition of urinary compounds, some of which are putative chemosignals. They used urines from adult males and females in three different strains, including one wild-derived strain. By performing mass spectrometry of two classes of compounds: volatile organic compounds and proteins, they found that urines from inbred strains are qualitatively similar to those of a wild strain. This finding is significant because there is a high degree of genetic diversity in wild mice, with chemosensory receptor genes harboring many polymorphisms.

      We agree and thank the Reviewer for his / her positive assessment.

      (2) In the second part of this work, the authors used calcium imaging to monitor the pattern of vomeronasal neuron responses to these urines. By performing pairwise comparisons, the authors found a large degree of strain-specific response and a relatively minor response to sex-specific urinary stimuli. This is a finding generally in agreement with previous calcium imaging work by Ron Yu and colleagues in 2008. The authors extend the previous work by using urines from wild mice. They further report that the concentration diversity of urinary compounds in different urine batches is largely uncorrelated with the activity profiles of these urines. In addition, the authors found that the patterns of vomeronasal neuron response to urinary cues are not identical when measured using different recipient strains. This fascinating finding, however, requires an additional control to exclude the possibility that this is not due to sampling error.

      We thank Reviewer 1 for pointing this out. We agree that this is truly a “fascinating finding.” Reviewer 1 emphasizes that we need to add an “additional control to exclude […] that this is not due to sampling error”, and he / she elaborates on the required control in his / her Recommendations For The Authors (see below). Reviewer 1 states that “for Fig. 5, in order to conclude that the same urine activates a different population of VSNs in two different strains, a critical control is needed to demonstrate that this is not due to the sampling variability - as compositions of V1Rs and V2Rs could vary between different slices, one preferred control is to use VNO slices from the same strain and compare the selectivity used here across the A-P axis.” Importantly, we believe that this is already controlled for. In fact, for each experiment, we routinely prepare VNO slices along the organ’s entire anterior-to-posterior axis (not including the most anterior tip, where the VNO lumen tapers into the vomeronasal duct, and the most posterior part, where the lumen “twists” toward the ventral aspect and its volume decreases (see Figs. 7 & S7 in Hamacher et al., 2024, Current Biology)). This usually yields ~7 slices per individual experiment / session. Therefore, we routinely sample and average across the entire VNO anterior-to-posterior axis for each experiment. In Fig. 5, in which we analyzed whether the “same urine activates a different population of VSNs in two different strains”, individual independent experiments from each strain (C57BL/6 versus BALB/c) amounted to (a) n = 6 versus n = 8; (b) n = 10 versus n = 10; (c) n = 7 versus n = 9; (d) n = 9 versus n = 10; (e) n = 10 versus n = 9; and (f) n = 12 versus n = 10. Together, we conclude that it is very unlikely that the considerably different response profiles measured in different recipient strains result from a “sampling error.”

      To clarify this point in the revised manuscript, we now explain our sampling routine in more detail in the Materials and Methods. Moreover, we now also refer to this point in the Results.

      (3) There are several weaknesses in this manuscript, including the lack of analysis of the compositions of sulfated steroids and other steroids, which have been proposed to be the major constituents of vomeronasal ligands in urines and the indirect (correlational) nature of their mass spectrometry data and activity data.

      Reviewer 1 is correct to point out that our chemical profiling approach omits (sulfated) steroids. We are aware of this weakness. We deliberately decided to omit steroids as well as other non-volatile small organic molecules for three main reasons: (i) as the reviewer points out, (sulfated) steroid composition has been the focus of analysis in several previous studies and there is ample published information available on their role as VSN stimuli; (ii) the analytical tools available to us do not allow comprehensive profiling of non-volatile small organic molecules; employing two-dimensional head-space GC-MS as well as LC-MS/MS is not suitable for steroid detection; and (iii) the relatively small sample volumes forced us to prioritize and focus on specific chemical classes (in our case, VOCs and proteins). We made an effort to use the exact same stimuli as previously employed to investigate sensory representations in the accessory olfactory bulb (AOB) (Bansal et al., 2021), a feature that we consider a strength of the current study. However, this entailed that we had to effectively split our samples, further reducing the available sample volume.

      We acknowledge that we did not sufficiently describe our rationale for focusing on VOCs and proteins in the previous version of the manuscript (nor did we discuss the known role of (sulfated) steroids in VSN signaling in adequate detail). We have now made an effort to address these shortcomings in the revised manuscript. Specifically, we have added new text to the Introduction (“Prominent molecularly identified VSN stimuli include various sulfated steroids (Celsi et al., 2012; Fu et al., 2015; Haga-Yamanaka et al., 2015, 2014; Isogai et al., 2011; Nodari et al., 2008; Turaga and Holy, 2012), which could reflect the dynamic endocrine state of an individual.”) and the Discussion (“Notably, our chemical profiling approach omits (sulfated) steroids and other non-volatile small organic molecules, which have previously been identified in mouse urine as VSN stimuli (Nodari et al., 2008). Caution should thus be exerted to not attempt to fully explain VSN response specificity based on VOC and protein content alone.” & “In line with the notion of highly selective vomeronasal sampling is our observation that the concentration differences between compounds shared among strains, which are often substantial, are not reflected by similarly pronounced differences in response strength among generalist VSNs. There are several, not necessarily mutually exclusive explanations for this finding: First, concentration could simply not be a read-out parameter for VSNs, which would support previous ideas of concentration-invariant VSN activity (Leinders-Zufall et al., 2000). Second, the concentrations in freshly released urine could just exceed the dynamic tuning range of VSNs since, particularly for VOCs, natural signals (e.g., in scent marks) must be accessible to a recipient for a prolonged amount of time (sometimes days).
A similar rationale could explain the increased protein concentrations in male urine, since male mice use scent marking to establish and maintain their territories and urinary lipocalins serve as long-lasting reservoirs of VOCs (Hurst et al., 1998). Third, generalist VSNs might sample information only from a select subset of urinary compounds, which, given their role as biologically relevant chemosignals, might be released at tightly controlled (and thus similar) concentrations. In fact, in the most extreme scenario, several compounds that do display substantial strain- and/or sex-specific differences in concentration might not act as chemosignals at all. Fourth, to some extent, different response profiles could be attributed to non-volatile small organic molecules such as steroids (Nodari et al., 2008), which were beyond the focus of our chemical analysis.”).

      (4) Overall, the major contribution of this work is the identification of specific molecules in mouse urines. This work is likely to be of significant interest to researchers in chemosensory signaling in mammals and provides a systematic avenue to exhaustively identify vomeronasal ligands in the future.

      We thank the Reviewer for his / her generally positive assessment.

      Reviewer #2 (Public Review):

      (1) This manuscript by Nagel et al provides a comprehensive examination of the chemical composition of mouse urine (an important source of semiochemicals) across strain and sex, and correlates these differences with functional responses of vomeronasal sensory neurons (an important sensory population for detecting chemical social cues). The strength of the work lies in the careful and comprehensive imaging and chemical analyses, the rigor of quantification of functional responses, and the insight into the relevance of olfactory work on lab-derived vs wild-derived mice.

      We thank the Reviewer for his / her generally positive assessment.

      (2) With regards to the chemical analysis, the reader should keep in mind that a difference in the concentration of a chemical across strain or sex does not necessarily mean that that chemical is used for chemical communication. In the most extreme case, the animals may be completely insensitive to the chemical. Thus, the fact that the repertoire of proteins and volatiles could potentially allow sex and/or strain discrimination, it is unclear to what degree both are used in different situations.

      Reviewer 2 is correct to point out that sex- and/or strain-dependent differences in urine molecular composition do not automatically attribute a signaling function to those molecules. We concur and, in fact, stress this point many times throughout the manuscript. In the Results, for example, we point out (i) that “in female urine, BALB/c-specific proteins are substantially underrepresented, a fact not reflected by VSN response profiles”, (ii) that “as observed in C57BL/6 neurons, the skewed distributions of protein concentration indices were not reflected by BALB/c generalist VSN profiles”, and (iii) that “VSN population response profiles do not reflect the global molecular content of urine, suggesting that the VNO functions as a rather selective molecular detector.” Moreover, in the Discussion, we state (i) that “caution should thus be exerted to not attempt to fully explain VSN response specificity based on VOC and protein content alone”; (ii) that, for several sex- and/or strain-specific molecules, none “has previously been attributed a chemosensory function. Challenging the mouse VNO with purified recombinant protein(s) will help elucidate whether such functions exist”; (iii) that “generalist VSNs might sample information only from a select subset of urinary compounds, which, given their role as biologically relevant chemosignals, might be released at tightly controlled (and thus similar) concentrations”; and (iv) that “to some extent, different response profiles could be attributed to non-volatile small organic molecules such as steroids (Nodari et al., 2008), which were beyond the focus of our chemical analysis.”

      In the revised manuscript, we now aim to even more strongly emphasize the point made by Reviewer 2. In the Discussion, we have deleted a sentence that read: “Sex- and strain-specific chemical profiles give rise to unique VSN activity patterns.” Moreover, we have added the following statement: “In fact, in the most extreme scenario, several compounds that do display substantial strain- and/or sex-specific differences in concentration might not act as chemosignals at all.”

      Reviewer #3 (Public Review):

      (1) One of the primary objectives in this study is to ascertain the extent to which the response profiles of VSNs are specific to sex and strain. The design of these Ca2+ imaging experiments uses a simple stimulus design, using two interleaved bouts of stimulation with pairs of urine (e.g. male versus female C57BL/6, male C57BL/6 versus male BALB/c) at a single dilution factor (1:100). This introduces two significant limitations: (1) the "generalist" versus "specialist" descriptors pertain only to the specific pairwise comparisons made and (2) there is no information about the sensitivity/concentration-dependence of the responses.

      Reviewer 3 points to two limitations of our VSN activity assay. He / she is correct to mention that characterizing a VSN as generalist or specialist based on a “pairwise comparison” should not be the basis of attributing such a “generalist” or “specialist” label in general (i.e., regarding the global stimulus space). We acknowledge this point, but we do not regard this as a limitation of our study since we are not investigating rather broad (i.e., multidimensional) questions of selectivity. All we are asking in the context of this study is whether VSNs - when being challenged with pairs of sex- or strain-specific urine samples - act as rather selective semiochemical detectors. Of course, one can always think of a study design that provides more information. However, we here opted for an assay that - in our hands - is robust, “low noise” (i.e., displays low intrinsic signal variability as evident from reliability index calculations), ensures recovery from VSN adaptation (Wong et al., 2018), and, importantly, answers the specific question we are asking.

      Regarding the second point (“there is no information about the sensitivity/concentration-dependence of the responses”), we would like to emphasize that this was not a focus of our study either. In fact, concentration-dependence of VSN activity has been a major focus of several previous studies referenced in our manuscript (e.g., Leinders-Zufall et al., 2000; He et al., 2008), albeit with contradictory results. In our study, we ask whether a pair of stimuli that we have shown to display, in part, strikingly different chemical composition (both absolute and relative) preferentially activates the same or different VSNs. With this question in mind, we believe that our assay (and its results) are highly informative.

      (2) The functional measurements of VSN tuning to various pairs of urine stimuli are consistently presented alongside mass spectrometry-based comparisons. Although it is clear from the manuscript text that the mass spectrometry-based analysis was separated from the VSN tuning experiments/analysis, the juxtaposition of VSN tuning measurements with independent molecular diversity measurements gives the appearance to readers that these experiments were integrated (i.e., that the diversity of ligands was underlying the diversity of physiological responses). This is a hypothesis raised by the parallel studies, not a supported conclusion of the work. This data presentation style risks confusing readers.

      As Reviewer 3 correctly points out, “it is clear from the manuscript text that the mass spectrometry-based analysis was separated from the VSN tuning experiments/analysis.” In the figures, we try to make the distinction between VSN response statistics and chemical profiling more obvious by gray shadows that link the plots depicting VSN response characteristics to the general pie charts.

      We now also made an extra effort to avoid “confusing readers” by stating in the Discussion (i) that “caution should thus be exerted to not attempt to fully explain VSN response specificity based on VOC and protein content alone”; (ii) that, for several sex- and/or strain-specific molecules, none “has previously been attributed a chemosensory function. Challenging the mouse VNO with purified recombinant protein(s) will help elucidate whether such functions exist”; (iii) that “generalist VSNs might sample information only from a select subset of urinary compounds, which, given their role as biologically relevant chemosignals, might be released at tightly controlled (and thus similar) concentrations”; and (iv) that “to some extent, different response profiles could be attributed to non-volatile small organic molecules such as steroids (Nodari et al., 2008), which were beyond the focus of our chemical analysis.” Moreover, we have deleted a sentence that read: “sex- and strain-specific chemical profiles give rise to unique VSN activity patterns”, and we have added the following statement: “In fact, in the most extreme scenario, several compounds that do display substantial strain- and/or sex-specific differences in concentration might not act as chemosignals at all.”

      However, we believe that there is value in presenting “VSN tuning measurements” next to “independent molecular diversity measurements.” While these are independent measurements, their similarity or, quite frequently, lack thereof are informative. We are sure that by taking the above “precautions” we have now mitigated the risk of “confusing readers.”

      (3) The impact of mass spectrometry findings is limited by the fact that none of these molecules (in bulk, fractions, or monomolecular candidate ligands) were tested on VSNs. It is possible that only a very small number of these ligands activate the VNO. The list of variably expressed proteins - especially several proteins that are preferentially found in female urine - is compelling, but, again, there is no evidence presented that indicates whether or not these candidate ligands drive VSN activity. It is noteworthy that the largest class of known natural ligands for VSNs are small nonvolatiles that are found at high levels in mouse urine. These molecules were almost certainly involved in driving VSN activity in the physiology assays (both "generalist" and "specialist"), but they are absent from the molecular analysis.

      Reviewer 3 is right, of course, that at this point we have not tested the identified molecules on VSNs. This is clearly beyond the scope of the present study. We believe that the data we present will be the basis of (several full-length) future studies that aim to identify specific ligands and - best case scenario - receptor-ligand pairs. We find it hard to concur that our study, which provides the necessary basis for those future endeavors, is regarded as “incomplete”. By design, all studies are somewhat incomplete, i.e., there are always remaining questions and we are not contesting that.

      It is true, of course, that a class of “known natural ligands for VSNs are small nonvolatiles.” As we replied above, our chemical profiling approach omits (sulfated) steroids. We are aware of this weakness. We deliberately decided to omit steroids as well as other non-volatile small organic molecules for three main reasons: (i) steroid composition has been the focus of analysis in several previous studies and there is ample published information available on their role as VSN stimuli; (ii) the analytical tools available to us do not allow comprehensive profiling of non-volatile small organic molecules; employing two-dimensional head-space GC-MS as well as LC-MS/MS is not suitable for steroid detection; and (iii) the relatively small sample volumes forced us to prioritize and focus on specific chemical classes (in our case, VOCs and proteins). We made an effort to use the exact same stimuli as previously employed to investigate sensory representations in the accessory olfactory bulb (AOB) (Bansal et al., 2021), a fact that we consider a key strength of our current study. However, this entailed that we had to effectively split our samples, further reducing the available sample volume.

      We acknowledge that we did not sufficiently describe our rationale for focusing on VOCs and proteins in the previous version of the manuscript (nor did we discuss the known role of (sulfated) steroids in VSN signaling in adequate detail). We have now made an effort to address these shortcomings in the revised manuscript. Specifically, we have added new text to the Introduction (“Prominent molecularly identified VSN stimuli include various sulfated steroids (Celsi et al., 2012; Fu et al., 2015; Haga-Yamanaka et al., 2015, 2014; Isogai et al., 2011; Nodari et al., 2008; Turaga and Holy, 2012), which could reflect the dynamic endocrine state of an individual.”) and the Discussion (“Notably, our chemical profiling approach omits (sulfated) steroids and other non-volatile small organic molecules, which have previously been identified in mouse urine as VSN stimuli (Nodari et al., 2008). Caution should thus be exerted to not attempt to fully explain VSN response specificity based on VOC and protein content alone.” & “In line with the notion of highly selective vomeronasal sampling is our observation that the concentration differences between compounds shared among strains, which are often substantial, are not reflected by similarly pronounced differences in response strength among generalist VSNs. There are several, not necessarily mutually exclusive explanations for this finding: First, concentration could simply not be a read-out parameter for VSNs, which would support previous ideas of concentration-invariant VSN activity (Leinders-Zufall et al., 2000). Second, the concentrations in freshly released urine could just exceed the dynamic tuning range of VSNs since, particularly for VOCs, natural signals (e.g., in scent marks) must be accessible to a recipient for a prolonged amount of time (sometimes days).
A similar rationale could explain the increased protein concentrations in male urine, since male mice use scent marking to establish and maintain their territories and urinary lipocalins serve as long-lasting reservoirs of VOCs (Hurst et al., 1998). Third, generalist VSNs might sample information only from a select subset of urinary compounds, which, given their role as biologically relevant chemosignals, might be released at tightly controlled (and thus similar) concentrations. In fact, in the most extreme scenario, several compounds that do display substantial strain- and/or sex-specific differences in concentration might not act as chemosignals at all. Fourth, to some extent, different response profiles could be attributed to non-volatile small organic molecules such as steroids (Nodari et al., 2008), which were beyond the focus of our chemical analysis.”).

      Reviewer #1 (Recommendations For The Authors):

      (1) I find that the study is highly valuable for researchers in this field. With the finding that wild mouse urines do not elicit significantly more variable responses from urines from inbred strains, researchers can now be reassured to use inbred strains to gain general insights on pheromone signaling.

      A major omission of this study is non-volatile small organic molecules such as steroids. These compounds are the only molecular class in urine that have been identified to stimulate specific vomeronasal receptors to date. It is unclear to me that the specificity of VOC and proteins can alone fully explain the response specificity of the VSNs that have been monitored in this study. The discussion of this topic is highly beneficial for the readers.

      Reviewer 1 is correct to point out that our chemical profiling approach omits (sulfated) steroids. We are aware of this weakness. We deliberately decided to omit steroids as well as other non-volatile small organic molecules for three main reasons: (i) as the reviewer points out, (sulfated) steroid composition has been the focus of analysis in several previous studies and there is ample published information available on their role as VSN stimuli; (ii) the analytical tools available to us do not allow comprehensive profiling of non-volatile small organic molecules; employing two-dimensional head-space GC-MS as well as LC-MS/MS is not suitable for steroid detection; and (iii) the relatively small sample volumes forced us to prioritize and focus on specific chemical classes (in our case, VOCs and proteins). We made an effort to use the exact same stimuli as previously employed to investigate sensory representations in the accessory olfactory bulb (AOB) (Bansal et al., 2021), a fact that we consider a key strength of our current study. However, this entailed that we had to effectively split our samples, further reducing the available sample volume.

      We acknowledge that we did not sufficiently describe our rationale for focusing on VOCs and proteins in the previous version of the manuscript (nor did we discuss the known role of (sulfated) steroids in VSN signaling in adequate detail). We have now made an effort to address these shortcomings in the revised manuscript. Specifically, we have added new text to the Introduction (“Prominent molecularly identified VSN stimuli include various sulfated steroids (Celsi et al., 2012; Fu et al., 2015; Haga-Yamanaka et al., 2015, 2014; Isogai et al., 2011; Nodari et al., 2008; Turaga and Holy, 2012), which could reflect the dynamic endocrine state of an individual.”) and the Discussion (“Notably, our chemical profiling approach omits (sulfated) steroids and other non-volatile small organic molecules, which have previously been identified in mouse urine as VSN stimuli (Nodari et al., 2008). Caution should thus be exerted to not attempt to fully explain VSN response specificity based on VOC and protein content alone.” & “In line with the notion of highly selective vomeronasal sampling is our observation that the concentration differences between compounds shared among strains, which are often substantial, are not reflected by similarly pronounced differences in response strength among generalist VSNs. There are several, not necessarily mutually exclusive explanations for this finding: First, concentration could simply not be a read-out parameter for VSNs, which would support previous ideas of concentration-invariant VSN activity (Leinders-Zufall et al., 2000). Second, the concentrations in freshly released urine could just exceed the dynamic tuning range of VSNs since, particularly for VOCs, natural signals (e.g., in scent marks) must be accessible to a recipient for a prolonged amount of time (sometimes days).
A similar rationale could explain the increased protein concentrations in male urine, since male mice use scent marking to establish and maintain their territories and urinary lipocalins serve as long-lasting reservoirs of VOCs (Hurst et al., 1998). Third, generalist VSNs might sample information only from a select subset of urinary compounds, which, given their role as biologically relevant chemosignals, might be released at tightly controlled (and thus similar) concentrations. Fourth, to some extent, different response profiles could be attributed to non-volatile small organic molecules such as steroids (Nodari et al., 2008), which were beyond the focus of our chemical analysis.”).

      (2) How many different wild mouse urines were tested in this study? Is this sufficient to capture the diversity of wild M. musculus in local (Prague) habitats?

      We thank the reviewer for pointing this out. For the present study, 20 male (M) and 27 female (F) wild mice were caught at six different sites in the broader Prague area (i.e., Bohnice (50.13415N, 14.41421E; 2M+4F), Dolni Brezany (49.96321N, 14.4585E; 3M+4F), Hodkovice (49.97227N, 14.48039E; 5M+6F), Písnice (49.98988N, 14.46625E; 3M+6F), Lhota (49.95369N, 14.43087E; 1M+2F), and Zalepy (49.9532N, 14.40829E; 6M+5F)). 18 of the 27 wild females were caught pregnant. The remaining 9 females were mated with males caught at the same site and produced offspring within a month. When selecting 10 male and 10 female individuals from first-generation offspring for urine collection, we ensured that all six capture sites were represented and that age-matched animals displayed similar weight (~17 g). We believe that this capture / breeding strategy sufficiently represents “the diversity of wild M. musculus in local (Prague) habitats.” In the revised manuscript, we have now included these details in the Materials and Methods.
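      As a quick consistency check on the per-site numbers quoted in this response, the capture counts can be tallied as in the minimal sketch below. The dictionary layout and the function name `tally` are illustrative assumptions; only the (male, female) counts are taken from the text.

```python
# Capture counts per site as (males, females), copied from the response text.
capture_sites = {
    "Bohnice": (2, 4),
    "Dolni Brezany": (3, 4),
    "Hodkovice": (5, 6),
    "Písnice": (3, 6),
    "Lhota": (1, 2),
    "Zalepy": (6, 5),
}


def tally(sites):
    """Return the total (males, females) summed over all capture sites."""
    males = sum(m for m, _ in sites.values())
    females = sum(f for _, f in sites.values())
    return males, females


total_males, total_females = tally(capture_sites)
print(total_males, total_females)
```

      The totals should match the 20 males and 27 females stated for the six sites.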

      (3) I found Figure 1e and figures in a similar format confusing - one panel describes the response statistics of VSNs, and other panels show the number of compounds found in different MS profiling, which is not immediately obvious from the figures. Is the y-axis legend correct (%)?

      We now try to make the distinction between VSN “response statistics” and chemical profiling more obvious by gray shadows that link the plots depicting VSN response characteristics to the general pie charts. Moreover, we thank the Reviewer for pointing out the mislabeling of the y-axis. Accordingly, we have deleted “%” in all corresponding figures.

      (4) For Figure 5, in order to conclude that the same urine activates a different population of VSNs in two different strains, a critical control is needed to demonstrate that this is not due to the sampling variability - as compositions of V1Rs and V2Rs could vary between different slices, one preferred control is to use VNO slices from the same strain and compare the selectivity used here across the A-P axis.

      We thank Reviewer 1 for pointing this out. Importantly, we believe that this is already controlled for (see our response to the Public Review). In fact, for each experiment, we routinely prepare VNO slices along the entire anterior-to-posterior axis (not including the most anterior tip, where the VNO lumen tapers into the vomeronasal duct, and the most posterior part, where the lumen “twists” toward the ventral aspect and its volume decreases (see Figs. 7 & S7 in Hamacher et al., 2024, Current Biology)). This usually yields ~7 slices per individual experiment / session. Therefore, we routinely sample and average across the entire VNO anterior-to-posterior axis for each experiment. In Fig. 5, individual independent experiments from each strain (C57BL/6 versus BALB/c) amounted to (a) n = 6 versus n = 8; (b) n = 10 versus n = 10; (c) n = 7 versus n = 9; (d) n = 9 versus n = 10; (e) n = 10 versus n = 9; and (f) n = 12 versus n = 10. Together, we can thus exclude that the considerably different response profiles that we measured using different recipient strains result from a “sampling error.”

      To clarify this point in the revised manuscript, we now explain our sampling routine in more detail in the Materials and Methods. Moreover, we now also mention this point in the Results.

      Reviewer #2 (Recommendations For The Authors):

      (1) Pg 5 Lines 3-16: This summary paragraph contains too much detail given that the reader has not read the paper yet, which makes it bewildering. This should be condensed.

      We agree and have substantially condensed this paragraph.

      (2) Pg 6 Line 5-8: This summary of the experimental design is obtuse and should be edited for clarity.

      We have edited the relevant passage for clarity.

      (3) Pg 6 Line 11: "VSNs were categorized..." Specialist vs generalist is defined as responding to one or both stimuli. This definition is placed right after saying that the cells were also tested with KCl. The reader might think that specialist vs generalist was defined in relation to KCl.

      We have edited this sentence, which now reads: “Dependent on their individual urine response profiles, VSNs were categorized as either specialists (selective response to one stimulus) or generalists (responsive to both stimuli).”

      (4) Pg 6 Line 13: "we recorded urine-dependent Ca2+ signals from a total of 16,715 VSNs". Is a "signal" a response? Did all 16,715 VSNs respond to urine? What was the total of KCl responsive cells recorded?

      We edited the corresponding passage for clarification. The text now reads: “Overall, we recorded >43,000 K+-sensitive neurons, of which a total of 16,715 VSNs (38.4%) responded to urine stimulation. Of these urine-sensitive neurons, 61.4% displayed generalist profiles, whereas 38.6% were categorized as specialists (Figure 1c,d).”
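The categorization described in responses (3) and (4) can be illustrated with a minimal sketch. It assumes each K+-sensitive neuron is summarized by two booleans (response to stimulus A, response to stimulus B); the function names and the toy input are hypothetical and are not taken from the authors' analysis code.

```python
from typing import Dict, Iterable, Tuple


def categorize_vsn(responds_a: bool, responds_b: bool) -> str:
    """Label one K+-sensitive neuron by its responses to a binary stimulus pair."""
    if responds_a and responds_b:
        return "generalist"       # responsive to both stimuli
    if responds_a or responds_b:
        return "specialist"       # selective response to one stimulus
    return "non-responsive"       # K+-sensitive, but no urine response


def summarize(neurons: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Return the urine-responsive fraction and the generalist/specialist split."""
    labels = [categorize_vsn(a, b) for a, b in neurons]
    n_total = len(labels)
    n_generalist = labels.count("generalist")
    n_specialist = labels.count("specialist")
    n_urine = n_generalist + n_specialist
    if n_total == 0 or n_urine == 0:
        raise ValueError("need at least one urine-responsive neuron")
    return {
        # denominator: all K+-sensitive neurons
        "urine_responsive_fraction": n_urine / n_total,
        # denominator: urine-responsive neurons only
        "generalist_fraction": n_generalist / n_urine,
        "specialist_fraction": n_specialist / n_urine,
    }
```

Keeping the two denominators explicit (all K+-sensitive neurons versus urine-responsive neurons) is the point of the sketch, since this is the ambiguity the reviewer raised.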

      (5) Pg 7 Line 6: The repeated use of the word "pooled" is confusing as it suggests a variation in the experiment. The authors should establish once in the Methods and maybe in the Results that stimuli were pooled across animals. Then they should just refer to the stimulus as male or female or BALB/c rather than "pooled" male etc.

      We acknowledge the reviewer’s argument. Accordingly, we now introduce the experimental use of pooled urine once in the Methods and in the introductory paragraph of the Results. All other references to “pooled” urine in the Results and Captions have been deleted.

      (6) Pg 7 Line 10: "...detected in >=3 out of 10 male..." For the chemical analysis, were these samples not pooled?

      Correct. We deliberately did not pool samples for chemical analysis, but instead analyzed all individual samples separately (i.e., 60 samples were subjected to both proteomic and metabolomic analyses). Thus, the criterion that a VOC or protein must be detected in at least 3 of the 10 individual samples from a given sex/strain combination for a ‘present’ call (and in at least 6 of the 10 samples to be called ‘enriched’) ensures that the molecular signatures we identify are not “contaminated” by unusual aberrations within single samples. For clarification, we now explicitly outline this procedure in the Methods (Experimental Design and Statistical Analysis – Proteomics and metabolomics).
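The detection criterion above amounts to a simple threshold rule, sketched here for illustration. The thresholds (3 and 6 of 10 samples) come from the response; the function name, the parameter names, and the "not called" label for compounds below the lower threshold are assumptions.

```python
def detection_call(n_detected: int, n_samples: int = 10,
                   present_min: int = 3, enriched_min: int = 6) -> str:
    """Classify a VOC or protein by the number of individual samples it was detected in.

    "enriched" implies "present": it is the stricter of the two calls.
    """
    if not 0 <= n_detected <= n_samples:
        raise ValueError("n_detected must lie between 0 and n_samples")
    if n_detected >= enriched_min:
        return "enriched"
    if n_detected >= present_min:
        return "present"
    return "not called"  # hypothetical label; the source only defines the two calls above
```

For example, a compound detected in 2 of 10 samples from one sex/strain combination would not be called, whereas detection in 3 to 5 samples yields "present" and detection in 6 or more yields "enriched".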

      (7) Pg 7 Line 23: In line 7, the specialist rate was defined as 5% in reference to the total KCl responsive cells. Here the specialist rate is defined from responsive cells. This is confusing.

      We apologize for the confusion. In both cases, the numbers (%) refer to all K+-sensitive neurons. We have added this information to both relevant sentences (l. 7 as well as ll. 23-24). Note that the rate in ll. 23-24 refers to generalists.

      (8) Pg 7 Line 25: Concentration index should be defined before its use here.

      We have revised the corresponding sentence, which now reads: “By contrast, analogously calculated concentration indices (see Materials and Methods) that can reflect potential disparities are distributed more broadly and non-normally (Figure 1h).”

      (9) Pg 7 Line 29: change "trivially" to "simply".

      Done

      (10) Pg 7 Line 30: What is meant by a "generalist" ligand? The neurons are generalists. Probably should read "common ligands"

      We have changed the text accordingly.

      (11) Pg 7 Line 31: What is meant by "global observed concentration disparities" ?

      We have changed the text to “…represented by the observed general concentration disparities.”

      (12) Pg 8 Lines 7-11: This section needs to be edited for clarity as it is very difficult to follow. For example, the definition of "enriched" is buried in a parenthetical. Also, it is very difficult to figure out what a "sample" is in this paper. Is it a pooled stimulus, or is it urine from an individual animal?

      We apologize for the confusion. Throughout the paper a “sample” is a pooled stimulus (from all 10 individuals of a given sex/strain combination) for all physiological experiments. For chemical analysis a “sample” refers to urine from an individual animal.

      (13) Pg 8 Line 11: "abundant proteins" Does this mean absolute concentration or enriched in one sample vs another?

      We changed the term “abundant” to “enriched” as this descriptor has been defined (present in ≥6 of 10 individual samples) in the previous sentence.

      (14) Pg 8 Line 18: "While 32.9% of all..." Please edit for clarity. What is the point?

      The main point here is that, for VOCs, the vast majority of compounds (91.3%) are either generic mouse urinary molecules or are sex/strain-specific.

      (15) Pg 10 Line 18: "Increased VSN selectivity..." This title is misleading as it suggests a change in sensitivity with animal exposure. I think the authors are trying to say "VSNs are more selective for strain than for sex". The authors should avoid the term "exposure to" when they mean "stimulation with" as the former suggests chronic exposure prior to testing.

      We thank the reviewer for the advice and have changed the title accordingly. We also edited the text to avoid the term "exposure to" throughout the manuscript.

      (16) Pg 12 Line 10: "we recorded hardly any..." Hardly any in comparison to what? BALB/c?

      We apologize for the confusion. We have edited the text for clarity, which now reads: “In fact, (i) compared to an average specialist rate of 11.2% ± 6.6% (mean ± SD) calculated over all 13 binary stimulus pairs (n = 26 specialist types), we observed only few specialist responses upon stimulation with urine from wild females (2% and 3%, respectively), and…”

      Reviewer #3 (Recommendations For The Authors):

      (1) Related to the pairwise stimulus-response experimental design and analysis: there is precedent in the field for studies that explore the same topic (sex- and strain-selectivity), but measure VSN sensitivity across many urine stimuli, not just two at a time. This has been done both in the VNO (He et al, Science, 2008; Fu, et al, Cell, 2015) and in the AOB (Tolokh, et al, Journal of Neuroscience, 2013). The current manuscript does not cite these studies.

      Reviewer 3 is correct and we apologize for this oversight. We now cite the two VSN-related studies by He et al. and Fu et al. in the Introduction.

      (2) The findings of the mass spectrometry-based profiling of mouse urine - especially for volatiles - are only accessible through repositories, making it difficult for readers to understand which molecules were found to be highly divergent between sexes/strains. There is value in the list of ligands to further investigate, but this information should be made more accessible to readers without having to comb through the repositories.

      We agree that there “is value in the list of ligands to further investigate” and, accordingly, we now provide a table (Table 1) that lists the top-5 VOCs that – according to sPLS-DA – display the most discriminative power to classify samples by sex (related to Figure 2c) or strain (related to Figure 2d). For ease of identification, all entries list internal mass spectrometry identifiers, identifiers extracted from MS analysis database, the sex or strain that drives separation, which two-dimensional component / x-variate represents the most discriminative variable, PubChem chemical formula, PubChem common or alternative names, Chemical Entities of Biological Interest or PubChem Compound Identification, and the VOC’s putative origin.

      (3) There is a long precedent for integrating molecular assessments and physiological recordings to identify specific ligands for the vomeronasal system:

      • nonvolatiles (e.g., Leinders-Zufall, et al., Nature, 2000)
      • peptides (e.g., Kimoto et al., Nature, 2005; Leinders-Zufall et al. Science, 2004; Riviere et al., Nature, 2009; Liberles, et al., PNAS, 2009)
      • proteins (e.g., Chamero et al., Nature, 2007; Roberts et al., BMC Biology, 2010)
      • excreted steroids and bile acids (Nodari et al., Journal of Neuroscience, 2008; Fu et al., Cell, 2015; Doyle, et al., Nature Communications, 2016)

      The Leinders-Zufall (2000), Roberts, and Nodari papers are referenced, but the broader efforts by the community to find specific drivers of vomeronasal activity are not fully represented in the manuscript. The focus of this paper is fully related to this broader effort, and it would be appropriate for this work to be placed in this context in the introduction and discussion.

      We now refer to all of the studies mentioned in the Introduction (except the article published by Liberles et al. in 2009, since the authors of that study do not identify vomeronasal ligands).

      (4) Throughout the manuscript (starting in Fig. 1h) the figure panels and captions use the term "response index" whereas the methods define a "preference index." It seems to be the case that these two terms are synonymous. If so, a single term should be consistently used. If not, this needs to be clarified.

      We now consistently use the term “response index” throughout the manuscript.

      (5) It would be useful to provide a table associated with Figure 2 - figure supplement 1 that lists the common names and/or chemical formulas for the volatiles that were found to be of high importance.

      We agree and, accordingly, we now provide a table (Table 2) that lists the VOCs that – according to Random Forest classification and resulting Gini importance scores – display the most discriminative power to classify samples by sex (related to Figure 2 - figure supplement 1a) or strain (related to Figure 2 - figure supplement 1b). Notably, it is generally reassuring that several VOCs are listed in both Table 1 and Table 2, emphasizing that two different supervised machine learning algorithms (i.e., sPLS-DA (Table 1) and Random Forest (Table 2)) yield largely congruent results.

      (6) The use of the term "comprehensive" for the molecular analysis is a little bit misleading, as volatiles and proteins are just two of the many categories of molecules present in mouse urine.

      We have now deleted most mentions of the term "comprehensive" when referring to the molecular analysis.

      (7) Page 11, lines 24-27: The sentences starting "We conclude..." and ending in "semiochemical concentrations." These two sentences do not make sense. It is not known how many of the identified proteins are actual VSN ligands. Moreover, there is abundant evidence from other studies that individual VSN activity provides information about distinct semiochemical concentrations.

      We have substantially edited and rephrased this paragraph to better reflect that different scenarios / interpretations are possible. The relevant text now reads: “We conclude that VSN population response strength might not be so strongly affected by strain-dependent concentration differences among common urinary proteins. In that case, it would appear somewhat unlikely that individual VSN activity provides fine-tuned information about distinct semiochemical concentrations. Alternatively, as some (or even many) of the identified proteins could not serve as vomeronasal ligands at all, generalist VSNs might sample information from only a subset of compounds which, in fact, are secreted at roughly similar concentrations.”

      (8) The explanation of stimulus timing is mentioned several times but not defined clearly in methods. Page 19, lines 14-19 have information about the stimulus delivery device, but it would be helpful to have stimulus timing explicitly stated.

      In addition to the relevant captions, we now explicitly state stimulus timing (i.e., 10 s stimulations at 180 s inter-stimulus intervals) in the Results.

      (9) Typos: Page 10, line 7: "male biased" → "male-biased" for clarity

      Wilcoxon "signed-rank" test is often misspelled "Wilcoxon singed ranked test" or "Wilcoxon signed ranked test"

      In the Fig. 3 legend, the asterisk meaning is unspecified.

      "(im)balances" → imbalances (page 27, line 24; page 37, line 16; page 38, line 16)

      In Figure 2 - figure supplement 1 and Figure 2 - figure supplement 2, the units of the box-and-whisker plots are not specified in the graph or legend.

      We have made all required corrections.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study utilizes a virus-mediated short hairpin RNA (shRNA) approach to investigate in a novel way the role of the wild-type PHOX2B transcription factor in critical chemosensory neurons in the brainstem retrotrapezoid nucleus (RTN) region for maintaining normal CO2 chemoreflex control of breathing in adult rats. The solid results presented show blunted ventilation during elevated inhaled CO2 (hypercapnia) with knockdown of PHOX2B, accompanied by a reduction in expression of Gpr4 and Task2 mRNA for the proposed RTN neuron proton sensor proteins GPR4 and TASK2. These results suggest that maintained expression of wild-type PHOX2B affects respiratory control in adult animals, which complements previous studies showing that PHOX2B-expressing RTN neurons may be critical for chemosensory control throughout the lifespan and with implications for neurological disorders involving the RTN. When some methodological, data interpretation, and prior literature reference issues further highlighting novelty are adequately addressed, this study will be of interest to neuroscientists studying respiratory neurobiology as well as the neurodevelopmental control of motor behavior.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This important study investigated the role of the PHOX2B transcription factor in neurons in the key brainstem chemosensory structure, the retrotrapezoid nucleus (RTN), for maintaining proper CO2 chemoreflex responses of breathing in the adult rat in vivo. PHOX2B has an important transcriptional role in neuronal survival and/or function, and mutations of PHOX2B severely impair the development and function of the autonomic nervous system and RTN, resulting in the developmental genetic disease congenital central hypoventilation syndrome (CCHS) in neonates, where the RTN may not form and is functionally impaired. The function of the wild-type PHOX2B protein in adult RTN neurons that continue to express PHOX2B is not fully understood. By utilizing a viral PHOX2B-shRNA approach for knockdown of PHOX2B specifically in RTN neurons, the authors' solid results show impaired ventilatory responses to elevated inspired CO2, measured by whole-body plethysmography in freely behaving adult rats, that develop progressively over a four-week period in vivo, indicating effects on RTN neuron transcriptional activity and associated blunting of the CO2 ventilatory response. The RTN neuronal mRNA expression data presented suggests the impaired hypercapnic ventilatory response is possibly due to the decreased expression of key proton sensors in the RTN. This study will be of interest to neuroscientists studying respiratory neurobiology as well as the neurodevelopmental control of motor behavior.

      Strengths:

      (1) The authors used a shRNA viral approach to progressively knock down the PHOX2B protein, specifically in RTN neurons to determine whether PHOX2B is necessary for the survival and/or chemosensory function of adult RTN neurons in vivo.

      (2) To determine the extent of PHOX2B knockdown in RTN neurons, the authors combined RNAScope® and immunohistochemistry assays to quantify the subpopulation of RTN neurons expressing PHOX2B and neuromedin B (Nmb), which has been proposed to be key chemosensory neurons in the RTN.

      (3) The authors demonstrate that knockdown efficiency is time-dependent, with a progressive decrease in the number of Nmb-expressing RTN neurons that co-express PHOX2B over a four-week period.

      (4) Their results convincingly show hypoventilation particularly in 7.2% CO2 only for PHOX2B-shRNA RTN-injected rats after four weeks as compared to naïve and non-PHOX2B-shRNA targeted (NT-shRNA) RTN injected rats, suggesting a specific impairment of chemosensitive properties in RTN neurons with PHOX2B knockdown.

      (5) Analysis of the association between PHOX2B knockdown in RTN neurons and the attenuation of the hypercapnic ventilatory response (HCVR), by evaluating the correlation between the number of Nmb+/PHOX2B+ or Nmb+/PHOX2B- cells in the RTN and the resulting HCVR, showed a significant correlation between HCVR and number of Nmb+/PHOX2B+ and Nmb+/PHOX2B- cells, suggesting that the number of PHOX2B-expressing cells in the RTN is a predictor of the chemoreflex response and the reduction of PHOX2B protein impairs the CO2-chemoreflex.

      (6) The data presented indicate that PHOX2B knockdown not only causes a reduction in the HCVR but also a reduction in the expression of Gpr4 and Task2 mRNAs, suggesting that PHOX2B knockdown affects RTN neurons transcriptional activity and decreases the CO2 response, possibly by reducing the expression of key proton sensors in the RTN.

      (7) Results of this study show that independent of the role of PHOX2B during development, PHOX2B is still required to maintain proper CO2 chemoreflex responses in the adult brain, and its reduction in CCHS may contribute to the respiratory impairment in this disorder.

      Weaknesses:

      (1) The authors found a significant decrease in the total number of Nmb+ RTN neurons (i.e., Nmb+/PHOX2B+ plus Nmb+/ PHOX2B-) in NT-shRNA rats at two weeks post viral injection, and also at the four-week period where the impairment of the chemosensory function of the RTN became significant, suggesting some inherent cell death possibly due to off-target toxic effects associated with shRNA procedures that may affect the experimental results.

      (2) The tissue sampling procedures for quantifying numbers of cells expressing proteins/mRNAs throughout the extended RTN region bilaterally have not been completely validated to accurately represent the full expression patterns in the RTN under experimental conditions.

      (3) The inferences about RTN neuronal expression of NMB, GPR4, or TASK2 are based on changes in mRNA levels, so it remains speculation that the observed reduction in Gpr4 and Task2 mRNA translates to a reduction in the protein levels and associated reduction of RTN neuronal chemosensitive properties.

      Thank you for sharing the excitement for our study showing novel findings on the contribution of PHOX2B to the chemoreflex response and activity of adult RTN neurons. We believe that reporting the results on cell death following shRNA viral injections, potentially due to some off-target effects, is important to help the scientific community plan experiments of a similar kind in various fields of neuroscience.

      Thanks for pointing out your concerns about cell quantification. We have edited the methods and results sections to add clarity about our analytical procedure.

      As we discussed in the manuscript, we were only able to assess mRNA levels of Nmb, Gpr4, and Task2, as currently available antibodies for the three targets are still unreliable. Future studies will benefit from the analysis of changes at the protein level and possibly electrophysiological recordings to verify that chemosensitive properties of RTN neurons are impaired due to reduction of PHOX2B expression. We address these limitations in the discussion.

      Reviewer #2 (Public Review):

      Summary:

      The authors used a short hairpin RNA technique strategy to elucidate the functional activity of neurons in the retrotrapezoid nucleus (RTN), a critical brainstem region for central chemoreception. Dysfunction in this area is associated with the neuropathology of congenital central hypoventilation syndrome (CCHS). The subsequent examination of these rats aimed to shed light on the intricate aspects of RTN and its implications for central chemoreception and disorders like CCHS in adults. They found that using the short hairpin RNA (shRNA) targeting Phox2b mRNA, a reduction of Phox2b expression was observed in Nmb neurons. In addition, Phox2b knockdown did not affect breathing in room air or under hypoxia, but the hypercapnia ventilatory response was significantly impaired. They concluded that Phox2b in the adult brain has an important role in CO2 chemoreception. They thought that their findings provided new evidence for mechanisms related to CCHS neuropathology. The conclusions of this paper are well supported by data, but careful discussion seems to be required for comparison with the results of various previous studies performed by different genetic strategies for the RTN neurons.

      Strengths:

      The most exciting aspect of this work is the modelling of the Phox2b knockdown in one element of the central neuronal circuit mediating respiratory reflexes, that is in the RTN. To date, mutations in the PHOX2B gene are commonly associated with most patients diagnosed with CCHS, a disease characterized by hypoventilation and absence of chemoreflexes, in the neonatal period, which in severe cases can lead to respiratory arrest during sleep. In the present study, the authors demonstrated that the role of Phox2b extends beyond the developmental period, and its reduction in CCHS may contribute to the respiratory impairment observed in this disorder.

      Weaknesses:

      Whereas the most exciting part of this work is the knockdown of the Phox2b in the RTN in adult rodents, the weakness of this study is the lack of a clear physiological, developmental, and anatomical distinction between this approach and similar studies already reported elsewhere (Ruffault et al., 2015, DOI: 10.7554/eLife.07051; Ramanantsoa et al., 2011, DOI: 10.1523/JNEUROSCI.1721-11.2011; Huang et al., 2017, DOI: 10.1016/j.neuron.2012.06.027; Hernandez-Miranda et al., 2018, DOI: 10.1073/pnas.1813520115; Ferreira et al., 2022 DOI: 10.7554/eLife.73130; Takakura et al., 2008 DOI: 10.1113/jphysiol.2008.153163; Basting et al., 2015 DOI: 10.1523/JNEUROSCI.2923-14.2015; Marina et al., 2010 DOI: 10.1523/JNEUROSCI.3141-10.2010). In addition, several conclusions presented in this work are not directly supported by the provided data.

      Thanks for the feedback on our manuscript. We have further highlighted in our discussion the previous developmental work aimed at determining the role of PHOX2B in embryonic development. Our study was triggered by the fascinating observation that, despite its important role in the development of the central and peripheral nervous system, PHOX2B is still present in the adult brain and its function in adult neurons is unknown. We therefore aimed to investigate its role in the adult RTN by knocking down its expression with a shRNA approach. Consequently, in our model, knockdown of PHOX2B does not affect development of the RTN. Previous studies (mentioned by the reviewer, as well as cited in the manuscript) have focused on investigating 1) the role of PHOX2B in the developmental period, 2) the physiological changes associated with the transgenic expression of mutant forms of PHOX2B in relation to CCHS, 3) the killing or the acute silencing/excitation of neuronal activity of PHOX2B+ RTN neurons. Our study had a different aim: to test whether the transcription factor PHOX2B has a physiologically relevant role in adult RTN neurons. In this experimental approach PHOX2B is not altered throughout embryonic or postnatal development. Knocking down PHOX2B in the Nmb+ cells of the RTN reduced the chemoreflex response and the mRNA expression of proton sensors. Hence, we conclude that PHOX2B alters the function of Nmb+ RTN neurons, possibly through transcriptional changes including the reduction in Gpr4 and Task2 mRNA expression.

      Reviewer #3 (Public Review):

      A brain region called the retrotrapezoid nucleus (RTN) regulates breathing in response to changes in CO2/H+, a process termed central chemoreception. A transcription factor called PHOX2B is important for RTN development and mutations in the PHOX2B gene result in a severe type of sleep apnea called Congenital Central Hypoventilation Syndrome. PHOX2B is also expressed throughout life, but its postmitotic functions remain unknown. This study shows that knockdown of PHOX2B in the RTN region in adult rats decreased expression of Task2 and Gpr4 in Nmb-expressing RTN chemoreceptors and this corresponded with a diminished ventilatory response to CO2 but did not impact baseline breathing or the hypoxic ventilatory response. These results provide novel insight regarding the postmitotic functions of PHOX2B in RTN neurons.

      Main issues:

      (1) The experimental approach was not targeted to Nmb+ neurons and since other cells in the area also express Phox2b, conclusions should be tempered to focus on Phox2b expressing parafacial neurons NOT specifically RTN neurons.

      (2) It is not clear whether PHOX2B is important for the transcription of pH sensing machinery, cell health, or both. If knockdown of PHOX2B results in loss of RTN neurons this is also expected to decrease Task2 and Gpr4 levels, albeit by a transcription-independent mechanism.

      Although we did not specifically target Nmb+ neurons, we performed viral injections within the area where neurons expressing PHOX2B and Nmb are localized (i.e., the RTN region). We carefully quantified the impact of PHOX2B knockdown on Nmb expressing neurons, as well as the effects on the adjacent TH expressing C1 population and FN neurons (figure 5). As reported in the results section, significant changes in the numbers of PHOX2B expressing neurons were only observed at the site of injection in PHOX2B+/Nmb+ neurons. We did not observe changes in the total number of C1 cells (TH+/PHOX2B+), in the number of TH cells coexpressing PHOX2B, or in the hypoxic ventilatory response (which is dependent on the health status of C1 neurons). We have updated figure 5 to show representative expression of PHOX2B in TH+ neurons in the ventral medulla to complement our cell count analysis. To address potential effects on other cell populations we have edited our discussion as follows:

      “PHOX2B knockdown was also restricted to RTN neurons, as adjacent C1 TH+ neurons did not show any change in number of TH+/PHOX2B+ expressing cells, although we cannot exclude that some C1 cells may have been infected and their relative PHOX2B expression levels were reduced. To support the lack of significant alterations associated with the possible loss of C1 function was the absence of significant changes in the hypoxic response that has been shown to be dependent on C1 neurons (Malheiros-Lima et al., 2017).”

      Where appropriate, we have substituted “RTN” with “Nmb expressing neurons of the RTN” throughout the manuscript.

      We have clarified in the methods and results section how we quantified Task2 and Gpr4 mRNA expression. The quantification was performed on a pool of single cells (200-250/rat) expressing Nmb. Hence, the overall reduction is not a result of general fluorescence loss in the RTN region, but specifically assessed in single cells expressing Nmb. This is therefore independent of the reduction of the total number of Nmb cells.

      We propose that cell death is not a direct effect of PHOX2B knockdown, but rather it is associated with the injection of the viral constructs that have been already reported to promote some off-target effects (as reported in the manuscript). While modest cell death is observed only in the first two weeks post-infection, it does not increase further between 2 and 4 weeks post infection, when the reduction in PHOX2B (not associated with a further reduction in Nmb+ cells, hence no further cell death in RTN) is evident and the respiratory chemoreflex is impaired. These results suggest that 1) reduction of PHOX2B is not responsible for cell death; 2) it is the reduction of PHOX2B levels that promotes chemoreflex impairment. Given the observation that Nmb cells with no detectable PHOX2B protein show reduced expression of Task2 and Gpr4 mRNA, we propose that one of the possible mechanisms of chemoreflex impairment in PHOX2B shRNA rats is the reduction of Task2 and Gpr4. In the discussion we also suggest possible additional mechanisms that can be investigated in further studies.

      Recommendations for the authors:

      In revising this manuscript, the authors should carefully address the issues raised by the reviewers to substantially improve the manuscript and solidify the reviewers' general assessment of the potential importance of this work.

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      (1) The cell counts for Nmb+/PHOX2B+ and Nmb+/PHOX2B- RTN neurons are a critical component of the study, and it is unclear how the tissue sampling procedures (eight sections per animal) for quantifying numbers of cells expressing proteins/mRNAs throughout the extended RTN region bilaterally have been validated to accurately represent the full expression patterns in the RTN under the experimental conditions. It is possible that the sampling/quantification procedures used may be adequate, but validation is important. Also, quantification of the CTCF signal for Nmb, Gpr4, and Task2 mRNA is an important component of this study, but only four sections/rat were used.

      Thank you for pointing out your concern on our quantification method. We have clarified in the methods section the procedure for cell counting and quantification of the CTCF signal. We have sampled the area of the RTN in order to identify Nmb cells of RTN.

      We have edited the methods section as follows:

      “To quantify Nmb+/PHOX2B- and Nmb+/PHOX2B+ neurons within the RTN region, we analysed one in every seven sections (210 µm interval; 8 sections/rat in total) along the rostrocaudal distribution of the RTN on the ventral surface of the brainstem and compared total bilateral cell counts of PHOX2B-shRNA rats with non-target control (NT-shRNA) and naïve rats. Cells that expressed Nmb and Phox2b mRNAs but did not show co-localization with PHOX2B protein were considered Nmb+/PHOX2B-.

      The Corrected Total Cell Fluorescence (CTCF) signal for Nmb, Gpr4 and Task2 mRNAs was quantified as previously described (Cardani et al., 2022; McCloy et al., 2014). Briefly, a Leica TCS SP5 (B-120G) Laser Scanning Confocal microscope was used to acquire images of the tissue. Exposure time and acquisition parameters were set for the naïve group and kept unchanged for the entire dataset acquisition. The collected images were then analysed by selecting a single cell at a time and measuring the area, integrated density and mean grey value (McCloy et al., 2014). For each image, three background areas were used to normalize against autofluorescence. We used 4 sections/rat (210 µm interval) to count Nmb, Gpr4 and Task2 mRNA CTCF in the core of the RTN area where several Nmb cells could be identified. For each section two images were acquired with a 20× objective, so that at least fifty cells per tissue sample were obtained for the mRNA quantification analysis. To evaluate changes in Nmb mRNA expression levels following PHOX2B knockdown at the level of the RTN, we compared the fluorescence intensity of each RTN Nmb+ cell (223.2 ± 37.1 cells/animal) with the average fluorescent signal of Nmb+ cells located dorsally in the NTS (4.3 ± 1.2 cells/animal) (Nmb CTCF ratio RTN/NTS) as we reasoned that the latter would not be affected by the shRNA infection and knockdown.

      To quantify Gpr4 and Task2 mRNA expression in Nmb cells of the RTN, we first quantified single cell CTCF for either Gpr4 (200.7 ± 13.2 cells/animal) or Task2 (169.6 ± 10.3 cells/animal) mRNA in Nmb+ RTN neurons in the 3 experimental groups (naïve, NT shRNA and PHOX2B shRNA) independent of their PHOX2B expression. We then compared CTCF values of Gpr4 and Task2 mRNA between Nmb+/PHOX2B+ and Nmb+/PHOX2B- RTN neurons in PHOX2B-shRNA rats to address changes in their mRNA expression induced by PHOX2B knockdown.”
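      The CTCF calculation and the RTN/NTS ratio described above can be sketched as follows. This is a minimal illustration, assuming the commonly used definition attributed to McCloy et al. (2014): CTCF equals the integrated density minus the product of the cell area and the mean background fluorescence. The function names and all numeric values are hypothetical and are not taken from the dataset.

```python
from statistics import mean


def ctcf(integrated_density, area, background_means):
    """Corrected total cell fluorescence for a single cell.

    Assumed definition: integrated density minus
    (cell area x mean grey value of the background regions).
    """
    return integrated_density - area * mean(background_means)


def rtn_to_nts_ratios(rtn_ctcf, nts_ctcf):
    """Divide each RTN cell's CTCF by the average CTCF of the NTS reference cells."""
    reference = mean(nts_ctcf)
    return [value / reference for value in rtn_ctcf]


# Hypothetical measurements (arbitrary units); three background areas per image.
background = [10.0, 12.0, 11.0]
rtn_cells = [ctcf(5000.0, 100.0, background), ctcf(6100.0, 100.0, background)]
nts_cells = [ctcf(4500.0, 100.0, background), ctcf(5700.0, 100.0, background)]
ratios = rtn_to_nts_ratios(rtn_cells, nts_cells)
```

      A ratio close to 1 for an RTN cell would indicate per-cell Nmb signal comparable to the dorsal reference cells measured on the same section.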

      (2) Furthermore, to evaluate changes in Nmb mRNA expression following PHOX2B knockdown at the level of the RTN, it is stated in Materials and Methods "we compared, on the same tissue section, the fluorescence intensity of RTN Nmb+ cells with the signal of Nmb+ cells in the NTS (Nmb CTCF ratio RTN/NTS)". How this was accomplished is unclear, considering the non-overlapping locations of the RTN and rostral NTS. Providing images would be helpful.

      The first sections containing Nmb cells in the ventral medulla also contain a few Nmb cells in the dorsal medulla. We used those cells as a reference for fluorescence levels since they would not be affected by the viral infection. Similar cells are also present in the brains of mice and reported in the Allen Brain Atlas (https://mouse.brain-map.org/experiment/show/71836874). We have clarified our procedure in the methods section (see above) and included a sample image of Nmb in both ventral and dorsal regions in Figure 5.

      (3) The staining for tyrosine hydroxylase (TH) to identify and quantify C1 cells (TH+/PHOX2B+) following shRNA injection provides important information, and it would be useful to show images of histological examples to accompany Fig. 5A.

      We have included in Figure 5A a sample image of the C1 neurons used for our TH quantification.

      Minor:

      (1) Provide animal ns in the text of the Results section for the four weeks of PHOX2B knockdown.

      They have been included.

      (2) Please state in the legends for Figures 2 & 3, which images are superimposition images.

      We have included in the figures information about which images are merged.

      Reviewer #2 (Recommendations For The Authors):

      This manuscript by Cardani and colleagues attempts to address whether a reduction in Phox2b expression in chemosensitive neuromedin-B (NMB)-expressing neurons in the RTN alters respiratory function. The authors used a short hairpin RNA technique to silence RTN chemosensor neurons. The present study is very interesting, but there are several major concerns that need to be addressed, including the main hypothesis.

      Major

      (1) Page 6, lines 119-121: I did not grasp the mechanistic property described by the authors in this passage, nor did I understand the experiments they conducted to establish a mechanistic link between Phox2b and the chemosensitive property. Could the authors provide further clarification on these points?

      We believe the reviewer refers to this paragraph: “In order to have a better understanding of the role of PHOX2B in the CO2 homeostatic processes we used a non-replicating lentivirus vector of two short-hairpin RNA (shRNA) clones targeting selectively Phox2b mRNA to knockdown the expression of PHOX2B in the RTN of adult rats and tested ventilation and chemoreflex responses. In parallel, we also determined whether knockdown of PHOX2B in adult RTN neurons negatively affected cell survival. Finally, we sought to provide a mechanistic link between PHOX2B expression and the chemosensitive properties of RTN neurons, which have been attributed to two proton sensors, the proton-activated G protein-coupled receptor (GPR4) and the proton-modulated potassium channel (TASK-2).”

      The rationale for running these experiments is based on the fact that it is well known in the literature that PHOX2B is an important transcription factor for the development of several neuronal populations. PHOX2B knockout mice die before birth and heterozygous mice have some anatomical defects, but respiration is only impaired in the early post-natal period. While many developmental transcription factors are generally downregulated in the post-natal period, PHOX2B is still expressed in some neurons into adulthood. What is the function of PHOX2B in these fully developed neurons? This remains unknown, as the entire set of target genes that PHOX2B regulates in the adult brain has not yet been identified. Hence we decided to test what would happen if we knocked down the PHOX2B protein in the Nmb neurons of the RTN, an area that is critical for central chemoreception and involved in the presentation of CCHS. Our results show that reduction of PHOX2B blunts the CO2 chemoreflex response and reduces mRNA expression of Task2 and Gpr4, two pH sensors that have been shown to be key for RTN chemosensitive properties. We also show that the Nmb mRNA and cell survival are not affected by PHOX2B knockdown and we propose that the reduced CO2 chemoreflex may be attributed to a reduction of chemosensory function of Nmb neurons of the RTN due to partial loss of Gpr4 and Task2.

      (2) It is imperative for the authors to enhance the description of their hypothesis, as, from my perspective, the contribution of the data to the field is not clearly articulated. Numerous more selectively designed experiments were conducted to investigate the role of Phox2b-expressing neurons at the RTN level and their involvement in respiratory activity. In summary, the current study appears to lack novelty.

      We respectfully disagree with this statement. We believe we have adequately summarized previous work, although we realize we can’t reference every single publication on this topic. As described above, the developmental role of PHOX2B has been elegantly investigated in mouse embryonic studies (extensively cited in the manuscript). Furthermore, very interesting studies have shown that when the CCHS defining mutant PHOX2B protein (+7Ala PHOX2B) and other mutations linked to CCHS have been transgenically expressed in mice through development, severe anatomical defects are observed and respiratory function is impaired (extensively cited in the manuscript). We have also cited papers relevant to this study that describe the role of PHOX2B/Nmb RTN neurons and the pH protein sensors in the CO2 chemoreflex. If we missed some papers that the reviewer deems essential in the context of this study we will be happy to include them.

      We are not aware of other studies that have investigated the specific role of the PHOX2B protein in the adult RTN in the absence of confounding developmental pathogenesis (i.e. in an otherwise ‘healthy’ animal), and of no other studies that looked at the effects on the RTN proton sensors and Nmb expression following PHOX2B knockdown. Hence we believe that our results are novel and, in our opinion, very interesting.

      (3) On pages 13 and 14 (Results section), I am seeking clarity on the novelty of the findings. Doug Bayliss's prior work has already demonstrated the role of Gpr4 and Task2 on Phox2b neurons in regulating ventilation in conscious rodents.

      Bayliss’ group has elegantly demonstrated that Gpr4 and Task2 are the two proton sensors in the PHOX2B/Nmb neurons of the RTN that have a key role in chemoreception (cited in the manuscript). The novelty of our findings is that we show that a reduction in PHOX2B protein is associated with a reduction of mRNA levels of Gpr4 and Task2. This is a novel finding. Currently, we do not know what transcriptional activity PHOX2B has in adult RTN neurons (i.e., what gene targets PHOX2B has in this cell population and many others) and here we propose that Nmb is not a gene target of PHOX2B while Gpr4 and Task2 are.

      (4) The authors assert that the transcription factor Phox2b remains not fully understood. While I concur, the present study falls short of fully investigating the actual contribution of Phox2b to breathing regulation. In other words, the knockdown of Phox2b neurons did not add much to the knowledge of the field.

      We respectfully disagree with the reviewer. With the exception of very few target genes, the transcriptional role of PHOX2B beyond embryonic development is poorly understood. No mechanistic connection has been made before between the transcriptional activity of PHOX2B and the expression of proton sensors in the RTN. Other groups have investigated the role of stimulating or depressing the neuronal activity of PHOX2B/NMB neurons in the RTN, showing a key role of the RTN in respiratory control, but these prior studies did not test whether changing the expression of the PHOX2B protein in these neurons had a role in respiratory control and the central chemoreflex. No other study has investigated the role of the PHOX2B protein within the RTN cells, with the exception of PHOX2B knockout mice or transgenic expression of the mutated PHOX2B that are relevant for CCHS. Again, these previous studies were done on a background of developmental impairment and to the best of our knowledge did not seek to show any association between PHOX2B expression and expression of Gpr4 or Task2.

      (5) I recommend removing the entire section entitled "The role of Phox2b in development and in the adult brain." The authors merely describe Phox2b expression without contextualizing it within the obtained data.

      Because reviewers raised the issue of missing information about the role of PHOX2B in development and respiratory control, we prefer to keep the section.

      (6) Are the authors aware of whether the shRNA in Phox2b/Nmb neurons truly induced cell death or solely depleted the expression of the transcription factor protein? Do the chemosensitive neurons persist?

      This is an excellent question that we tried to address with our study. Based on the data in Figures 2 and 3, we propose that some cell death occurs within the first 2 weeks post-infection, likely due to off-target action of the shRNA approach and not dependent on the reduction of PHOX2B expression (discussed in the manuscript). This is further supported by our Fig. S1 data, in which higher concentrations of shRNA led to more cell death, indicative of off-target effects. We do not believe it is a consequence of our surgical procedure as we do not see similar cell loss when injecting vehicle or other control solutions (unpublished work; Janes et al., 2024).

      During the first 2 weeks post-surgery the proportion of Nmb+/PHOX2B- cells does not change compared to control rats or non-target shRNA (knockdown is not yet visible at protein level). Four weeks post-injection, there is no further cell death (assessed by the total number of NMB cells), whereas the fraction of NMB cells that express PHOX2B is reduced (and the fraction of NMB not expressing PHOX2B is increased), suggesting that the reduction of PHOX2B protein in Nmb cells is not correlated with cell loss/survival whereas the impairment that we observe in terms of central chemoreception is possibly due to the progressive decrease of PHOX2B expression in these neurons.

      (7) In Figures 2 and 3, it is noteworthy that the authors observe peak expression at a very caudal level. In rats, the RTN initiates at the caudal end of the facial, approximately 11.6 mm, and should exhibit a rostral direction of about 2 mm.

      In our experience the Nmb cells on the ventral surface of the medulla peak in number around the caudal tip of the facial nucleus in adult SD rats (Janes et al., 2024). To add clarity to the figure, we reported cell count distribution data in relation to the distance from the caudal tip of the facial nucleus.

      Minor

      (1) I would like to suggest that the authors correct the recurring statement throughout the manuscript that Phox2b is essential only for the development of the autonomic nervous system. In my view, it also plays a crucial role in certain sensory and respiratory systems.

      We have addressed this in the manuscript.

      (2) Page 4, lines 59-60: Out of curiosity, do the data include information from different countries?

      These data refer to information from France and Japan. Currently it is estimated that there are 1000-2000 CCHS patients worldwide.

      (3) Page 7, lines 129-131: In my understanding, the sentence is quite clear; if we knock down the PHOX2B gene, we are expected to reduce or even eliminate the expression of Gpr4 or Task2. Am I right?

      This is what we propose from the results of this study. We would like to point out that the transcriptional activity of PHOX2B (i.e., what genes PHOX2B regulates) in adult neurons has not yet been fully investigated. With the exception of a few target genes (e.g., TH, DBH), the transcriptional activity of PHOX2B in neurons is not yet known. Here we report novel findings that suggest that Gpr4 and Task2 are potential target genes of PHOX2B in RTN neurons.

      (4) The authors mentioned that NT-shRNA also impacts CO2 chemosensitivity. Could this effect be attributed to mechanical damage of the tissue resulting from the injection?

      Just to clarify, we observed some impairment in chemosensitivity when NT-shRNA was injected in a “larger” volume (2x 200ul/side). No impairment was observed with NT-shRNA when we injected smaller volumes (2x 100ul/side). Physical damage could be a possibility, although in our experience (unpublished work; Janes et al., 2024, Acta Physiologica) injections of a similar volume of solution performed by the same investigator in the same brain area and experimental settings did not produce a physical lesion associated with respiratory impairment. Hence we attribute the unexpected results with larger volumes to toxic effects associated with the shRNA viral constructs.

      (5) In the reference section, the authors should review and correct some entries. For instance, Janes, T. A., Cardani, S., Saini, J. K., & Pagliardini, S. (2024). Title: "Etonogestrel Promotes Respiratory Recovery in an In Vivo Rat Model of Central Chemoreflex Impairment." Running title: "Chemoreflex Recovery by Etonogestrel." Some references contain the journal, pages, and volume, while others lack this information entirely.

      We have updated references. Janes et al., 2024 has now been published in Acta Physiologica.

      (6) Why does the baseline have distribution points, whereas the other boxplots do not?

      We have clarified in the figure legend that the data points shown in some of the boxplot graphs do not represent the entire baseline dataset but only the values that are outliers.

      In our box-and-whisker plots, whiskers represent the 10th and 90th percentiles, showing the range of values for the middle 80% of the data. Individual data values that fall outside the 10th/90th percentile range are represented as single points (outliers).
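      The whisker convention described above can be illustrated with a short sketch. It assumes percentiles computed by linear interpolation between closest ranks; plotting software may use a different interpolation rule, so the exact whisker positions in the figures could differ slightly. The function names and the example values are hypothetical.

```python
def percentile(values, p):
    """Percentile (p in 0..100) by linear interpolation between closest ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def whiskers_and_outliers(values, low=10, high=90):
    """Return the whisker positions and the points drawn individually."""
    low_value = percentile(values, low)
    high_value = percentile(values, high)
    outliers = [v for v in values if v < low_value or v > high_value]
    return low_value, high_value, outliers


# Hypothetical sample: only the values outside the 10th/90th percentiles
# would appear as individual dots on the plot.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
low_whisker, high_whisker, points = whiskers_and_outliers(data)
```

      With this convention, a group whose values all lie within the 10th to 90th percentile range shows no individual dots, which is why some boxes appear without data points.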

      Reviewer #3 (Recommendations For The Authors):

      • What is the rationale behind dedicating the first paragraph of results to discussing an artifact?

      We think that it is important to report off-target effects of shRNA viral constructs because the concentrations and volumes of virus injected vary considerably across studies. Other investigators may attempt to use larger volumes of virus to obtain a stronger or faster knockdown, and could reach erroneous conclusions if appropriate tests are not performed.

      Furthermore, because some readers could question whether we injected enough virus to knockdown the expression of PHOX2B, and may wonder if with a larger amount of virus we would increase knockdown efficiency, we wanted to show that, in our opinion, we used the maximum amount of virus to knockdown PHOX2B without causing toxic effects or physiological changes that are not dependent on PHOX2B knockdown.

      • All individual data points should be visible in floating bar graphs in Figures 1 and 4. For example, I don't see any dots for naïve animals in any of the panels in Figure 1.

      We have clarified in the figure legend that the data points shown in some of the boxplot graphs do not represent the entire baseline dataset but only the values that are outliers.

      In our box-and-whisker plots, whiskers represent the 10th and 90th percentiles, showing the range of values for the middle 80% of the data. Individual data values that fall outside the 10th/90th percentile range are represented as single points (outliers).

      • Please include specific F and T values along with DF.

      We have included a table with all the specific values in the supplementary section as Table 1.

      • The C1 and facial partly overlap with the RTN at this level of the medulla and these cells should appear as Phox2b+/Nmb- cells so it is not clear to me why these cells are not evident in the control tissue in Figures 2B and 3B. Also, some of the bregma levels shown in Figure 5A overlap with Figures 2-3 so again it is not clear to me how this non-cell type specific viral approach was targeted to Nmb cells but not nearby TH+ cells. Please clarify.

      In our experience, C1 TH cells are located slightly medial to the Nmb cells and they spread much more caudally than Nmb cells of the RTN. We focused our small volume injection in the core of the RTN to target Nmb cells but we also assessed PHOX2B knockdown in TH C1 cells by counting the PHOX2B/TH cells across treatment groups. Although we can’t exclude subtle changes in the C1 population, we did not observe changes in the total number of C1 cells (TH+/PHOX2B+), in the number of TH cells expressing PHOX2B, or in the hypoxic ventilatory response (which is dependent on the health status of C1 neurons). We have updated Figure 5 to show representative expression of PHOX2B in TH+ neurons in the ventral medulla to complement our cell count analysis. To address potential effects on other cell populations we have edited our discussion as follows:

      “PHOX2B knockdown was also restricted to RTN neurons, as adjacent C1 TH+ neurons did not show any change in number of TH+/PHOX2B+ expressing cells, although we cannot exclude that some C1 cells may have been infected and their relative PHOX2B expression levels were reduced. To support the lack of significant alterations associated with the possible loss of C1 function was the absence of significant changes in the hypoxic response that has been shown to be dependent on C1 neurons (Malheiros-Lima et al., 2017).”

      • To confirm, Nmb is not expressed in the NTS, and this region was chosen as a background, right?

      In order to systematically analyze Nmb mRNA expression we decided to measure fluorescence relative to Nmb neurons present in the dorsal brainstem. Here cells are sparse but we used them as reference fluorescence since they would not be affected by the ventral shRNA injection. Similar cells are also present in the brains of mice and reported by the Allen Brain Atlas (https://mouse.brain-map.org/experiment/show/71836874). We have clarified our procedure in the methods section (see above) and included a sample image of Nmb in both ventral and dorsal regions in Figure 5.

      • How do you get a loss of Nmb+ neurons (Figs 2-3) with no change in Nmb fluorescence (Fig. 5B)? In the absence of representative images these results are not compelling and should be substantiated by more readily quantifiable approaches like qPCR.

      We have clarified in the methods and results sections our analytical procedure to assess PHOX2B and Nmb expression. Figures 2 and 3 display the results of counting numbers of Nmb+ cells in the RTN. Figure 5B reports the average of total cell fluorescence measured inside Nmb+ cells, not an average fluorescence measurement of the area of the ventral medulla. In short, our results show that we have fewer Nmb cells that express PHOX2B, but the overall Nmb mRNA fluorescence (expression) in Nmb cells relative to Nmb fluorescence in cells of the dorsal brainstem is the same.

      We have edited the methods as follows:

      “The Corrected Total Cell Fluorescence (CTCF) signal for Nmb, Gpr4 and Task2 mRNAs was quantified as previously described (Cardani et al., 2022; McCloy et al., 2014). Briefly, a Leica TCS SP5 (B-120G) Laser Scanning Confocal microscope was used to acquire images of the tissue. Exposure time and acquisition parameters were set for the naïve group and kept unchanged for the entire dataset acquisition. The collected images were then analysed by selecting a single cell at a time and measuring the area, integrated density and mean grey value (McCloy et al., 2014). For each image, three background areas were used to normalize against autofluorescence. We used 4 sections/rat (210 µm interval) to count Nmb, Gpr4 and Task2 mRNA CTCF in the core of the RTN area where several Nmb cells could be identified. For each section two images were acquired with a 20× objective, so that at least fifty cells per tissue sample were obtained for the mRNA quantification analysis. To evaluate changes in Nmb mRNA expression levels following PHOX2B knockdown at the level of the RTN, we compared the fluorescence intensity of each RTN Nmb+ cell (223.2 ± 37.1 cells/animal) with the average fluorescent signal of Nmb+ cells located dorsally in the NTS (4.3 ± 1.2 cells/animal) (Nmb CTCF ratio RTN/NTS) as we reasoned that the latter would not be affected by the shRNA infection and knockdown.”

      A single-cell qPCR analysis would definitely be ideal, but a qPCR from dissected tissue would not help us determine whether within a cell there was a reduction in Nmb mRNA levels.

      • The boxed RTN region in these examples is all over the place. The RTN should be consistently placed along the ventral surface under the facial and approx. equal distance from the trigeminal and pyramids.

      We have updated the figures to consistently present the areas of interest where Nmb cells are located and images are taken.

      • Fluorescent in situ typically appears as discrete puncta so it is not clear to me why that is not the case here.

      Our images are taken at low magnification (20X), where it is difficult to distinguish single mRNA molecules. However, it is possible to appreciate the differences between the grainy fluorescent signal in the in situ hybridization assay (RNAScope) and the smoother signal of protein detection in the immunofluorescence assay.

      • Can TUNEL staining be done to confirm loss of Nmb neurons is due to death and not re-localization?

      Does the reviewer mean “cell migration” by re-localization? We do not expect that this would occur in our experiments. Although TUNEL staining in the first week post-infection could be useful to determine cell death in our tissue, we do not expect migration of neurons within the brain, as our viral shRNA injections are performed in adult rats when developmental processes have already concluded.

    1. Author response:

      We sincerely thank the editors and reviewers for the rigorous evaluation of our work and the precious time invested. The positive comments resonate with our endeavor to explore the intrinsic role of astrocyte aquaporin in brain water homeostasis. We also greatly appreciate the reviewers’ constructive suggestions to consolidate this study. Here is the provisional response, which briefly outlines our acknowledgement of the reviewers’ suggestions:

      To Reviewer #1:

      • Imaging data will be examined and collected to determine whether AQP4 inhibition has differential effects on astrocyte calcium signals in terms of cellular locations.

      • New analysis will be performed for CSD swelling data to provide additional kinetic information.

      • The mentioned original papers are important, and will be included in the revision.

      To Reviewer #2:

      We agree that a careful revision will improve and better position the study.

      • Echoing Reviewer #1, the introduction and discussion will be strengthened with current scientific contexts, while paying attention to the important advances in the glymphatic system. The limits of the study mentioned in the reviews will be stated.

      • The use of TGN-020 was based on its validation by a wide range of ex vivo and in vivo studies. AER-270(271) was nicely introduced by Farr et al., 2019 (PMID: 30738082). Its validation in vivo in AQP4 KO mice, and the comparison to TGN-020, is reported in a very recent study (Giannetto et al., 2024 - PMID: 38363040) that provides valuable insights.

      • The description of specific methodologies, including the DW-MRI, will be reinforced. The presentation of experiments and statistical analysis will be refined.

      To Reviewer #3:

      • Solenov et al., 2004 (PMID: 14576087) used the calcein quenching assay and KO mice to convincingly show that AQP4 is a functional water channel in cultured astrocytes. AQP4 deletion reduced both astrocyte water permeability and the absolute amplitude of swelling over comparable time, and also slowed down cell shrinking, which overall parallels our results from acute AQP4 blocking. Yet in Solenov’s study, the time to swelling plateau was prolonged in AQP4 KO astrocytes, differing from our data of acute blocking. This difference may be due to compensatory mechanisms in chronic AQP4 KO, or reflect the different volume responses in cultured astrocytes from brain slices/in vivo results as noted previously (e.g., Risher et al., 2009 - PMID: 18720409). As suggested, methods for volume recordings will be examined.

      • It is an important point that TGN-020 partially blocks AQP4, implying the actual functional impact of AQP4 per se might be stronger than what we observed. TGN-020 provides a means to acutely probe AQP4 function in situ; still, we agree that its limitation needs to be acknowledged.

      • As also pointed out by Reviewer #2, the description and interpretation of DW-MRI data will be improved.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a valuable computational study that applies the machine learning method of bilinear modeling to the problem of relating gene expression to connectivity. Specifically, the author attempts to use transcriptomic data from mouse retinal neurons to predict their known connectivity. The results are promising, although the reviewers felt that demonstration of the general applicability of the approach required testing it against a second data set. Hence the present results were felt to provide borderline incomplete support for a key premise of the paper.

      We thank the reviewers for their insightful and constructive feedback. In response to the reviews, we have undertaken a comprehensive revision of our manuscript, incorporating changes and improvements as outlined below:

      (1) New results have been included showcasing the application of our bilinear model to a second dataset focusing on C. elegans gap junction connectivity. This extension validates our model with a biological context other than mouse retina and facilitates a direct comparison with the spatial connectome model (SCM).

      (2) A new section titled "Previous Approaches" has been added to background, situating our study within the broader landscape of existing modeling methodologies.

      (3) The discussion sections have been expanded to fully incorporate the suggestions and insights offered by the reviewers. This includes a deeper exploration of the implications of our findings, potential applications of our model, and a more thorough consideration of its limitations and future directions.

      (4) To streamline the main text and ensure that the core narrative remains focused and accessible, select figures and tables have been relocated to the "Supplementary Materials" section.

      Reviewer 1 (Public Review):

      Summary of what the author was trying to achieve: In this study, the author aimed to develop a method for estimating neuronal-type connectivity from transcriptomic gene expression data, specifically from mouse retinal neurons. They sought to develop an interpretable model that could be used to characterize the underlying genetic mechanisms of circuit assembly and connectivity.

      Strengths:

      The proposed bilinear model draws inspiration from commonly implemented recommendation systems in the field of machine learning. The author presents the model clearly and addresses critical statistical limitations that may weaken the validity of the model such as multicollinearity and outliers. The author presents two formulations of the model for separate scenarios in which varying levels of data resolution are available. The author effectively references key work in the field when establishing assumptions that affect the underlying model and subsequent results. For example, correspondence between gene expression cell types and connectivity cell types from different references are clearly outlined in Tables 1-3. The model training and validation are sufficient and yield a relatively high correlation with the ground truth connectivity matrix. Seemingly valid biological assumptions are made throughout, however, some assumptions may reduce resolution (such as averaging over cell types), thus missing potentially important single-cell gene expression interactions.

      Thank you for recognizing the strengths of our work, particularly the clarity of the model presentation and its foundation in recommendation systems. In the revised manuscript we have also extended the model’s capabilities to analyze gene interactions for neural connectivity at single-cell resolution, when gene expression and connectivity of each cell are known simultaneously.
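      To make the recommendation-system analogy concrete, the sketch below shows a generic bilinear scoring function of the kind used in collaborative filtering. It is an illustration only, assuming a score of the form x^T A B^T y, where x and y are gene expression vectors of a presynaptic and a postsynaptic cell type and A and B are projection matrices into a shared latent space. The names, dimensions and values are hypothetical, and the exact formulation, training objective and preprocessing in the manuscript are not reproduced here.

```python
def project(expression, weights):
    """Project a gene expression vector (length g) through a g x d weight matrix."""
    n_latent = len(weights[0])
    return [
        sum(expression[g] * weights[g][k] for g in range(len(expression)))
        for k in range(n_latent)
    ]


def bilinear_score(x_pre, y_post, a_matrix, b_matrix):
    """Connectivity score as the dot product of the two latent projections,
    equivalent to x^T A B^T y."""
    u = project(x_pre, a_matrix)
    v = project(y_post, b_matrix)
    return sum(ui * vi for ui, vi in zip(u, v))


# Hypothetical example: 3 genes, 2 latent dimensions.
x_pre = [1.0, 0.0, 2.0]
y_post = [0.0, 1.0, 1.0]
a_matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_matrix = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
score = bilinear_score(x_pre, y_post, a_matrix, b_matrix)
```

      In a form like this, each latent dimension pairs a weighted set of presynaptic genes with a weighted set of postsynaptic genes, which is what would allow gene signatures to be read off the learned weights.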

      Weaknesses:

      The main results of the study could benefit from replication in another dataset beyond mouse retinal neurons, to validate the proposed method. Dimensionality reduction significantly reduces the resolution of the model and the PCA methodology employed is largely non-deterministic. This may reduce the resolution and reproducibility of the model. It may be worth exploring how the PCA methodology of the model may affect results when replicating. Figure 5, ’Gene signatures associated with the two latent dimensions’, lacks some readability and related results could be outlined more clearly in the results section. There should be more discussion on weaknesses of the results e.g. quantification of what connectivity motifs were not captured and what gene signatures might have been missed.

      We acknowledge the significance of validating our method across different datasets. In line with this, our revised manuscript now includes an expanded analysis utilizing a C. elegans gap junction connectivity dataset, which not only broadens the method’s demonstrated applicability but also underscores its versatility across varied neuronal systems.

      To address the concern of resolution and reproducibility associated with PCA preprocessing, we have conducted a comparative analysis from five replicates of the bilinear model, presenting the results in the revised manuscript (Figure S3). This analysis confirms the consistency of the solutions, as evidenced by the similarity metrics. Furthermore, we discussed alternative methodologies, such as L1 or L2 regularization, to tackle multicollinearity, offering flexibility in preprocessing choices.

      In response to feedback on the original Figure 5’s clarity, we have replaced the original Figure 5e-h with Table S4, which summarizes the gene ontology (GO) enrichment results and quantifies the number of genes associated with aspects of neural development and synaptic organization. This revision aims to improve the interpretability and accessibility of the results, ensuring a clearer presentation of the model’s insights.

      Finally, we have expanded our discussion to address the study’s limitations more comprehensively. This includes exploration of potentially missed connections and gene signatures, such as transcription factors, which might not be captured by a linear model due to its inherent preference for predictors with strong correlations to the target variable.

      The main weakness is the lack of comparison against other similar methods, e.g. methods presented in: Barabási, Dániel L., and Albert-László Barabási. "A genetic model of the connectome." Neuron 105.3 (2020): 435-445; Kovács, István A., Dániel L. Barabási, and Albert-László Barabási. "Uncovering the genetic blueprint of the C. elegans nervous system." Proceedings of the National Academy of Sciences 117.52 (2020): 33570-33577; Taylor, Seth R., et al. "Molecular topography of an entire nervous system." Cell 184.16 (2021): 4329-4347.

      We value your suggestion to compare our model with established methods. The revised manuscript now includes a comparative analysis with the spatial connectome model (SCM) using the same C. elegans dataset. In addition, a section reviewing previous approaches has been added to the Background, and the Discussion has been extended to cover this comparison.

      Appraisal of whether the author achieved their aims, and whether results support their conclusions: The author achieved their aims by recapitulating key connectivity motifs from single-cell gene expression data in the mouse retina. Furthermore, the model setup allowed for insight into gene signatures and interactions; however, it could have benefited from a deeper evaluation of the accuracy of these signatures. The author claims the method sets a new benchmark for single-cell transcriptomic analysis of synaptic connections. This should be more rigorously proven. (I’m not sure I can speak on the novelty of the method)

      In the revised manuscript, we emphasized the bilinear model’s innovative application in the context of neuronal connectivity analysis, inspired by collaborative filtering in recommendation systems. We present quantitative performance metrics, such as the ROC-AUC score and Pearson correlation coefficient, as well as a comparison with the SCM, to benchmark our model’s efficacy in reconstructing connectivity matrices. We also quantified the overlap of the genetic interactions revealed by the bilinear model and the SCM (using the C. elegans dataset), and reported the percentage of the top genes associated with neural development and synaptic organization (using the mouse retina dataset). These numbers set a precedent for future methodological comparisons.
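
      As a rough illustration of the two metrics named above, the sketch below scores a predicted connectivity matrix against a reference matrix. The toy matrices, the 0.5 binarization threshold, and the function names are assumptions made for illustration; this is not the manuscript's evaluation code.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive entry outranks
    a random negative entry (ties count as one half)."""
    pos = [s for lab, s in zip(labels, scores) if lab]
    neg = [s for lab, s in zip(labels, scores) if not lab]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


def flatten(matrix):
    """Flatten a list-of-lists matrix row by row."""
    return [value for row in matrix for value in row]


# Toy reference and predicted connectivity (rows: pre-synaptic types,
# columns: post-synaptic types). Values are made up for the example.
truth = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
pred = [[0.7, 0.2, 0.1], [0.3, 0.6, 0.0]]

t, p = flatten(truth), flatten(pred)
labels = [value > 0.5 for value in t]  # assumed threshold for "connected"

auc = roc_auc(labels, p)
r = pearson(t, p)
print(f"ROC-AUC = {auc:.2f}, Pearson r = {r:.2f}")
```

      ROC-AUC evaluates only the ranking of connected versus unconnected pairs, whereas the Pearson coefficient is sensitive to the graded connection strengths, which is one reason to report both.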

      Discussion of the likely impact of the work on the field, and the utility of methods and data to the community: This study provides an understandable bilinear model for decoding the genetic programming of neuronal type connectivity. The proposed model leaves the door open for further testing and comparison with alternative linear and/or non-linear models, such as neural network-based models. In addition to more complex models, this model can be built on to include higher resolution data such as more gene expression dimensions, different types of connectivity measures, and additional omics data.

      We are grateful for your recognition of the study’s potential impact. The bilinear model indeed offers a foundation for future explorations, allowing for integration with more complex models, higher-resolution data, and diverse connectivity measures.

      Reviewer 1 (Recommendations For The Authors):

      The inclusion of predicted connectivity (Figure 6) of unknown BC neurons is useful as it shows that this is a strong hypothesis generation tool. This utility should potentially be showcased more as it is also brought up in the abstract, "genetic manipulation of circuit wiring", with an explanation of how the model could be leveraged as such. The discussion may benefit from a summarizing sentence regarding which key gene signatures were identified and are in line with the literature, which key gene signatures/connectivity motifs may have been missed, and which gene signatures are novel.

      Thank you for the insightful recommendation on emphasizing the model’s utility in generating hypotheses, particularly regarding predicting connectivity. In the revised manuscript, we have expanded the discussion on how our model can be leveraged to guide genetic manipulations aimed at altering circuit wiring and highlighted its potential impact in the field.

      We have discussed key gene signatures identified from our model that are in line with existing literature, such as plexins and cadherins, which have been previously recognized for their involvement in synaptic connection formation and maintenance. We have also introduced potential new candidates, such as delta-protocadherins. In the revised manuscript, we summarized potentially missed gene signatures or synaptic connections, to provide a comprehensive view of our findings.

      Reviewer 2 (Public Review):

      Summary:

      In this study, Mu Qiao employs a bilinear modeling approach, commonly utilized in recommendation systems, to explore the intricate neural connections between different pre- and post-synaptic neuronal types. This approach involves projecting single-cell transcriptomic datasets of pre- and post-synaptic neuronal types into a latent space through transformation matrices. Subsequently, the cross-correlation between these projected latent spaces is employed to estimate neuronal connectivity. To facilitate the model training, connectomic data is used to estimate the ground-truth connectivity map. This work introduces a promising model for the exploration of neuronal connectivity and its associated molecular determinants. However, it is important to note that the current model has only been tested with Bipolar Cell and Retinal Ganglion Cell data, and its applicability in more general neuronal connectivity scenarios remains to be demonstrated.
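
      The scoring step summarized above can be sketched as follows. This is a minimal illustration under stated assumptions: the names X, Y, A and B, the toy dimensions, and the random values are all hypothetical, the similarity between latent representations is taken to be a plain inner product, and no training against connectomic data is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumptions): numbers of pre-/post-synaptic neuronal types,
# expression features per type, and the latent dimensionality.
n_pre, n_post = 4, 5
n_feat_pre, n_feat_post = 10, 12
latent_dim = 2

# Hypothetical expression profiles, one row per neuronal type.
X = rng.normal(size=(n_pre, n_feat_pre))
Y = rng.normal(size=(n_post, n_feat_post))

# Hypothetical transformation matrices; in practice these would be
# learned so that the predicted scores match the connectivity data.
A = rng.normal(size=(n_feat_pre, latent_dim))
B = rng.normal(size=(n_feat_post, latent_dim))


def predict_connectivity(X, Y, A, B):
    """Project both populations into the latent space and score every
    pre/post pair by the inner product of their latent vectors."""
    pre_latent = X @ A     # shape (n_pre, latent_dim)
    post_latent = Y @ B    # shape (n_post, latent_dim)
    return pre_latent @ post_latent.T  # shape (n_pre, n_post)


Z = predict_connectivity(X, Y, A, B)
print(Z.shape)
```

      Because each score equals x_i^T (A B^T) y_j, the predicted matrix has rank at most the latent dimensionality, and the entries of A B^T can be read as pairwise interaction weights between pre- and post-synaptic features.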

      Strengths:

      This study introduces a succinct yet promising computational model for investigating connections between neuronal types. The model, while straightforward, effectively integrates single-cell transcriptomic and connectomic data to produce a reasonably accurate connectivity map, particularly within the context of retinal connectivity. Furthermore, it successfully recapitulates connectivity patterns and helps uncover the genetic factors that underlie these connections.

      Thank you for your positive assessment of the paper.

      Weaknesses:

      (1) The study lacks experimental validation of the model’s prediction results.

      We recognize the importance of experimental validation in substantiating the predictions made by computational models. While the primary focus of this study remains computational, we have dedicated a section in the revised manuscript, titled "Experimental Validation of Candidate Genes", to outline proposed methodologies for the empirical verification of our model’s predictions. This section specifically discusses the experimental exploration of novel candidate genes, such as delta-protocadherins, within the mouse retina using AAV-mediated CRISPR/Cas9 genetic manipulation. We plan to collaborate with experimental laboratories to facilitate the validation. Given the extensive nature of experimental work, both in terms of time and resources, it is more pragmatic to present a comprehensive experimental investigation in a follow-up study.

      (2) The model’s applicability in other neuronal connectivity settings has not been thoroughly explored.

      The question of the model’s broader applicability is well-taken. In response, we have expanded our analysis to include additional neuronal data and connectivity settings. Specifically, the revised manuscript includes results where we apply the model to a dataset of C. elegans gap junction connectivity, demonstrating its potential in different neuronal systems. This extension serves to illustrate the model’s adaptability and potential applicability to a broader range of neuronal connectivity studies.

      (3) The proposed method relies on the availability of neuronal connectomic data for model training, which may be limited or absent in certain brain connectivity settings.

      We acknowledge the limitations posed by the model’s dependency on comprehensive connectomic data, which may not be readily available across all research contexts. To address this, we have discussed in the revised manuscript several alternative strategies to adapt our model to the available data. This includes exploring the potential of applying the model to available data such as projectome, and integrating other data modalities such as electrophysiological measurements. These initiatives aim to enhance the model’s applicability and ensure its utility in a broader spectrum of brain connectivity studies, especially in scenarios where detailed connectomic data are not available.

      Reviewer 2 (Recommendations For The Authors):

      Q1. In this work, the author has mainly studied retinal neuronal type connectivity; it would be interesting to see whether the model works for other brain regions or other neuronal type connectivity as well.

      We value your interest in the model’s applicability to other brain regions and neuronal types. To address this, we have extended our analysis in the revised manuscript to include a study on gap junction connectivity between C. elegans neurons. This extension demonstrates the model’s versatility and its potential applicability across various nervous systems and connectivity types.

      Q2. Can the authors use the same transformation matrices trained on the retina data to predict neuronal connectivity in other brain regions? Or, as an easier case, the connectivity between RGC types and the neuronal types in SC, dLGN, or other post-RGC-synaptic brain regions? As the neuronal connection mechanisms are conserved and widely shared between different neuronal types, one would expect that the same transformation matrices may work in predicting other neuronal type connectivity as well (at least to some extent).

      The idea to use the same transformation matrices for predicting connectivity in other brain regions is intriguing. While direct application of these matrices to different regions remains challenging, we discussed the potential scalability of our model to other brain areas. By applying the model to combined datasets from various regions, we could uncover conserved neuronal connection mechanisms. This approach is theoretically feasible and is supported by the demonstrated scalability of the bilinear model and its deep learning variants in industrial applications.

      Q3. Section 5.2 Connectivity metric generation: in this work, the author uses the stratification profiles of the neurons to estimate the connectivity metric. How reliable is this method? There will be scenarios where two neuronal types project to a similar inner plexiform layer yet have no connection. Have the authors considered combining other experimental data (like electrophysiology data or neuron tracing data)?

      We discussed the reliability of using stratification profiles for estimating connectivity metrics, acknowledging potential limitations. In the revised manuscript, we added discussion on how the integration of additional experimental data, such as electrophysiological and neuron tracing data, could enhance the accuracy of the connectivity metrics.

      Q4. Section 6 Model training and validation: does the author have a potential hypothesis as to why 2 dimensions are the best latent feature spaces dimensionality? One would imagine with more dimensionality, the model will give better results. Could it be that the connectivity data that is used to train the model is only considering the two-dimensional space of the neuronal stratification?

      The selection of two dimensions for the latent feature space was informed by 5-fold cross-validation, aimed at optimizing model generalization to unseen data. While increasing dimensionality improves performance on the training set, it does not necessarily enhance generalization to the validation set. Thus, the choice of two dimensions ensures good performance without overfitting to the training data.
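
      The selection procedure described above can be sketched generically as below. The helper names are hypothetical, and the scoring callback is a placeholder that is hard-coded to peak at two dimensions purely so the script runs; a real callback would train the bilinear model on the training indices and return its score on the held-out indices.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def select_latent_dim(candidates, n_samples, fit_and_score, k=5):
    """Return the candidate dimensionality with the best mean validation
    score across k folds, together with all mean scores."""
    folds = k_fold_indices(n_samples, k)
    mean_scores = {}
    for dim in candidates:
        fold_scores = []
        for i, val_idx in enumerate(folds):
            # Training set: every fold except the held-out one.
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            fold_scores.append(fit_and_score(dim, train_idx, val_idx))
        mean_scores[dim] = sum(fold_scores) / len(fold_scores)
    best = max(mean_scores, key=mean_scores.get)
    return best, mean_scores


def toy_fit_and_score(dim, train_idx, val_idx):
    """Placeholder (assumption): a validation score that rises up to
    dim = 2 and falls afterwards, imitating the onset of overfitting."""
    return 1.0 - 0.1 * abs(dim - 2)


best_dim, scores = select_latent_dim(
    candidates=[1, 2, 3, 4, 5],
    n_samples=20,
    fit_and_score=toy_fit_and_score,
)
print(best_dim, scores)
```

      The key point is that the dimensionality is chosen on held-out scores only: training-set performance generally keeps improving as dimensions are added, so it cannot be used for selection.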

      Q5. Could the author provide the source code for the analysis? Or could the author make it a python/R package so that non-computational biologists can easily apply the method to their own data?

      We have included a "Data and Code Availability" section in the revised manuscript. This section provides a link to the source code with pointers to datasets used in our study, facilitating the application of our methods by researchers from various backgrounds.

      Q6. I know it may be difficult for the author to do, but is it possible to design and perform some experiments to validate the model prediction results, either connectivity partners of transcriptomically defined RGC types or the function of the key genetic molecules (which hasn’t been discovered before)? The author may consider collaborating with some experimental labs. The author may even consider predicting the connectivity between RGC with some of its post-synaptic neurons in the brain regions, like SC or dLGN, as recently there are a lot of single-cell sequencing data as well as connectivity data.

      We appreciate your suggestion regarding experimental validation. As a future direction, we have discussed potential experimental approaches to validate the model’s predictions in the "Experimental Validation of Candidate Genes" section. Specifically, we propose an experimental design involving the manipulation of delta-protocadherins using AAV-mediated CRISPR/Cas9 and subsequent examination of connectivity phenotypes. We are also open to collaborating with experimental labs to further explore the model’s predictions, particularly in predicting connectivity between RGCs and their post-synaptic neurons in other brain regions.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews

      Reviewer #1 (Public Review):

      Comment: The fact that there are Arid1a transcripts that escape the Cre system in the Arid1a KO mouse model might complicate the interpretation of the data. The phenotype of the Arid1a knockout is probably masked by the fact that many of the sequencing techniques used here are done on a heterogeneous population of knockout and wild type spermatocytes. In relation to this, I think that the use of the term "pachytene arrest" might be overstated, since this is not the phenotype truly observed. Knockout mice produce sperm, and probably litters, although a full description of the subfertility phenotype is lacking, along with identification of the stage at which cell death is happening by detection of apoptosis.

      Response: As the reviewer indicates, we did not observe a complete arrest at pachynema. In fact, the histology shows the presence of spermatids and sperm in seminiferous tubules and epididymides (Fig. Sup. 3). However, our data argue that the wild-type haploid gametes produced were derived from spermatocyte precursors that have likely escaped Cre-mediated activity (Fig. Sup. 4). Furthermore, diplotene and metaphase-I spermatocytes lacking ARID1A protein by IF were undetectable in the Arid1acKO testes (Fig. S4B). Therefore, although we do not demonstrate a strict pachytene arrest, it is reasonable to conclude that ARID1A is necessary to progress beyond pachynema. We have revised the manuscript to reflect this point (Abstract lines 17, 18; Results lines 153, 154).

      Comment: It is clear from this work that ARID1a is part of the protein network that contributes to silencing of the sex chromosomes. However, it is challenging to understand the timing of the role of ARID1a in the context of the well-known DDR pathways that have been described for MSCI.

      Response: With respect to the comment on the lack of clarity as to which stage of meiosis we observe cell death, our data do suggest that it is reasonable to conclude that mutant spermatocytes (ARID1A-) undergo cell death at pachynema given their inability to execute MSCI, which is a well-established phenotype.

      Comment: Staining of chromosome spreads with Arid1a antibody showed localization at the sex chromosomes by diplonema; however, analysis of gene expression in Arid1a KO was performed on pachytene spermatocytes. Therefore, it is not very clear how the chromatin remodeling activity of Arid1a in diplonema affects gene expression at a previous stage. CUTnRUN showed that ARID1a is present at the sex chromatin in earlier stages, leading us to hypothesize that immunofluorescence with ARID1a antibody might not reflect the real localization of ARID1a.

      Response: It is unclear what the reviewer means about not understanding how ARID1A activity at diplonema affects gene expression at earlier stages. Our interpretations were not based solely on the observation of ARID1A associations with the XY body at diplonema. In fact, mRNA expression and CUT&RUN analyses were performed on pachytene-enriched populations. ARID1A's association with the XY body is not exclusive to diplonema. Based on both CUT&RUN and IF data, ARID1A associates with XY chromatin as early as pachynema. Only at late diplonema did we observe ARID1A hyperaccumulation on the XY body by IF.

      Reviewer #2 (Public Review):

      Comment: The inefficient deletion of ARID1A in this mouse model does not allow any detailed analysis in a quantitative manner.

      Response: As explained in our response to these comments in the first revision, we respectfully disagree with this reviewer’s conclusions. We have been quantitative by co-staining for ARID1A, ensuring that we can score mutant pachytene spermatocytes from escapers. Additionally, we provide data to show the efficiency of ARID1A loss in the purified pachytene populations sampled in our genomic assays.

      Reviewer #3 (Public Review):

      Comment: The data demonstrate that the mutant cells fail to progress past pachytene, although it is unclear whether this specifically reflects pachytene arrest, as accumulation in other stages of Prophase also is suggested by the data in Table 1. The western blot showing ARID1A expression in WT vs. cKO spermatocytes (Fig. S2) is supportive of the cKO model but raises some questions. The blot shows many bands that are at lower intensity in the cKO, at MWs from 100-250kDa. The text and accompanying figure legend have limited information. Are the various bands with reduced expression different isoforms of ARID1A, or something else? What is the loading control 'NCL'? How was quantification done given the variation in signal across a large range of MWs?

      Response: The loading control is Nucleolin. With respect to the other bands in the range of 100-250 kDa, it is difficult to say whether they represent ARID1A isoforms. The Uniprot entry for Mouse ARID1A only indicates a large mol. wt sequence of ~242 kDa; therefore, the band corresponding to that size was quantified. There is no evidence to suggest that lower molecular weight isoforms may be translated. Although speculative, it is possible that the lower molecular weight bands represent proteolytic/proteasomal degradation products or products of antibody non-specificity. These points are addressed in the revised manuscript (Legend to Fig S2, lines 926-931). Blots were scanned on a LI-COR Odyssey CLx imager and viewed and quantified using Image Studio Version 5.2.5 (Methods, lines 640-642).

      Comment: An additional weakness relates to how the authors describe the relationship between ARID1A and DNA damage response (DDR) signaling. The authors don't see defects in a few DDR markers in ARID1A CKO cells (including a low-resolution assessment of ATR), suggesting that ARID1A may not be required for meiotic DDR signaling. However, as previously noted the data do not rule out the possibility that ARID1A is downstream of DDR signaling and the authors even indicate that "it is reasonable to hypothesize that DDR signaling might recruit BAF-A to the sex chromosomes (lines 509-510)." It therefore is difficult to understand why the authors continue to state that "...the mechanisms underlying ARID1A-mediated repression of the sex-linked transcription are mutually exclusive to DDR pathways regulating sex body formation" (p. 8) and that "BAF-A-mediated transcriptional repression of the sex chromosomes occurs independently of DDR signaling" (p. 16). The data provided do not justify these conclusions, as a role for DDR signaling upstream of ARID1A would mean that these mechanisms are not mutually exclusive or independent of one another.

      Response: The reviewer’s argument is reasonable, and we have made the recommended changes (Results, lines 212-215; Discussion, lines 499-500).

      Comment: A final comment relates to the impacts of ARID1A loss on DMC1 focus formation and the interesting observation of reduced sex chromosome association by DMC1. The authors additionally assess the related recombinase RAD51 and suggest that it is unaffected by ARID1A loss. However, only a single image of RAD51 staining in the cKO is provided (Fig. S11) and there are no associated quantitative data provided. The data are suggestive but it would be appropriate to add a qualifier to the conclusion regarding RAD51 in the discussion which states that "...loss of ARID1a decreases DMC1 foci on the XY chromosomes without affecting RAD51" given that the provided RAD51 data are not rigorous. In the long-term it also would be interesting to quantitatively examine DMC1 and RAD51 focus formation on autosomes as well.

      Response: We agree with the reviewer’s comment and have made the recommended changes (Discussion, lines 518-519).

      Response to non-public recommendations

      Reviewer 2:

      Comment: Meiotic arrest is usually judged based on testicular phenotypes. If mutant testes do not have any haploid spermatids, we can conclude that meiotic arrest is a phenotype. In this case, mutant testes have haploid spermatids and are fertile. The authors cannot conclude meiotic arrest. The mutant cells appear to undergo cell death in the pachytene stage, but the authors cannot say "meiotic arrest."

      Response: We disagree with this comment. By IF, we see that ~70% of the spermatocytes have deleted ARID1A. Furthermore, we never observed diplotene spermatocytes that lacked ARID1A. The conclusion that the absence of ARID1A results in a pachynema arrest and that the escapers produce the haploid spermatids is firm.

      Comment: Fig. S2 and S3 have wrong figure legends.

      Response: The figure legends for Fig. S2 and S3 are correct.

      Comment: The authors do not appear to evaluate independent mice for scoring (the result is about 74% deletion above, Table S1). Sup S2: how many independent mice did the authors examine?

      Response: These were Sta-Put purified fractions obtained from 14-15 WT and mutant mice. It is difficult to isolate pachytene spermatocytes by Sta-Put at the required purity in sufficient yields using one mouse at a time. We used three technical replicates to quantify the band intensity, and the error bars represent the standard error of the mean (S.E.M.) of the band intensity.

      Comment: "Comparison of cKO and wild-type littermate yielded nearly identical results (Avg total conc WT = 32.65 M/ml; Avg total conc cKO = 32.06 M/ml)". This sounds like a negative result (i.e., no difference between WT and cKO).

      Response: This is correct. There is no difference between Arid1aWT and Arid1aCKO sperm production. This is because wild-type haploid gametes produced were derived from spermatocyte precursors that have escaped Cre-mediated activity (Fig. S4). These data merely serve to highlight an inherent caveat of our conditional knockout model and are not intended to support the main conclusion that ARID1A is necessary for pachytene progression.

      Comment: The authors now admit ~ 70 % efficiency in deletion, and the authors did not show the purity of these samples. If the purity of pachytene spermatocytes is ~ 80%, the real proportion of mutant cells can be ~ 56%. It is very difficult to interpret the data.
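
      The reviewer's 56% figure is the product of the two proportions. The short check below reproduces that arithmetic; the function name is hypothetical, the 80% purity is the reviewer's hypothetical value rather than a measured one, and the calculation assumes deletion status is independent of whether a cell belongs to the pachytene fraction.

```python
def mutant_fraction(deletion_efficiency, purity):
    """Fraction of cells in a sample that are both pachytene spermatocytes
    and ARID1A-deleted, assuming the two properties are independent."""
    return deletion_efficiency * purity


# ~70% deletion efficiency (reported) and ~80% purity (reviewer's hypothetical).
estimate = mutant_fraction(0.70, 0.80)
print(f"{estimate:.0%}")
```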

      Response: The original submission did refer to inefficient Cre-induced recombination. The reviewer asked for the % efficiency, which was provided in the revised version. Also, please refer to Fig. S2, where Western blot analysis demonstrates a significant loss of ARID1A protein levels in CKO relative to WT pachytene spermatocyte populations that were used for CUT&RUN data generation.

      Comment: The authors should not use the other study to justify their own data. The H3.3 ChIP-seq data in the NAR paper detected clear peaks on autosomes. However, in this study, as shown in Fig. S7A, the authors detected only 4 peaks on autosomes based on MACS2 peak calling. This must be a failed experiment. Also, S7A appears to have labeling errors.

      Response: I believe the reviewer is referring to supplementary figure 8A. Here, it is not clear which labeling errors the reviewer is referring to. In the wild type, the identified peaks were overwhelmingly sex-linked intergenic sites. This is consistent with the fact that H3.3 is hyper-accumulated on the sex chromosomes at pachynema.

      The authors of the NAR paper did not perform a peak-calling analysis using MACS2 or any other peak-calling algorithm. They merely compared the coverage of H3.3 relative to input. Therefore, it is not clear on what basis the reviewer says that the NAR paper identified autosomal peaks. Their H3.3 signal appears widely distributed over a 6 kb window centered at the TSS of autosomal genes, which, compared to input, appears enriched. Our data clearly demonstrate a less noisy and narrower window of H3.3 enrichment at autosomal TSSs in WT pachytene spermatocytes, albeit at levels lower than that seen in CKO pachytene spermatocytes (Fig S8B and see data copied below for each individual replicate). Moreover, the lack of peaks does not mean that there was an absence of H3.3 at these autosomal TSSs (Supp. Fig. S8B). Therefore, we disagree with the reviewer’s comment that the H3.3 CUT&RUN was a failed experiment.

      Author response image 1.

      H3.3 Occupancy at genes mis-regulated in the absence of ARID1A

      Comment: If the author wishes to study the function of ARID2 in spermatogenesis, they may need to try other cre-lines to have more robust phenotypes, and all analyses must be redone using a mouse model with efficient deletion of ARID2.

      Response: As noted, we chose Stra8-Cre to conditionally knock out Arid1a because ARID1A is haploinsufficient during embryonic development. The lack of Cre expression in the maternal germline allows for transmission of the floxed allele, allowing for the experiments to progress.

      Comment: The inefficient deletion of ARID1A in this mouse model does not allow any detailed analysis in a quantitative manner.

      Response: In many experiments, we have been quantitative when possible by co-staining for ARID1A, ensuring that we can score mutant pachytene spermatocytes from escapers. Additionally, we provide data to show the efficiency of ARID1A loss in the purified pachytene populations sampled in our genomic assays.

      Reviewer 3:

      Comment: The Methods section refers to antibodies as being in Supplementary Table 3, but the table is labeled as Supplementary Table 2.

      Response: This has been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Here we address the major points raised by the reviewers.

      Reviewer #1 (Public Review):

      Weaknesses:

      • The signaling pathway upstream of Maf1 remains unknown. In eukaryotes, Maf1 is a negative regulator of RNA pol III and is regulated by external signals via the TORC pathway. Since TORC components are absent in the apicomplexan lineage, one central question that remains open is how Maf1 is regulated in P. falciparum. Magnesium is probably not the sole stimulus involved, as suggested by the observation that Ile deprivation also down-regulates RNA pol III activity.

      We agree that there is still much to uncover relating to the PfMaf1 signaling pathway. While we still do not know each component, we have been able to link external factors (of course not limited to only magnesium) to the increased nuclear occupancy of PfMaf1. Other protein interactors that potentially regulate PfMaf1, while not confirmed, have been identified in plasma sample as candidates for future experiments to validate their potential involvement in RNA Pol III inhibition.

      • The study does not address why MgCl2 levels vary depending on the clinical state. It is unclear whether plasma magnesium is increased during asymptomatic malaria or decreased during symptomatic infection, as the study does not include control groups with non-infected individuals. Along the same line, MgCl2 supplementation in parasite cultures was done at 3mM, which is higher than the highest concentrations observed in clinical samples.

      This reviewer raised a valid point. The plasma magnesium levels for the wet symptomatic samples (averaging [0.79mM]) were within the normal range of a healthy individual (between [0.75-0.95mM]) while the dry asymptomatic levels were above the normal range (averaging [1.13mM]). Ideally, we would have liked to have control uninfected plasma samples from individuals from The Gambia. Unfortunately, field studies and human volunteer studies do not always have all the ideal controls that in vitro studies have. We recognize that [3mM] is higher than the normal range for magnesium levels, which is why we included a revised Supplementary Figure 3A. This figure shows that magnesium concentrations as low as [1mM] (similar to the levels found in dry asymptomatic samples) reduced the expression of RNA Pol III-transcribed genes.

      • Although the study provides biochemical evidence of Maf1 accumulation in the parasite nuclear fraction upon magnesium addition, this is not fully supported by the immunofluorescence experiments.

      We agree that the resolution of the IFA images is insufficient to support the WB data. We believe that the importance of the IFA Supplementary Figure is to show that PfMaf1 clusters together in foci, which has not been previously reported.

      Reviewer #2 (Public Review):

      Weaknesses:

      However, most analyses are rather preliminary as only very few (3-5) candidate genes are analyzed by qPCR instead of carrying out comprehensive analyses with a large qPCR panel or RNA-seq experiments with GO term analyses. Data presentation lacks clarity, the number of biological replicates is rather low and the statistical analyses need to be largely revised. Although the in vivo data from wet (mildly symptomatic) and dry (asymptomatic) season parasites with different expression levels of Pol III-regulated genes, var genes, and MgCl2 are interesting, the link between the in vitro data and the in vivo virulence of P. falciparum, which is made in many sections of the manuscript, should be toned down. Especially since (i) the only endothelial receptor studied is CD36, which is associated with parasite binding during mild malaria, and (ii) several studies provide contradictory data on MgCl2 levels during malaria and in different disease states, which is not further discussed, but the authors mainly focused on this external stimulus in their experiments.

      We agree that, ideally, we would have liked to do full RNA-seq on The Gambia samples. However, that was out of the scope of this project. The RNA samples were limited which is why we did not use more primers. We believe that an appropriate number of replicates was done for the experiments. The wet symptomatic samples from this study were from mildly symptomatic individuals, as stated in the manuscript. Therefore, CD36 was a relevant receptor to use for our studies.

      We agree that the published studies about magnesium levels in infected individuals are not always consistent. What these studies do not consider is the time of year, that is, whether the infection occurred during the dry or wet season. These studies were also done in different regions of the world using different technologies. For this reason, we only highlight the difference observed in our field study data from The Gambia.

      Reviewer #3 (Public Review):

      Weaknesses:

      (1) The signals upstream of Maf1 remain rather a black box. 4 are tested - heat shock and low-glucose, which seem to suppress ALL transcription; low-Isoleucine and high magnesium, which suppress Pol3. Therefore the authors use Mg supplementation throughout as a 'starvation type' stimulus. They do not discuss why they didn't use amino acid limitation, which could be more easily rationalised physiologically. It may be for experimental simplicity (no need for dropout media) but this should be discussed, and ideally, sample experiments with low-IsoLeu should be done too, to see if the responses (e.g. cytoadhesion) are all the same.

      We agree that deprivation of isoleucine would have been another experimental assay for our study, but it also would not have been as novel as magnesium. While understanding the exact mechanism or involvement of magnesium as a stress condition was beyond the scope of this manuscript, we believe that our data will be valuable in demonstrating that external stimuli act on P. falciparum virulence gene expression via RNA Pol III inhibition. Since we also had plasma level data for magnesium, and not isoleucine, we believed it made for a better external factor to use for our in vitro studies.

      (2) The proteomics, conducted to seek partners of Maf1, is probably the weakest part. From Figure S3: the proteins highlighted in the text are clearly highly selected (as ones that might be relevant, e.g. phosphatases), but many others are more enriched. It would be good to see the whole list, and which GO terms actually came top in enrichment.

      We apologize if the reviewer did not see the attached supplementary Co-IP MS data. The file includes all proteins found in each sample as well as GO term analysis. For the purpose of this work, we highlight proteins potentially involved in the canonical role of Maf1 that have been shown in model organisms to reversibly inhibit RNA Pol III (phosphatases, RNA Pol III subunits).

      (3) Figure 3 shows the Maf1-low line has very poor growth after only 5 days but it is stated that no dead parasites are seen even after 8 cycles and the merozoites number is down only ~18 to 15... is this too small to account for such poor growth (~5-fold reduced in a single cycle, day 3-5)? It would additionally be interesting to see a cell-cycle length assessment and invasion assay, to see if Maf1-low parasites have further defects in growth.

      We agree with the reviewer that the observed reduced merozoite numbers may not be the only cause of the reduced growth rate. Other factors in the PfMaf1 knock-down line may contribute to the observed poor growth.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      The findings in this manuscript are important for gene editing in human-derived hematopoietic stem and progenitor cells. By optimizing the delivery tool, adding a DNA-PK inhibitor, and including spacer-breaking silent mutations, the editing efficiency is significantly increased, and the heterozygosity can be tuned. The editing is even across the hematopoietic hierarchy.

      Strengths:

      The precise gene editing is important in gene therapy in vitro and in vivo. The manuscript provides solid evidence showing the efficacy and uniqueness of their gene editing approach.

      Weaknesses:

      There are several extended and unique points shown in this paper but in a specific cell population.

      The findings are indeed in a specific cell lineage, though it should be noted that the editing crossed multiple cell types within that lineage. More importantly, HSPCs have substantial relevance to understanding adult stem cell biology, blood formation, and leukemia. Critically, they are also the target cells for a plethora of gene therapies for anemias, immunodeficiencies, and metabolic disorders, and are also being explored for use with CAR technologies. Indeed, CRISPR-based gene therapy was recently approved for clinical use. As such, the findings here are of substantial relevance for multiple areas of research including hematology, stem cell biology, cancer, immunology and more.

      Reviewer #2 (Public Review):

      Summary:

      This work by Cloarec-Ung et al. sets out to uncover strategies that would allow for the efficient and precise editing of primitive human hematopoietic stem and progenitor cells (HSPCs). Such effective editing of HSPCs via homology directed repair has implications for the development of tractable gene therapy approaches for monogenic hematopoietic disorders as well as precise engineering of these cells for clinical regenerative and/or cell therapy strategies. In the setting of experimental hematology, precision introduction of disease relevant mutations would also open the door to more robust disease modeling approaches. It has been recognized that to encourage HDR, NHEJ as the dominant mode of repair in quiescent HSPCs must be inhibited. Testing editing of human cord blood HSPCs, the authors first incorporate a prestimulation phase, then identify optimal RNP amounts and donor types/amounts using standard editing culture conditions, identifying optimal concentrations of AAV and short single-stranded oligonucleotide donors (ssODNs) that yield minimal impacts to cell viability while still enabling heightened integration efficiency. They then demonstrate the superiority of AZD7648, an inhibitor of NHEJ-promoting DNA-PK, in allowing for much increased HDR with toxicities imparted by this compound reduced substantially by siRNAs against p53 (mean targeting efficiencies at 57 and 80% for two different loci). Although AAV offered the highest HDR frequencies, differing from ssODN by a factor of ~2, the authors show that spacer breaking sequence mutations introduced into the ssODN to better mimic the disruption of the spacer sequence provided by the synthetic intron in the AAV backbone yielded ssODN HDR frequencies equal to that attained by AAV. By examining editing efficiency across specific immunophenotypically identified subpopulations, they further suggest that editing efficiency with their improved strategy is consistent across stem and early progenitors and use colony assays to quantify an approximate 4-fold drop in total colony numbers but no skewing in the potentiality of progenitors in the edited HSPC pool. Finally, the authors provide a strategy using mutation-introducing AAV mixed with different ratios of silent ssODN repair templates to enable tuning of zygosity in edited CD34+ cells.

      Strengths:

      The methods are clearly described and the experiments for the most part also appropriately powered. In addition to using state of the art approaches the authors also provided useful insights into optimizing the practicalities of the experimental procedures that will aid bench scientists in effectively carrying out these editing approaches, for example avoiding longer handling times inherent when scaling up to editing over multiple conditions.

      The sum of the adjustments to the editing procedure has yielded important advances towards minimizing editing toxicity while maximizing editing efficiency in HSPCs. In particular, the significant increase in HDR facilitated by the authors' described application of AZD7648 and the preservation of a pool of targeted progenitors are encouraging indications that functionally valuable cell types can be effectively edited.

      The discovery of the effectiveness of spacer breaking changes in ssODNs allowing for substantially increased targeting efficiency is a promising advance towards democratizing these editing strategies given the ease of designing and synthesizing ssODNs relative to the production of viral donors.

      The ability to zygosity tune was convincingly presented and provides a valuable strategy to modify this HDR procedure towards more accurate disease modelling.

      Weaknesses:

      Despite providing convincing evidence that functional progenitors can be successfully edited by their procedure, as the authors acknowledge it remains to be verified to what degree the self-renewal capacity and in vivo regenerative potential of the more primitive fractions is maintained with their strategy.

      The 53BP1-based editing strategy, which also disrupts DNA-PK, has demonstrated maintained allele frequencies over engraftment time (De Ravin et al. Blood 2021), which suggests that a transient disruption of DNA-PK should not compromise regenerative potential. Of course, we strongly agree that maintained regenerative potential is important in any editing strategy. As such, for the version of record we have added clonal LTC-IC assessment using conditions that we have previously demonstrated predict long-term repopulating potential (Knapp et al. Nat Cell Biol 2018). These data, which have been added to Figure 3, show no significant reduction in the frequency of the most potent LTC-IC in edited cells compared to unedited controls.

      Assessment of the potential for off-target effects via the authors' approach was somewhat cursory and would have benefited from a more thorough evaluation.

      Once again in the 53BP1 strategy, the authors of that study already performed CHANGE-seq, long-range PCR, NGS, and SKY with inhibition of this same pathway without obvious increases in off-target editing (as long as HDR donor was present, though they did interestingly observe increased large deletions when HDR donors were absent, De Ravin et al. Blood 2021). Our tests here were designed to confirm that our molecule was similarly not affecting off-target editing rather than to launch a large-scale investigation. We agree, however, that off-targets and particularly structural re-arrangements that could be missed by other approaches remain a concern. We have added in nanopore sequencing of the predicted off-target sites and thus verified more deeply that there was no change (indeed no observable off-target activity) at any of these sites. This data has been added to Figure 2 and to a new supplementary Figure S5. Additionally, while it’s beyond the scope of the current manuscript, a focused follow-up dedicated to structural rearrangements downstream of both single and multiple edits is currently in progress and will be submitted separately later this year.

      Viability was assessed by live cell counting; however, given the short-term nature of the editing assay, more sensitive readouts of potentially compromised cell health could have provided a more stringent assessment of how the editing methodology impacted cell fitness.

      Of course, we agree that viable cell counting does not fully predict whether the cell is viable in terms of retained proliferative potential or other functional potentials. This point was addressed for myeloid progenitors at least by the CFC assays already in the manuscript, as to form a colony these cells were definitionally viable at input. Indeed, in these tests, we did see a reduction beyond that of the viable counts as already discussed in the text. Similarly, we already inadvertently answered this in the general CD34+CD45RA- population in Figure 4C where we measured clonal growth following editing with different mutant to silent donor ratios. In this instance we observed 30-40% clonogenic frequencies (Figure 4C), though in this case without a specific non-edited control (as this was not the intended question). Nonetheless, this would indicate that any general viability loss was no more than observed in the CFC tests (even if we assume 100% cloning efficiency if the cells had been unedited). Finally, the clonal LTC-IC assays show that while there is perhaps some loss in more committed progenitors, those with the highest self-renewal potential are not compromised in the edited condition compared to control (Figure 3I).

      Recommendations for the authors

      Reviewer #2 (Recommendations For The Authors):

      It will be important to include the author-provided new paragraph in the discussion to contextualize this work in the existing HSPC editing landscape and your unique findings.

      A new paragraph detailing how our manuscript fits with other recently published works is now included in the discussion.

      The legend for Figure 3 needs correction. Panel E is incorrectly labeled as panel D and panel F is incorrectly labeled as panel E.

      Thank you for catching this typo. It has been fixed.

      In Figure 4 axis headings in panel C and D require clarity beyond simply titles of "Mean Frequency".

      These axis labels have been clarified.

    1. Author response:

      The following is the authors’ response to the original reviews.

      In this letter, we respond to each of the reviewers’ comments. We support responses by referring to the revised manuscript and, where necessary, by including additional descriptions and analyses that we consider extrinsic to the manuscript itself. In this letter, all changes to the manuscript are shown in blue. As noted, the displayed figures have been added to the manuscript or the SI. We believe that we have successfully addressed all comments and that the quality of our paper has improved significantly.

      Comment 1: In addition to the technical comments by the reviewers, I would encourage the authors to discuss the dependency of their observations, e.g. emergence of microphase separation, not only on the sequence of the polypeptides, but also on the solution conditions. Similarly, the distributions of ions in the condensate bulk, interphase, and diluted phase, and hence the interfacial free energy, are significantly affected both by the chemical composition of the condensate and the salt concentration itself, see: https://pubs.acs.org/doi/10.1021/acs.nanolett.1c03138

      We thank the editor for this suggestion. Here, we have focused on the effect of sequence on condensate organization. We agree that how changes in solution conditions affect condensates, including the microphase separation of ELPs, is potentially interesting as well. We note this as a possible future direction at multiple places in the revised Conclusions and Discussion:

      “The simulations successfully reproduced condensate stability variation upon amino acid substitution. While our study is performed at set salt concentration and temperature to isolate the contributions of amino acid hydrophobicity to condensate organization, future studies may consider implementing temperature [cite] or salt [cite] dependent models to explore how solution conditions affect the organization of ELP condensates.”

      “Such a microenvironment arises from the collective behavior of many proteins, can deviate from that of individual chains, and is likely sensitive to the solution conditions,[cite] which are held constant in our study. Future work on systems with double amino acid substitutions or changes to salt concentration or temperature could elucidate the generality of the mean field interpretation and the additivity of individual contributions.”

      Response to referee 1

      Comment 0: This is an interesting, informative, and well-designed study that combines theoretical and experimental methodologies to tackle the phenomenon of higher-resolution structures/substructures in model biomolecular condensates. The results should be published. However, there is significant room for improvement in the presentation and interpretation of the results. As it stands, the precise definition of “frustration,” which is a main theme of this manuscript (as emphasized in the title), is not sufficiently well articulated. This situation should be rectified to avoid “frustration” becoming a “catch-all” term without a clear perimeter of applicability rather than a precise, informative description of the physical state of affairs. There are also a few other concerns, e.g., regarding interpretation of correlation of phase-separation critical temperature and transfer free energy of amino acid residues as well as the difference between critical temperature and onset temperature, and the way the simulated configurations are similar to that of gyroids.

      We want to thank the reviewers for their insightful comments. We revised the manuscript extensively to improve its clarity and to address the reviewers’ concerns. In the following, we provide point-to-point responses to all the comments.

      Comment 1: It is accurately pointed out on p.4 that elastin-like polypeptides (ELPs) undergo heat-induced phase separation and therefore exhibit lower critical solution temperatures (LCSTs). But it is not entirely clear how this feature is reproduced by the authors’ simulation. A relationship between simulated surface tension and “transition temperature” is provided in Fig.1C; but is the “transition temperature” (authors cited ref.41 by Urry) the same as critical temperature? Apparently, Urry’s Tt is a “critical onset temperature”, the temperature when phase separation happens at a given polymer concentration. This is different from the (global) critical temperature LCST, though the two may be correlated, or not, depending on the shape of the phase boundary. Moreover, is the MOFF coarse-grained forcefield (first step in the multi-scale simulation), by itself, capable of reproducing heat-induced phase separation in a way similar to the forcefield of Dignon et al., ACS Cent Sci 5, 821-830 (2019)? Or is this temperature-dependent effect appearing only subsequently, after the implementation of the MARTINI and/or all-atom steps? Clarification is needed. To afford a more informative context for the authors’ introductory discussion, the aforementioned Dignon et al. work and the review by Cinar et al. [Chem Eur J 25, 13049-13069 (2019)], both touching upon the physical underpinning of the LCST feature of elastin, should also be cited along with refs.41-43.

      We thank the reviewer for their comment. First, we apologize for the lack of clarity between the global lower critical solution temperature, Tc, and the transition temperature, Tt. We have modified the manuscript to be more explicit that the transition temperature we utilize is dependent on the solution conditions, instead of the global lower critical solution temperature.

      Author response image 1.

      Tt as a function of concentration for ELP[V5A2G3] constructs of different chain lengths. Logarithmic fits to the data for each construct using Eq. 1 are also shown. It is evident that the different curves converge to the critical temperature Tc at the critical concentration Cc. Figure reproduced from ref.[2] CC BY 4.0.

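      However, as shown by Chilkoti and coworkers [1, 2] and in Author response image 1, the transition temperature Tt of ELPs is indeed linearly related to the critical temperature Tc through the following relationship:

      $$T_t = T_c + \frac{k}{\text{length}} \ln\left(\frac{C_c}{\text{conc}}\right) \tag{1}$$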

      The above equation highlights the dependence of Tt on the chain length (length) and polymer concentration (conc). The parameter Cc is the corresponding theoretical polypeptide concentration that would be required to achieve Tc, and k is the proportionality constant. Instead of making computationally expensive predictions of condensate critical temperatures, we focused on the surface tension, which can be more readily determined from single constant temperature simulations as detailed in the Methods section. This choice made it computationally feasible to systematically probe the properties of all 20 amino acids in diblock ELPs in our multiscale model. Furthermore, an expected relationship between the critical temperature and the surface tension can be inferred based on the Flory-Huggins theory. In particular, relationships between the Flory-Huggins parameter, χ, and interfacial tension (τ) have been investigated, and the relationship can be approximated as

      where α is a positive constant, whose exact value depends on the proximity of χ to the critical value of χ necessary for phase separation (χC).[3, 4] As detailed in the new Supplemental Theory section of the Supporting Information, for systems undergoing LCST,

      with Therefore, we have

      Several conclusions can be drawn from Eq. 4. First, for α = 1, τ is linearly proportional to Tc. Second, τ decreases at larger values of Tc, a trend that is consistent with the results presented in Figure 1 of the main text. Finally, as detailed in the Supplemental Theory, the inverse relationship between τ and Tc is only expected for systems exhibiting LCSTs. For systems with UCST, τ increases at larger Tc. Therefore, reproducing the correct trend supports the model’s ability to capture the temperature-dependent effect specific to the ELP system.
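      The dependence of Tt on chain length and concentration described above can be illustrated with a short sketch. It assumes the logarithmic form Tt = Tc + (k / length) · ln(Cc / conc) reported by Chilkoti and coworkers. The function name and every numerical value below are hypothetical, chosen only for illustration and not fitted to any ELP data.

```python
import math


def transition_temperature(t_c, k, length, conc, c_c):
    """Return Tt for a given chain length and polymer concentration.

    Assumed form: Tt = Tc + (k / length) * ln(Cc / conc).
    """
    if length <= 0 or conc <= 0 or c_c <= 0:
        raise ValueError("length, conc and c_c must be positive")
    return t_c + (k / length) * math.log(c_c / conc)


# Hypothetical parameters (illustrative only, not fitted values).
T_C, K, C_C = 20.0, 100.0, 5.0

for length in (40, 80, 160):
    tt = transition_temperature(T_C, K, length, conc=0.05, c_c=C_C)
    print(f"length={length:4d}  Tt={tt:6.2f}")
```

      At conc = Cc the logarithm vanishes, so every chain length returns Tt = Tc. This mirrors the convergence of the fitted curves described in the caption of Author response image 1.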

      We modified the text to define the physical meaning of Tt more explicitly. Furthermore, we added a new section in the Supporting Information titled Supplemental Theory to detail the relationship between Tt, Tc, the Flory-Huggins parameter χ, and the surface tension τ. The updated text now reads:

      “Utilizing the simulated condensate conformations, we computed various quantities to benchmark against experimental measurements. While the critical temperature has been widely used as a measure for condensate stability, determining it computationally is expensive. As an alternative, we computed the surface tension, τ, using 100-µs-long MARTINI simulations performed with the NPNAT ensemble.[cite] As detailed in the Supplemental Theory in the Supporting Information, an inverse relationship is expected between τ and the critical temperature, Tc, for systems exhibiting LCSTs. We further approximate Tc with the transition temperatures (Tt) of ELP sequences,[cite] which are the temperatures at which ELPs undergo an LCST transition at a specified solution condition. Tt was shown to be linearly proportional to Tc.[cite] As expected, a negative correlation can be readily seen between computed surface tension and experimental Tt (Fig. 1C). This observed negative correlation between Tt and τ supports the simulation approach’s accuracy in reproducing the sequence-dependent changes in ELP phase behavior.”
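      As a rough illustration of how a surface tension can be read off a single constant-temperature slab simulation, the sketch below applies the standard mechanical (pressure-tensor) definition, τ = (Lz / 2) · [Pzz − (Pxx + Pyy) / 2], for a slab with two interfaces normal to z. Whether this matches the estimator in the manuscript's Methods is an assumption. The function name, the choice of units, and the example numbers are hypothetical.

```python
from statistics import fmean

# 1 bar*nm = 1e5 Pa * 1e-9 m = 1e-4 N/m = 0.1 mN/m
BAR_NM_TO_MN_PER_M = 0.1


def surface_tension(pxx, pyy, pzz, box_z_nm):
    """Time-averaged surface tension (mN/m) of a slab with two interfaces.

    pxx, pyy, pzz: per-frame diagonal pressure-tensor components in bar.
    box_z_nm: box length along the interface normal in nm (held fixed here).
    """
    per_frame = [
        0.5 * box_z_nm * (zz - 0.5 * (xx + yy))  # units: bar*nm
        for xx, yy, zz in zip(pxx, pyy, pzz)
    ]
    return BAR_NM_TO_MN_PER_M * fmean(per_frame)


# Hypothetical two-frame trajectory (illustrative numbers only).
tau = surface_tension(pxx=[-0.2, 0.1], pyy=[0.0, -0.1], pzz=[1.0, 0.9], box_z_nm=20.0)
print(f"tau = {tau:.3f} mN/m")
```

      An isotropic pressure tensor gives τ = 0 under this definition, which is a quick sanity check on any trajectory before comparing sequences.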

      The reviewer is correct that MOFF does not explicitly account for temperature-dependent effects in its interaction parameters. But as mentioned above and indicated by the reviewer, the following steps with explicit solvent simulations in the multiscale strategy succeed in capturing sequence-dependent differences in ELP systems, which are evident in both transition temperature and surface tension.

      We cited the two references suggested by the reviewer in the introduction. We further added the following text in the discussion section to suggest explicitly exploring temperature-dependent effects as an interesting future direction.

      “While our study is performed at set salt concentration and temperature to isolate the contributions of amino acid hydrophobicity to condensate organization, future studies may consider implementing temperature[cite] or salt[cite] dependent models to explore how solution conditions affect the organization of ELP condensates.”

      Comment 2: “Frustration” and “frustrated” are used prominently in the manuscript to characterize certain observed molecular configurations (11 times total, in both the title and in the abstract). Apparently, it is the most significant conceptual pronouncement of this work, hence its precise meaning is of central importance to the authors’ thesis. Whereas one should recognize that the theoretical and experimental observations are striking without invocation of the “frustration” terminology, usage of the term can be useful if it offers a unifying conceptual framework. However, as it stands, a clear definition of the term “frustration” is lacking, leaving readers to wonder what molecular configurations are considered “frustrated” and what are not (i.e., is the claim of observation of frustration falsifiable?). For instance, “frustrated microphase separation” appears in both the title and abstract. A logical question one may ask is: “Are all microphase separations frustrated”? If the answer is in the affirmative, does invocation of the term “frustration” add anything to our physical insight? If the answer is not in the affirmative, then how does one distinguish between microphase separations that are frustrated from those that are not frustrated? Presumably all simulated and experimental molecular configurations in the present study are those of lowest free energy for the given temperature. In other words, they are what they are. In the discussion about frustrated phase separation on p.13, for example, the authors appear to refer to the fact that chain connectivity is preventing hydrophobic residues to come together in a way to achieve the most favorable interactions as if there were no chain connectivity (one may imagine in that case all the hydrophobic residues will form a large cluster without microphase separation). Is this what the authors mean by “frustration”? If that’s true, isn’t that merely stating the obvious, at least for the observed microphase separation? In general, does “frustration” always mean deviation of actual, physical molecular configurations from certain imagined/hypothetical/reference molecular configurations, and therefore dependent upon the choice of the imagined reference configuration? If this is how the authors apply the term “frustration” in the present work, what is the zero-frustration reference state/configuration for microphase separation? And, similarly, what is the zero-frustration reference state/configuration when frustrated ELP-water interactions are discussed (p.14-p.15, Fig.5)? What do non-frustrated water-protein interactions look like? Is the classic clathrate-like organization of water hydrogen bonds around small nonpolar solute “frustrated”?

      We thank the reviewer for their insightful comment, and agree that the concept of “frustration” is both important to our conclusions and, upon review, was too vague in our previous draft of the manuscript.

      For conceptual simplicity and to maximize transferability to real biological systems, we will focus our discussion of frustration on one specific type, which we term “chain frustration.” Chain frustration occurs in states where tertiary interactions between chemically distinct polymer blocks favor phase separation, while chain connectivity prevents macroscopic phase separation from occurring.[5] This frustration leads to microphase separation with microdomains of different monomers.

      We agree with the reviewer that “all microphase separations” are frustrated, and have revised the title to

      “Microphase Separation Produces Interfacial Environment within Diblock Biomolecular Condensates”

      Furthermore, we also removed frustration from the abstract to read

      “The interspersion of hydrophilic and hydrophobic residues and a lack of secondary structure formation result in an interfacial environment, which explains both the strong correlation between ELP condensate stability and interfacial hydrophobicity scales, as well as the prevalence of protein-water hydrogen bonds.”

      We have limited our discussion of frustration to the incomplete separation of hydrophobic and hydrophilic groups. As pointed out by the reviewer, in this case, frustration refers to the fact that chain connectivity prevents hydrophobic residues from coming together in a way that achieves the most favorable interactions, as they would if there were no chain connectivity. The reference would be a perfect macroscopic phase separation that partitions hydrophobic from hydrophilic groups.

      While the frustration from chain connectivity is well understood for block copolymers[5], its effect on producing the interfacial solvation environment, to the best of our knowledge, has not been emphasized before. We have revised the text at the point where we mention frustration to clearly define its meaning.

      “Therefore, while microphase separation occurs in ELP condensates, frustration remains in the system. Hydrophilic residues cannot completely separate from hydrophobic ones due to constraints imposed by the amino acid sequence, creating unique microenvironments.”

      When discussing the interactions between ELP and water, we used the hydrogen bond analysis to emphasize the interfacial environment. For example, the hydrophobic residues tend to “repel” water molecules, reducing the hydrogen bond density; on the other hand, hydrophilic residues and backbone retain water molecules. This difference resulted in the positive and negative correlation with Tt shown in Fig. 5C. The behavior of water molecules is, therefore, inhomogeneous inside the condensate. We expect water molecules to become frustrated due to the simultaneous contact with both hydrophobic and hydrophilic chemical groups, and a perfect reference state would be the pure water environment. However, since this point is not central to our study, to avoid confusion, we have avoided mentioning frustration and revised the text to read

      “The water hydrogen bond density also highlights an interfacial environment of blended hydrophobic and hydrophilic regions.”

      After revising the text, frustration only appears three times in the manuscript.

      Comment 3: In the discussion about the correlation of various transfer free energy scales for amino acids and Urry’s critical onset temperature (ref.41) on p.11 and Fig.4, is there any theoretical relationship to be expected between the interactions among amino acids of ELPs and their critical onset temperatures? While a certain correlation may be intuitively expected if the free energy scale ”is working”, is there any theoretical insight into the mathematical form of this relationship? A clarifying discussion is needed because it bears logically on whether the observed correlation or lack thereof for different transfer energy scales is a good indication of the adequacy of the energy scales in describing the actual physical interactions at play. This question requires some prior knowledge of the expected mathematical relationship between interaction parameters and onset temperature.

      We thank the reviewer for their comment. The relationship between the interactions among amino acids and their transition temperature can be understood in terms of the Flory-Huggins theory, which describes the thermodynamics of polymer mixtures using a lattice model. The chemical composition of the mixture is built into the polymer-solvent interaction parameter

      where z is the coordination number, T is the temperature, kB is the Boltzmann constant, and {ϵpp, ϵss, ϵps} are the strengths of polymer-polymer, solvent-solvent, and polymer-solvent interactions, respectively.[6]

      From the original derivation of Flory-Huggins theory, it can be shown that phase separation occurs when χ is greater than its critical value, χC. Setting χ equal to χC, we can derive the critical temperature as

      Δϵ can indeed be interpreted as the free energy cost of transferring a polymer bead from a solution phase to a polymer phase. It corresponds to the change of energy from a mixed state, with contacts between polymer and solvent (ϵps), to the demixed state with only polymer-polymer (ϵpp) and solvent-solvent (ϵss) contacts.

      Therefore, the transfer free energy, and the interactions among amino acids of ELPs, are expected to correlate with the critical temperature. The above discussion has been incorporated into the new section Supplemental Theory in the Supporting Information. There, we also discuss the more general scenario where Δϵ is temperature dependent, which is essential for giving rise to LCST.
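      For orientation, the textbook Flory-Huggins expressions behind this argument can be written as follows. This is a sketch under stated assumptions rather than a reproduction of the Supplemental Theory: the ϵ terms are taken to be contact energies (negative when attractive), Δϵ is taken to be independent of temperature, N denotes the number of segments per chain (a symbol introduced here for illustration), and the sign convention may differ from the one the authors use.

```latex
\begin{align}
\chi &= \frac{z\,\Delta\epsilon}{k_B T}, &
\Delta\epsilon &= \epsilon_{ps} - \tfrac{1}{2}\left(\epsilon_{pp} + \epsilon_{ss}\right), \\
\chi_C &= \tfrac{1}{2}\left(1 + \frac{1}{\sqrt{N}}\right)^{2}, &
T_c &= \frac{z\,\Delta\epsilon}{k_B\,\chi_C}.
\end{align}
```

      With a temperature-independent Δϵ > 0, χ falls as T rises, so these expressions describe a UCST (phase separation below Tc). An LCST requires Δϵ to vary with temperature, which is the more general scenario the response refers to.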

      We have modified the main text in the discussion of Figure 4 to better explain these mathematical relationships and their necessary assumptions in order to help interpret our simulations. Here is an excerpt from where we discuss Figure 4:

      “The strong dependence of molecular organization on amino acid hydrophobicity suggests that the solvation environment of individual residues might be a determining factor for condensate stability. Indeed, as shown in the Supplemental Theory of the Supporting Information, the critical temperature is closely related to the free energy cost of transferring polymer beads from a solution state to a polymer-only environment. This transfer free energy is often used to quantify the hydrophobicity of amino acids [cite]. To explore their relationship more quantitatively, we compared the transition temperature for ELP condensates measured by Urry [cite] to several hydrophobicity scales.”

      Comment 4: To provide a more comprehensive context for the present study, it is useful to compare the microphase separation seen in the authors’ simulation with the micelle-like structures observed in recent simulated condensed/aggregated states of hydrophobic-polar (HP) model sequences in Statt et al., J Chem Phys 152, 075101 (2020) [see esp. Fig.6] and Wessén et al., J Phys Chem B 126, 9222-9245 (2022) [see, e.g., Fig.10].

      We thank the reviewer for this suggestion. The results of Statt et al. and Wessén et al. indeed provide a nice comparison to our results. While we capture some of the same behavior they observe, the full array of chemical space in our model seems to give some additional morphologies as well.

      First, as predicted by self-consistent field theory, block copolymers are expected to form primarily lamellar-like micelles that clearly separate the dense and dilute phases when the volume fraction, f, is 0.5 (Response to Comment 5). This prediction is indeed consistent with results from simulations with the HP model, and is consistent with our simulations when the substituted amino acid, X, is sufficiently polar.

      However, this observation is only one of several behaviors we observe. In particular, our simulations also produce gyroid-like structures, which are predicted to emerge at small volume differences, i.e. f ≈ 0.4 or f ≈ 0.6. These different configurations likely emerge due to the more realistic representation of amino acids in our model, which presents more frustration than the HP model. In particular, the backbone atoms are inherently hydrophilic and cannot separate from the hydrophobic side chains. Therefore, under microphase separation, it is inherently difficult to separate the different chemical groups to form lamellar or micelle-like structures. This produces a condensate interior with interfacial properties that may not be captured by the HP model.

      We make note of the micelle-like topologies predicted by HP models in the revised text, citing both Statt et al. and Wessén et al.:

      “Surprisingly, microphase separation did not produce lamellar morphology as expected for block copolymers with equal volume fraction of the two blocks (Fig. S3 in the Supporting Information) [cite]. In particular, the condensates appear to form gyroid-like structures (Fig. S4 in the Supporting Information), in which the V and X blocks form two interpenetrating networks. This morphology also differs from micelle-like structures seen in simplified hydrophobic-polar (HP) polymers [cite]. It promotes interfacial contacts while maintaining substantial self-interactions as well. Weak interfacial tension between different ELP blocks has also been noted by Hassouneh et al.[cite]”

      Comment 5: ”Gyroid-like morphology” is mentioned several times in the manuscript (p.4, p.8, p.17, Fig.S3). This is apparently an interesting observation, but a clear explanation is lacking. A more detailed and specific discussion, perhaps with additional graphical presentations, should be provided to demonstrate why the simulated condensed-phase ELP configurations are similar to the classical description of gyroid as in, e.g., Terrones & Mackay, Chem Phys Lett 207, 45-50 (1993) and Lambert et al., Phil Trans R Soc A 354, 2009-2023 (1996).

      We thank the reviewer for their comment. Gyroids are canonical structures for diblock copolymers.[5, 7, 8, 9] Their stability is predicted using self-consistent field theory (SCFT), and arises from the balance of the volume fraction of polymer block A (fA), the length of the polymer (N), and the Flory-Huggins interaction parameter (χ).[8, 9] SCFT predicts that gyroids occur at smaller values of χN and at values of fA near, but not equal to, 0.5 (Author response image 2).[10] We hypothesize that these configurations emerge at equal molar fraction of V and X amino acids due to small differences in solvation volume between the two halves of the polymer chain.

      Our support for gyroid-like structures is mainly from observations of two interpenetrating networks formed by the two ELP blocks. We have revised Figure S4 to clearly highlight the two networks as shown in Author response image 3.

      We have revised the main text to clearly define the gyroid-like structures as interpenetrating networks, and added the theoretical phase diagram of diblock copolymers predicted by SCFT as Figure S3 in the Supporting Information.

      “In particular, the condensates appear to form gyroid-like structures (Fig. S4 in the Supporting Information), in which the V and X blocks form two interpenetrating networks. This morphology also differs from micelle-like structures seen in simplified hydrophobic-polar (HP) polymers [cite]. It promotes interfacial contacts while maintaining substantial self-interactions as well. Weak interfacial tension between different ELP blocks has also been noted by Hassouneh et al.[cite]”

      We note, however, that proving that our observations are indeed gyroid structures requires more sophisticated mathematical analysis that is beyond the scope of the study. It is also possible that these structures are metastable in our simulations. We emphasize these caveats in the updated Discussion Section.

      “Further studies on the thermodynamic stability of these morphologies and comparing them with predictions from the self-consistent field theory shall provide more insights into the driving forces for their emergence [cite].”

      Author response image 2.

      Theoretical phase diagram[8] and corresponding morphologies for diblock copolymers. The phases are labeled as: body centered cubic (BCC), hexagonal cylinders (HEX), gyroid (GYR), and lamellar (LAM). fA is the volume fraction of a single polymer block, denoted A, χ is the Flory-Huggins interaction parameter, and N is the total degree of polymerisation. Figure reproduced from ref.[10] CC BY 4.0.

      Author response image 3.

      Representative configurations of (A) V5F5 and (B) V5L5 condensates from MARTINI simulations. The valine substituted half of the chain is colored blue (V5) and the X substituted half of the chain is colored red (X5). To highlight the interpenetrating networks formed by the two halves, only the X substituted half of the chain is shown on the left. Simulation interfaces are once repeated periodically in the positive x and positive y dimensions for clarity. High density regions formed by the multiple X substituted half of the chains are highlighted in yellow circles, with one of the chain shown in green.

      Response to referee 2

      Comment 1: The experimental characterization relies on BODIPY and SBD reporting, respectively, on viscosity and polarity. The fluorescent signal of these dyes can possibly depend on many other factors, including quenching. Additional controls are required, or a more extensive discussion with additional references, and a mention to potential limitations of this approach.

      We agree with the reviewer that the fluorescence lifetime signal can be affected by many factors. Compared with fluorescence intensity, however, the fluorescence lifetime depends mainly on the intrinsic properties of the dye and on environmental factors. BODIPY and SBD have been used in biological systems to detect the microviscosity and micropolarity of condensates. Our group used the same SBD and BODIPY fluorophores in previously published work to quantify the microenvironment of protein aggregates and condensates. The extended data in those studies (ChemBioChem 20:1078–1087. doi: 10.1002/cbic.201800782; Aggregate 4:e301. doi:10.1002/agt2.301; Nat Chem Biol 1–9. doi:10.1038/s41589-023-01477-1) provide evidence that BODIPY is sensitive only to viscosity and SBD only to polarity, and that both are insensitive to other environmental factors. As for quenching, fluorophores with extended π-rich structures display an aggregation-caused quenching (ACQ) effect at high probe concentrations, which lowers the fluorescence lifetime and intensity. We typically labeled ELPs at a 20% molar ratio using NHS-ester fluorophores to obtain stock solutions. Because of the limited labeling efficiency, the actual labeling ratio is much lower than 20%. The labeled ELP stock solution was then mixed with unlabeled ELP to obtain ELP solutions with low labeling fractions. We measured ELPs labeled with different fractions of dyes. The results show that only BODIPY exhibits a slight ACQ effect at a high

      Author response image 4.

      FLIM images of ELP condensates labeled with different fractions of dyes. A) FLIM images of V30A30 condensates with 5%, 2.5%, and 1% BODIPY labels. B) FLIM images of V30A30 condensates with 5%, 2.5%, and 1% SBD labels. Droplets were formed with a final concentration of 70 µM ELP labeled with different fractions of BODIPY or SBD in 2 M NaCl solution. Scale bar: 5 µm.

      To minimize the potential ACQ effect while retaining sufficient fluorescence signal, we used ELPs labeled with lower fractions of dyes, 1% BODIPY and 2.5% SBD, to perform the FLIM experiments. The data in Figure 3 will be updated with the following data.

      Author response image 5.

      Structures of NHS-BODIPY and NHS-SBD, and representative FLIM images of V30A30, A30V30, V30G30 and G30V30 labeled with respective fluorophores. The fluorescence lifetime of each image is the average acquired from three independent experiments. Scale bar: 5 µm.

      We revised the text in the section Microphase separation of ELP condensates as follows: “To experimentally test the microphase separation behavior uncovered in simulations, we studied the micro-physicochemical properties of the V-end and X-end of the peptides. We constructed diblock peptides with the combination of 30 pentameric repeats of V block and X (A or G) block, namely V30A30 and V30G30 (Experimental Sequences Section in the Supporting Information). The amino-termini of V30A30 and V30G30 sequences were subsequently labeled with environmentally sensitive BODIPY or SBD fluorophores [cite], whose lifetime could be measured to quantify the viscosity or polarity of the V-end (Fig. 3A, left panel) [cite]. These probes have been reported to be only sensitive to single physicochemical properties.[cite] To avoid artifacts induced by fluorophore labeling, we usually used ELPs labeled with a low fraction of dyes. We also constructed A30V30 and G30V30 diblock peptides, wherein the viscosity or polarity of the A-end or the G-end could be measured by fluorophores that are attached at the amino-terminus (Fig. 3A, right panel). Using FLIM, we found that the lifetime of BODIPY for the V-end (5.43 ns) was longer than that for the A-end (4.35 ns), suggesting that the V-end indeed has a higher microviscosity than the A-end (ηV = 2233.54 cP vs ηA = 969.57 cP). Accordingly, the lifetime of SBD was longer for the V-end (8.75 ns) than the A-end (7.00 ns), indicating that the micropolarity of the V-end was lower than the A-end (ϵV = 13.25 vs ϵA = 18.97). These observations could be largely attributed to the greater extent of dehydration at the V-end due to its higher local peptide density. We further showed that the observed differences are not results of possible artifacts arising from any subtle distinctions between the two sequences V30A30 and A30V30 (Experimental Characterization of ELP Condensates Section in the Supporting Information, Fig. S8-S9 in the Supporting Information). Similar results were observed using the V-G sequences. FLIM experiments revealed that the V-end was more viscous than the G-end (ηV = 2972.72 cP vs ηG = 1958.60 cP) and the V-end was less polar than the G-end (ϵV = 9.14 vs ϵG = 27.50). These experimental observations provided the first line of evidence to support the microphase separation, as suggested by the simulation results.”

      We revised the text in the section Experimental methods as follows

      “The proteins of interest were labeled with NHS ester fluorophore. We used ELPs with 1% BODIPY labels or 2.5% SBD labels to form condensates, which avoids artifacts induced by the fluorophores. Droplets were formed with a final concentration of 70 µM ELP in 2 M NaCl for the V-A diblock and 1.5 M (NH4)2SO4 for the V-G diblock. A drop of the droplet-containing solution was placed on a 0.17 mm coverslip with a 500 µm spacer. Images were acquired with a Leica Falcon fluorescence microscope equipped with Wil pulse laser and 63X/0.12 oil-immersion objective. BODIPY was excited at 488 nm and SBD at 448 nm. The fluorescence lifetime fitting and image analysis were performed in LAS X and ImageJ.”

      We also used a lower concentration of free dyes to remeasure the properties of the ELP condensates. The Figure S9 data are corrected as follows. The slight differences between the results are attributed to experimental error and do not affect the conclusions.

      Author response image 6.

      FLIM images of unlabeled ELP condensates. A) Chemical structure of the free fluorophores, which can measure the physicochemical properties of condensates without labeling. B) Representative FLIM images of V30A30 and A30V30. The mix is the mixture of V30A30 (35 µM) and A30V30 (35 µM). Droplets were formed with a final concentration of 70 µM ELP in 2 M NaCl solution with 1 µM fluorophore. C) Representative FLIM images of V30G30 and G30V30. Droplets were formed with a final concentration of 70 µM ELP in 1.5 M (NH4)2SO4 solution with 1 µM fluorophore. The mix is the mixture of V30G30 (35 µM) and G30V30 (35 µM). Scale bar: 5 µm. The fluorescence lifetime of each image is the average from three independent measurements.

      We also revised the Sequence dependence of micro-viscosity and polarity section of the Supporting Information as follows

      “Since we used V30X30 and X30V30 to quantify the V- and X-end of the V-X blocks, it is possible that the observed differences arose from the innate property of the V30X30 and X30V30 sequences. To rule out this artifact, we formed the ELP condensates with sequences of V30X30, X30V30, or the V30X30 and X30V30 mixture. The condensates were subsequently treated with the aldehyde-BODIPY and methyl-ester SBD fluorophores without the NHS ester reactive warhead (Fig. S9A in the Supporting Information). After brief incubation, aldehyde-BODIPY and methyl-ester SBD fluorophores were recruited into and homogeneously distributed in the ELP condensates. The fluorescence lifetime of aldehyde-BODIPY was the same for V30A30 (4.96 ns), A30V30 (4.99 ns), and their mixture (4.98 ns) (Fig. S9B in the Supporting Information, upper panel). Interestingly, this value is around the average (4.89 ns) of the A-end (4.35 ns) and the V-end (5.43 ns) labeled with NHS-BODIPY. For the SBD measurement, methyl-ester SBD resulted in almost identical lifetime values for V30A30 (8.25 ns), A30V30 (8.27 ns), and their mixture (8.28 ns) (Fig. S9B in the Supporting Information, lower panel), again around the average value (7.88 ns) of the A-end (7.00 ns) and the V-end (8.75 ns) labeled with NHS-SBD. In addition to the V-A blocks, similar observations were made for the V-G blocks with the V30G30 and G30V30 sequences (Fig. S9C in the Supporting Information). The slight difference between the results is attributed to experimental error. Because the fluorophores did not covalently label the amino-terminus of the ELP peptides, their lifetime reports a value closer to the averaged property of the condensates than to the microscopic property of the V-end or the X-end, provided that the number of molecules is sufficient and the molecular distribution has no preference.

      Our results reveal that the V30X30 and X30V30 condensates exhibited similar macroscopic viscosity or polarity, suggesting that the previously observed different viscosity or polarity of V30X30 and X30V30 could be attributed to the microscopic property of the V-end or X-end.”
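
      The comparison between the free-dye lifetimes and the end-labeled lifetimes quoted above is simple arithmetic, and it can be checked with a short Python sketch. The lifetimes are those reported in the text for the V-A diblocks; the variable names and the equal-weight average are illustrative assumptions only.

```python
# Fluorescence lifetimes (ns) quoted in the text for the V-A diblocks.
end_labeled = {
    "BODIPY": {"V-end": 5.43, "A-end": 4.35},
    "SBD": {"V-end": 8.75, "A-end": 7.00},
}
free_dye = {
    "BODIPY": {"V30A30": 4.96, "A30V30": 4.99, "mixture": 4.98},
    "SBD": {"V30A30": 8.25, "A30V30": 8.27, "mixture": 8.28},
}


def mean(values):
    """Equal-weight arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


for dye in end_labeled:
    end_avg = mean(end_labeled[dye].values())
    free_avg = mean(free_dye[dye].values())
    spread = max(free_dye[dye].values()) - min(free_dye[dye].values())
    print(f"{dye}: end-label average {end_avg:.3f} ns, "
          f"free-dye average {free_avg:.3f} ns, free-dye spread {spread:.2f} ns")
```

      The spread among the three free-dye measurements is much smaller than the gap between the V-end and A-end lifetimes, which is the basis for attributing the end-labeled differences to the local environment of each terminus.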

      FLIM combined with environment-sensitive fluorophores is a powerful tool for investigating the physicochemical properties of the microenvironment within condensates. However, the method has limitations. Because the fluorophore is attached to the protein, we can only detect the microenvironment immediately surrounding the probe (possibly at the angstrom scale). The fluorescence signals we obtain are statistical averages over complex microenvironments, and they are determined by the sampling position, orientation, and number of fluorescent probes. The quantified values can therefore be compared in relative terms, but they cannot accurately describe the physical or chemical states of different systems. In addition, the resolution of FLIM experiments is not sufficient to directly resolve the microstructure within condensates.

      Comment 2: It is unclear if, after the application of stretching, the micro-structure will eventually return to the original configuration or not. Overall, the point of this experiment remains somewhat unclear.

      We thank the reviewer for this comment. The ELP condensates are viscous fluids that can coalesce into larger droplets within seconds. Because of their high viscosity, ELP condensates show slow fluorescence recovery after photobleaching. When the condensates are stretched, their microstructure changes in response to the external force, and the fluorophores may be pulled out of their original microenvironment. For such a dynamic system, we speculate that the microstructure will return to its original configuration once the condensates re-equilibrate, which may be a slow process. However, it is difficult to determine whether the microstructure has completely returned to its original configuration. The purpose of this experiment is to probe the microenvironment of each terminus from a different angle. The experiment also provides evidence that the microenvironment around the V terminus is denser than that around the A terminus.

      Comment 3: The title is too generic and does not reflect the content of the work. There is no analysis of biological condensates. The results are specific to di-block polypeptides with specific sequences. This should be clearly specified in text and title.

      We have revised the title to “Microphase Separation Produces Interfacial Environment within Diblock Biomolecular Condensates”.

      Comment 4: MD is out of the expertise of this reviewer. However, when looking at the density profiles (Figure S2), the simulation does not seem to be fully converged. The densities fluctuate inconsistently along the Z direction. The authors should comment on assessing simulation convergence. In many cases, the section used for the density values in the plot (i.e., below 0.06 box lengths away from the condensate center) does not seem representative of the dense phase. It should be justified, why these simulations can still be used for density/hydrogen bonding analysis.

      We thank the reviewer for their comment, and agree that convergence of MD simulations is simultaneously important and difficult to control for. To demonstrate the convergence of our simulations, we have taken an example system (V5F5) and reproduced the density profile in 4 unique time windows of 50 ns each (Author response image 7A-D). We find that all distributions are nearly identical, indicating that further extending these simulations is unlikely to change our findings.
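
      The block-wise convergence check described above can be sketched in Python as follows. This is a minimal illustration on synthetic coordinates, not the analysis script used for Author response image 7; the function names, bin count, and trajectory dimensions are assumptions chosen for the example.

```python
import random


def density_profile(z_values, n_bins, z_max):
    """Normalized histogram of |z| distances from the condensate center."""
    counts = [0] * n_bins
    for z in z_values:
        # Distances beyond z_max are folded into the last bin.
        idx = min(int(abs(z) / z_max * n_bins), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


def windowed_profiles(frames, n_windows, n_bins, z_max):
    """Split the trajectory into consecutive windows; one profile per window."""
    size = len(frames) // n_windows
    profiles = []
    for w in range(n_windows):
        window = frames[w * size:(w + 1) * size]
        z_values = [z for frame in window for z in frame]
        profiles.append(density_profile(z_values, n_bins, z_max))
    return profiles


def max_deviation(profiles):
    """Largest per-bin difference between any two window profiles."""
    return max(max(col) - min(col) for col in zip(*profiles))


# Synthetic stand-in for a trajectory: 200 frames of 500 bead z-coordinates
# drawn from a stationary distribution (hypothetical data).
rng = random.Random(0)
frames = [[rng.gauss(0.0, 0.1) for _ in range(500)] for _ in range(200)]
profiles = windowed_profiles(frames, n_windows=4, n_bins=10, z_max=0.5)
print(f"max per-bin deviation across windows: {max_deviation(profiles):.4f}")
```

      A small maximum deviation across windows indicates that the profile is stationary over the sampled time; a drifting system would instead show window-to-window differences that exceed the statistical noise.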

      While we agree that the choice of 0.06 box lengths is arbitrary, it was chosen as an approximation for the interior of the condensate, where the more hydrophobic half of the protein chain tends to be at higher concentration. However, this choice is not important to our overall conclusion. Halving (Author response image 7E) or doubling (Author response image 7F) the cutoff maintains the inverse correlation between the protein density of the X5 half of the condensate and experimental transition temperature.
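
      The cutoff-sensitivity test above amounts to recomputing a correlation coefficient for several definitions of the condensate interior. A hedged Python sketch is given below. The density profiles and transition temperatures are made-up placeholders rather than values from the simulations or from Urry's scale, and the helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def interior_density(profile, bin_width, cutoff):
    """Mean density over bins whose centers lie within `cutoff` of the center."""
    inside = [rho for i, rho in enumerate(profile)
              if (i + 0.5) * bin_width <= cutoff]
    return sum(inside) / len(inside)


# Placeholder profiles (density vs. Z-distance in 0.01 box-length bins) for
# four hypothetical sequences, with placeholder transition temperatures.
bin_width = 0.01
profiles = {
    "seq1": [0.90 - 0.02 * i for i in range(15)],
    "seq2": [0.75 - 0.02 * i for i in range(15)],
    "seq3": [0.60 - 0.015 * i for i in range(15)],
    "seq4": [0.40 - 0.01 * i for i in range(15)],
}
transition_temp = {"seq1": 10.0, "seq2": 25.0, "seq3": 40.0, "seq4": 60.0}

names = sorted(profiles)
for cutoff in (0.03, 0.06, 0.12):
    dens = [interior_density(profiles[n], bin_width, cutoff) for n in names]
    temps = [transition_temp[n] for n in names]
    print(f"cutoff {cutoff:.2f}: Pearson r = {pearson(dens, temps):+.3f}")
```

      If the sign and approximate magnitude of the correlation are unchanged when the cutoff is halved or doubled, the conclusion does not hinge on the particular definition of the interior.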

      Finally, in our multiscale simulation approach, the all-atom portion of the simulation is mostly used to examine water structure and protein solvation. We can see that dividing the simulation into four independent time windows does not substantially change these properties, resulting in low standard deviations in Figure 5 and Figure 6. Similarly, our previous work on the dielectric of ELP condensates has shown that choosing different starting structures from MARTINI simulations is unlikely to affect the estimate of similar quantities.[11]

      Author response image 7.

      Checking convergence of all-atom simulations of ELP condensates. (A-D) The relative mass density along the Z-distance from the condensate center is shown for the V-substituted and X-substituted halves of V5F5 in four independent time windows of 50 ns each. The Z-axis is defined as the direction perpendicular to the condensate-water interface. The dashed line represents a Z-distance of 0.06 box lengths away from the condensate center, which was the original cutoff for correlation analysis. (E-F) Correlation between the mass fraction of the X5 half of the condensate and transition temperature (Tt) from Urry.[12] The condensate is defined as having a Z-distance of 0.03 box lengths (E) or 0.12 box lengths (F) away from the condensate center. ρ is the Pearson correlation coefficient between the two data sets, and the dashed diagonal line is the best fit line. Error bars represent standard deviations of the mean taken over box length intervals of 0.01.

      References

      (1) McDaniel JR, Radford DC, Chilkoti A (2013) A unified model for de novo design of elastin-like polypeptides with tunable inverse transition temperatures. Biomacromolecules 14:2866–2872.

      (2) Meyer DE, Chilkoti A (2004) Quantification of the effects of chain length and concentration on the thermal behavior of elastin-like polypeptides. Biomacromolecules 5:846–851.

      (3) Helfand E, Tagami Y (1972) Theory of the interface between immiscible polymers. J. Chem. Phys. 56:3592.

      (4) Roe RJ (1975) Theory of the interface between polymers or polymer solutions. I. Two components system. J. Chem. Phys. 62:490–499.

      (5) Shi AC (2021) Frustration in block copolymer assemblies. J. Phys. Condens. Matter 33.

      (6) Flory PJ (1942) Thermodynamics of high polymer solutions. J. Chem. Phys. 10:51.

      (7) Grason GM (2006) The packing of soft materials: Molecular asymmetry, geometric frustration and optimal lattices in block copolymer melts. Phys. Rep. 433:1–64.

      (8) Matsen MW, Bates FS (1996) Unifying weak- and strong-segregation block copolymer theories. Macromolecules 29:1091–1098.

      (9) Matsen MW, Schick M (1994) Stable and unstable phases of a diblock copolymer melt. Phys. Rev. Lett. 72:2660–2663.

      (10) Swann JM, Topham PD (2010) Design and application of nanoscale actuators using block-copolymers. Polymers 2:454–469.

      (11) Ye S et al. (2023) Micropolarity governs the structural organization of biomolecular condensates. Nat. Chem. Biol. pp 1–9.

      (12) Urry DW (1997) Physical chemistry of biological free energy transduction as demonstrated by elastic protein-based polymers. J. Phys. Chem. B 101:11007–11028.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study advances our understanding of the brain nuclei involved in rapid-eye movement (REM) sleep regulation. Using a combination of imaging, electrophysiology, and optogenetic tools, the study provides convincing evidence that inhibitory neurons in the preoptic area of the hypothalamus influence REM sleep. This work will be of interest to neurobiologists working on sleep and/or brain circuitry.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper identifies GABA cells in the preoptic hypothalamus which are involved in REM sleep rebound (the increase in REM sleep) after selective REM sleep deprivation. By calcium photometry, these cells are most active during REM, and show more calcium signals during REM deprivation, suggesting they respond to "REM pressure". Inhibiting these cells optogenetically diminishes REM sleep. The optogenetic and photometry work is carried out to a high standard, the paper is well-written, and the findings are interesting.

      We thank the reviewer for the detailed feedback and thoughtful comments on how to improve our manuscript. To address the reviewer’s concerns, we revised our discussion and added new data. Below, we address the concerns point by point.

      Points that could be addressed or discussed:

      (1) The circuit mechanism for REM rebound is not defined. How do the authors see REM rebound as working from the POAGAD2 cells? Although the POAGAD2 does project to the TMN, the actual REM rebound could be mediated by a projection of these cells elsewhere. This could be discussed.

      We demonstrate that POA GAD2→TMN cells become more frequently activated as the pressure for REMs builds up, whereas inhibiting these neurons during high REMs pressure leads to a suppression of the REMs rebound. It is not known how POA GAD2→TMN cells encode increased REMs pressure and subsequently influence the REMs rebound. REMs deprivation was shown to change the intrinsic excitability of hippocampal neurons and impact synaptic plasticity (McDermott et al., 2003; Mallick and Singh, 2011; Zhou et al., 2020). We speculate that increased REMs pressure leads to an increase in the excitability of POA→TMN neurons, reflected in the increased number of calcium peaks. The increased excitability of POA GAD2→TMN neurons in turn likely leads to stronger inhibition of downstream REM-off neurons. Consequently, as soon as REMs deprivation stops, there is an increased chance for entering REMs. The time course for how long it takes till the POA excitability resettles to its baseline consequently sets a permissive time window for increased amounts of REMs to recover its lost amount. For future studies, it would be interesting to map how quickly the excitability of POA neurons increases or decays as a function of the lost or recovered amount of REMs and unravel the cellular mechanisms underlying the elevated activity of POA GAD2→TMN neurons during high REMs pressure, e.g., whether changes in the expression of ion channels contribute to increased excitability of these neurons (Donlea et al., 2014). As we mentioned in the Discussion, the POA also projects to other REMs regulatory brain regions such as the vlPAG and LH. Therefore, it remains to be tested whether POA GAD2→TMN neurons also innervate these brain regions to potentially regulate REMs homeostasis. We explicitly state this now in the revised Discussion.

      (2) The "POAGAD2 to TMN" name for these cells is somewhat confusing. The authors chose this name because they approach the POAGAD2 cells via retrograde AAV labelling (rAAV injected into the TMN). However, the name also seems to imply that neurons (perhaps histamine neurons) in the TMN are involved in the REM rebound, but there is no evidence in the paper that this is the case. Although it is nice to see from the photometry studies that the histamine cells are selectively more active (as expected) in NREM sleep (Fig. S2), I could not logically see how this was a relevant finding to REM rebound or the subject of the paper. There are many other types of cells in the TMN area, not just histamine cells, so are the authors suggesting that these non-histamine cells in the TMN could be involved?

      We acknowledge that other types of neurons in the TMN may also be involved in the REMs rebound, and therefore inhibition of histamine neurons by POA GAD2 →TMN neurons may not be the sole source of the observed effect. To stress that other neurons within the TMN and/or brain regions may also contribute to the REMs rebound, we have revised the Results section.

      We performed complementary optogenetic inhibition experiments of TMN HIS neurons to investigate if suppression of these neurons is sufficient to promote REMs. We found that SwiChR++ mediated inhibition of TMN HIS neurons increased the amount of REMs compared with recordings without laser stimulation in the same mice and eYFP mice with laser stimulation. Thus, while TMN HIS neurons may not be the only downstream target of GABAergic POA neurons, these data suggest that they contribute to REMs regulation. We have incorporated these results in Fig. S4.

      We further investigated whether the activity of TMN HIS neurons changes between two REMs episodes. Assuming that REMs pressure inhibits the activity of REM-off histamine neurons, their firing rates should be highest right after REMs ends when REMs pressure is lowest, progressively decay throughout the inter-REM interval, and reach their lowest activity right before the onset of REMs (Park et al., 2021), similar to the activity profile observed for vlPAG REM-off neurons (Weber et al., 2018). We indeed found that TMN HIS neurons display a gradual decrease in their activity throughout the inter-REM interval and thus potentially reflect the build-up of REM pressure (Fig. S2F).

      (3) It is a puzzle why most of the neurons in the POA seem to have their highest activity in REM, as also found by Miracca et al 2022, yet presumably some of these cells are going to be involved in NREM sleep as well. Could the same POAGAD2-TMN cells identified by the authors also be involved in inducing NREM sleep-inhibiting histamine neurons (Chung et al). And some of these POA cells will also be involved in NREM sleep homeostasis (e.g. Ma et al Curr Biol)? Is NREM sleep rebound necessary before getting REM sleep rebound? Indeed, can these two things (NREM and REM sleep rebound) be separated?

      Previous studies have demonstrated that POA GABAergic neurons, including those projecting to the TMN, are involved in NREMs homeostasis (Sherin et al., 1998; Gong et al., 2004; Ma et al., 2019) . Therefore, we predict that POA neurons that are involved in NREMs homeostasis are a subset of POA GAD2 → TMN neurons in our manuscript.

      Using optrode recordings in the POA, we recently reported that 12.4% of neurons sampled have higher activity during NREMs compared with REMs; in contrast, 43.8% of neurons sampled have the highest activity during REMs compared with NREMs (Antila et al., 2022), indicating that the proportion of NREM max neurons is smaller compared with REM max neurons. These proportions of neurons are in agreement with previous results (Takahashi et al., 2009). Considering fiber photometry monitors the average activity of a population of neurons as opposed to individual neurons, it is possible that we recorded neural activity across heterogeneous populations and therefore our findings may disguise the neural activity of the low proportion of NREMs neurons. We previously reported the spiking activity of POA GAD2→TMN neurons at the single-cell level (Chung et al., 2017). We have noted in the manuscript that while the activity of POA GAD2→TMN neurons is highest during REMs, the neural activity increases at NREMs→REMs transitions, indicating these neurons also are active during NREMs.

      Using our REMs restriction protocol, we selectively restricted REMs leading to the subsequent rebound of REMs without affecting NREMs and consequently we did not find an increase in the amount of NREMs during the rebound or an increase in slow-wave activity, a key characteristic of sleep rebound that gradually dissipates during recovery sleep (Blake and Gerard, 1937; Williams et al., 1964; Rosa and Bonnet, 1985; Dijk et al., 1990; Neckelmann and Ursin, 1993; Ferrara et al., 1999) . However, during total sleep deprivation when subjects are deprived of both NREMs and REMs, isolating NREMs and REMs rebound may not be attainable.

      (4) Is it possible to narrow down the POA area where the GAD2 cells are located more precisely?

      The POA can be subdivided into anatomically distinct regions such as medial preoptic area, median preoptic area, ventrolateral preoptic area, and lateral preoptic area (MPO, MPN, VLPO, and LPO respectively). To quantify where the virus-expressing GAD2 cells and optic fibers are located within the POA, we overlaid the POA coronal reference images (with red boundaries denoting these anatomically distinct regions) over the virus heat maps and optic fiber tracts from datasets used in Figure 1A. We found that virus expression and optic fiber tracts were located in the ventrolateral POA, lateral POA, and the lateral part of medial POA, and included this description in the text.

      Author response image 1.

      Location of virus expression (A) and optic fiber placement (B) within subregions of POA.

      (5) It would be ideal to further characterize these particular GAD2 cells by RT-PCR or RNA seq. Which other markers do they express?

      Single-cell RNA-sequencing of POA neurons has revealed an enormous level of molecular diversity, consisting of nearly 70 subpopulations based on gene expression, of which 43 can be clustered into inhibitory neurons (Moffitt et al., 2018). One of the most studied subpopulations of POA sleep-active neurons contains the inhibitory neuropeptide galanin (Sherin et al., 1998; Gaus et al., 2002; Chung et al., 2017; Kroeger et al., 2018; Ma et al., 2019; Miracca et al., 2022). Galanin neurons have been demonstrated to innervate the TMN (Sherin et al., 1998), yet within the galanin neurons 7 distinct clusters exist based on unique gene expression (Moffitt et al., 2018). In addition to galanin, we have previously performed single-cell RNA-seq on POA GAD2 → TMN neurons and identified additional neuropeptides such as cholecystokinin (CCK), corticotropin-releasing hormone (CRH), prodynorphin (PDYN), and tachykinin 1 (TAC1) as markers of subpopulations of GABAergic POA sleep-active neurons (Chung et al., 2017; Smith et al., 2023). Like galanin neurons, neurons expressing these neuropeptides can also be divided into multiple subtypes (Chen et al., 2017; Moffitt et al., 2018). Thus, while these molecular markers for POA neurons are immensely diverse, we agree that characterizing the molecular identity of POA GAD2 → TMN neurons and investigating the functional relevance of these neuropeptides in the context of REMs homeostasis would enrich our understanding of a neural circuit involved in REMs homeostasis and can stand as a separate extension of this manuscript.

      Reviewer #2 (Public Review):

      Maurer et al investigated the contribution of GAD2+ neurons in the preoptic area (POA), projecting to the tuberomammillary nucleus (TMN), to REM sleep regulation. They applied an elegant design to monitor and manipulate the activity of this specific group of neurons: a GAD2-Cre mouse, injected with retrograde AAV constructs in the TMN, thereby presumably only targeting GAD2+ cells projecting to the TMN. Using this set-up in combination with technically challenging techniques including EEG with photometry and REM sleep deprivation, the authors found that this cell-type studied becomes active shortly (≈40sec) prior to entering REM sleep and remains active during REM sleep. Moreover, optogenetic inhibition of GAD2+ cells inhibits REM sleep by a third and also impairs the rebound in REM sleep in the following hour. Despite a few reservations or details that would benefit from further clarification (outlined below), the data makes a convincing case for the role of GAD2+ neurons in the POA projecting to the TMN in REM sleep regulation.

      We thank the reviewer for the thorough assessment of our study and supportive comments. We have addressed your concerns in the revised manuscript, and our point-by-point response is provided below.

      The authors found that optogenetic inhibition of GAD2+ cells suppressed REM sleep in the hour following the inhibition (e.g. Fig2 and Fig4). If the authors have the data available, it would be important to include the subsequent hours in the rebound time (e.g. from ZT8.5 to ZT24) to test whether REM sleep rebound remains impaired, or recovers, albeit with a delay.

      We thank the reviewer for this comment and agree that it would be interesting to know how REMs changes for a longer period of time throughout the rebound phase. For Fig. 2, we did not record the subsequent hours. For Fig 4, we recorded the subsequent rebound between ZT7.5 and 10.5. When we compare the REMs amount during this 4 hr interval, the SwiChR mice have less REMs compared with eYFP mice with marginal significance (unpaired t-test, p=0.0641). We also plotted the cumulative REMs amount during restriction and rebound phases, and found that the cumulative amount of REMs was still lower in SwiChR mice than eYFP mice at ZT 10.5 (Author response image 2). Therefore, it will be interesting to record for a longer period of time to test when the SwiChR mice compensate for all the REMs that was lost during the restriction period.

      Author response image 2.

      Cumulative amount of REMs during REMs deprivation and rebound combined with optogenetic stimulation in eYFP and SwiChR groups. This data is shown as bar graphs in Figure 4.

      REM sleep is under tight circadian control (e.g. Wurts et al., 2000 in rats; Dijk, Czeisler 1995 in humans). To contextualize the results, it would be important to mention that it is not clear if the role of the manipulated neurons in REM sleep regulation hold at other circadian times of the day.

      Author response image 3.

      Inhibiting POA GAD2→TMN neurons at ZT5-8 reduces REMs. (A) Schematic of optogenetic inhibition experiments. (B) Percentage of time spent in REMs, NREMs and wakefulness with laser in SwiChR++ and eYFP mice. Unpaired t-tests, p = 0.0013, 0.0469 for REMs and wake amount. (C) Duration of REMs, NREMs, and wake episodes. Unpaired t-tests, p = 0.0113 for NREMs duration. (D) Frequency of REMs, NREMs, and wake episodes. Unpaired t-tests, p = 0.0063, 0.0382 for REMs and NREMs frequency.

      REMs propensity is largest towards the end of the light phase (Czeisler et al., 1980; Dijk and Czeisler, 1995; Wurts and Edgar, 2000). As a control, we therefore performed the optogenetic inhibition experiments of POA GAD2→TMN neurons during ZT5-8 (Author response image 3). Similar to our results in Figure 2, we found that SwiChR-mediated inhibition of POA GAD2→TMN neurons attenuated REMs compared with eYFP laser sessions. These findings suggest our results are consistent at other circadian times of the day.

      The effect size of the REM sleep deprivation using the vibrating motor method is unclear. In FigS4-D, the experimental mice reduce their REM sleep to 3% whereas the control mice spend 6% in REM sleep. In Fig4, mice are either subjected to REM sleep deprivation with the vibrating motor (controls), or REM sleep deprivations + optogenetics (experimental mice).

      The control mice (vibrating motor) in Fig4 spend 6% of their time in REM sleep, which is double the amount of REM sleep compared to the mice receiving the same treatment in FigS4-D. Can the authors clarify the origin of this difference in the text?

      The effect size for REM sleep deprivation is now added in the text.

      It is important to note that these figures are analyzing two different intervals of the REMs restriction. In Fig. S4D, we analyzed the total amount of REMs over the entire 6 hr restriction interval (ZT1.5-7.5). In Fig. 4, we analyzed the amount of REMs only during the last 3 hr of restriction (ZT4.5-7.5) as optogenetic inhibition was performed only during the last 3 hrs when the REMs pressure is high. In Fig. S4D, we looked at the amount of REMs during ZT1.5-4.5 and 4.5-7.5 and found that the amount of REMs during ZT4.5-7.5 (4.46 ± 0.25 %; mean ± s.e.m.) is indeed higher than ZT 1.5-4.5 (1.66 ± 0.62 %), and is comparable to the amount of REMs during ZT4.5-7.5 in eYFP mice (5.95 ± 0.52 %) in Fig. 4. We now clearly state in the manuscript at which time points we analyzed the amount, duration and frequency of REMs.
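
      As a quick consistency check on the figures quoted above (a sketch, not part of the authors' analysis): because ZT1.5-4.5 and ZT4.5-7.5 are equal-length intervals, the REMs percentage over the whole 6 hr restriction should be close to the time-weighted average of the two 3 hr values, assuming both means come from the same animals. The variable names are illustrative only.

```python
# Mean REMs percentages quoted in the response for the two halves of restriction.
rem_pct_zt1_5_to_4_5 = 1.66   # % of time in REMs, ZT1.5-4.5
rem_pct_zt4_5_to_7_5 = 4.46   # % of time in REMs, ZT4.5-7.5
hours_per_interval = 3.0      # each interval lasts 3 hr

# Time-weighted average over the full 6 hr restriction window (ZT1.5-7.5).
total_hours = 2 * hours_per_interval
rem_pct_full_window = (
    rem_pct_zt1_5_to_4_5 * hours_per_interval
    + rem_pct_zt4_5_to_7_5 * hours_per_interval
) / total_hours

print(f"REMs over ZT1.5-7.5: {rem_pct_full_window:.2f} %")
```

      The result, about 3.06 %, matches the roughly 3 % the reviewer read from Fig. S4D for the full restriction interval, while the ZT4.5-7.5 value alone is the one comparable to Fig. 4.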

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) A few further citations suggested: Discussion "The TMN contains histamine producing neurons and antagonizing histamine neurons causes sleepiness..." It would be appropriate to cite Uygun DS et al 2016 J Neurosci (PMID: 27807161) here. Using the same HDC-Cre mice as used by Maurer et al., Uygun et al found that selectively increasing GABAergic inhibition onto histamine neurons produced NREM sleep.

      We apologize for omitting this important paper. In the revised manuscript, we added this citation.

      (2) Materials and Methods.

      Although the JAX numbers are given for the mouse lines based on researchers generously donating to JAX for others to use, please cite the papers corresponding to the GAD2-ires-Cre and HDC-ires-Cre mouse lines deposited at JAX.

      GAD2-ires-Cre was described in Taniguchi H et al., 2011, Neuron (PMID: 21943598).

      The construction of the HDC-ires-Cre line is described in Zecharia AY et al., 2012, J Neurosci (PMID: 22993424).

      We have now added these important citations in the revised manuscript.

      (3) Similarly, for the viruses, please provide the citations for the AAV constructs that were donated to Addgene.

      We have now added these citations in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The authors base their conclusions heavily on an optogenetic tool that inhibits the activity of GAD2+ neurons; however, it is not shown that these neurons are indeed inhibited as expected. An alternative approach to tackle this could be the application of a different technique to achieve the same output (e.g. chemogenetics). However, both experiments (confirmation of inhibition, or using a different technique) would require a significant amount of work, and given the numerous studies out there showing that these optogenetic tools tend to work, may not be necessary. Hence the authors could also cite a similar study that used a similar construct and where it was indeed shown that this technique works (i.e. similar retrograde optogenetic construct with Cre-dependent expression combined with electrophysiological recordings).

      This laser stimulation protocol was designed based on previous reports of sustained inhibition using the same inhibitory opsin and on our prior results, which recapitulate findings obtained with inhibitory chemogenetic techniques (Iyer et al., 2016; Kim et al., 2016; Wiegert et al., 2017; Stucynski et al., 2022). We have now added this description in the Results section.

      Fig1A - Right: the virus expression graphs are great and give a helpful insight into the variability. The image on the left (GCAMP+ cells) is less clear, the GCAMP+ cells don't differentiate well from the background. Perhaps the whole brain image with inset in POA can show the GCAMP expression more convincingly.

      We have added a histology picture showing the whole brain image with inset in the POA in the updated Fig. 1A.

      Statistics: The table is very helpful. Based on the degrees of freedom, it seems that in some instances the stats are run on the recordings rather than on the individual mice (e.g. Fig1). It could be considered to use a mixed model where subjects are taken into account as a factor.

      Author response image 4.

      ΔF/F activity of POA GAD2→TMN neurons during NREMs. The duration of NREMs episodes was normalized in time, ranging from 0 to 100%. Shading, ± s.e.m. Pairwise t-tests with Holm-Bonferroni correction, p = 5.34e-4 between 80 and 100. Gray bar, intervals where ΔF/F activity was significantly different from baseline (0 to 20%, the first time bin). n = 10 mice. In Fig. 1E, we ran stats based on the recordings. In this data set, we ran stats based on the individual mice, and found that the activity also gradually increased throughout NREMs episodes.
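
      The per-mouse analysis described in this caption can be sketched roughly as follows: each NREMs episode is split into equal bins of normalized time (five 20% bins are implied by the baseline definition), and the p-values from comparing later bins with the first are then adjusted with the Holm-Bonferroni step-down procedure. This is a minimal illustration, not the authors' code; the function names and the example inputs are hypothetical.

```python
def time_normalized_bins(trace, n_bins=5):
    """Average a trace within n_bins equal slices of normalized time (0-100%).

    Assumes len(trace) >= n_bins so that no slice is empty.
    """
    n = len(trace)
    means = []
    for b in range(n_bins):
        start = b * n // n_bins
        stop = (b + 1) * n // n_bins
        chunk = trace[start:stop]
        means.append(sum(chunk) / len(chunk))
    return means


def holm_bonferroni(pvalues):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical episode trace and arbitrary raw p-values, for illustration only.
example_bins = time_normalized_bins(list(range(10)))
example_adjusted = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
```

      Averaging the binned values per mouse before testing makes the mouse, rather than the recording, the unit of analysis, which is the concern raised by the reviewer.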

      There is an effect of laser in Fig2 on REM sleep amount, as well as an interaction effect with virus injection (from the table). Therefore, it would be helpful for the reader to also show REM sleep data from the control group (laser stimulation but no active optogenetics construct) in Fig 2.

      To properly control for laser and virus effects, we performed the same laser stimulation experiments in eYFP control mice (expressing only eYFP without the optogenetic construct, SwiChR++) and the data is provided in Fig 2C.

      Fig3B: At the start of the rebound of REM sleep, there is a massive amount of wakefulness, also reflected in the change of spectral composition. Could you comment on the text about what is happening here?

      We quantified the amount of wakefulness during the first hour of REMs rebound and found that indeed there is no significant difference in wakefulness between REM restriction and baseline control conditions (Fig. S4H). Therefore, while the representative image in Fig 3B shows increased wakefulness at the beginning of REMs rebound, we do not think the overall amount of wakefulness is increased.

      Fig 4, supplementary data: it would be helpful for the reader to have mentioned in the text the effect size of the REM sleep restriction protocol (e.g. mean and standard deviation).

      Thank you for this suggestion. We have now added the effect size for the REM sleep restriction experiments in the main text.

      REM sleep restriction and photometry experiment: could be improved by adding within the main body of text that, in order to conduct the photometry experiment in the last hours of REM sleep deprivation, the manual REM sleep deprivation had to be applied, because the vibrating motor technique disturbed the photometry recordings.

      Thank you for this suggestion. We have added the description in the main text.

      Suggestion to build further on the already existing data (not for this paper): you have a powerful dataset to test whether REM sleep pressure builds up during wakefulness or NREM sleep, by correlating when your optogenetic treatment occurs (NREM or wakefulness), with the subsequent rebound in REM sleep (see also Endo et al., 1998; Benington and Heller, 1994; Franken 2001).

      We thank the reviewer for this excellent suggestion. We plan to carry out this experiment in the future.

      References

      Antila, H., Kwak, I., Choi, A., Pisciotti, A., Covarrubias, I., Baik, J., et al. (2022). A noradrenergic-hypothalamic neural substrate for stress-induced sleep disturbances. Proc. Natl. Acad. Sci. 119, e2123528119. doi: 10.1073/pnas.2123528119.

      Blake, H., and Gerard, R. W. (1937). Brain potentials during sleep. Am. J. Physiol.-Leg. Content 119, 692–703. doi: 10.1152/ajplegacy.1937.119.4.692.

      Chen, R., Wu, X., Jiang, L., and Zhang, Y. (2017). Single-Cell RNA-Seq Reveals Hypothalamic Cell Diversity. Cell Rep. 18, 3227–3241. doi: 10.1016/j.celrep.2017.03.004.

      Chung, S., Weber, F., Zhong, P., Tan, C. L., Nguyen, T., Beier, K. T., et al. (2017). Identification of Preoptic Sleep Neurons Using Retrograde Labeling and Gene Profiling. Nature 545, 477–481. doi: 10.1038/nature22350.

      Czeisler, C. A., Zimmerman, J. C., Ronda, J. M., Moore-Ede, M. C., and Weitzman, E. D. (1980). Timing of REM sleep is coupled to the circadian rhythm of body temperature in man. Sleep 2, 329–346.

      Dijk, D. J., Brunner, D. P., Beersma, D. G., and Borbély, A. A. (1990). Electroencephalogram power density and slow wave sleep as a function of prior waking and circadian phase. Sleep 13, 430–440. doi: 10.1093/sleep/13.5.430.

      Dijk, D. J., and Czeisler, C. A. (1995). Contribution of the circadian pacemaker and the sleep homeostat to sleep propensity, sleep structure, electroencephalographic slow waves, and sleep spindle activity in humans. J. Neurosci. Off. J. Soc. Neurosci. 15, 3526–3538. doi: 10.1523/JNEUROSCI.15-05-03526.1995.

      Donlea, J. M., Pimentel, D., and Miesenböck, G. (2014). Neuronal machinery of sleep homeostasis in Drosophila. Neuron 81, 860–872. doi: 10.1016/j.neuron.2013.12.013.

      Ferrara, M., De Gennaro, L., Casagrande, M., and Bertini, M. (1999). Auditory arousal thresholds after selective slow-wave sleep deprivation. Clin. Neurophysiol. Off. J. Int. Fed. Clin. Neurophysiol. 110, 2148–2152. doi: 10.1016/s1388-2457(99)00171-6.

      Gaus, S. E., Strecker, R. E., Tate, B. A., Parker, R. A., and Saper, C. B. (2002). Ventrolateral preoptic nucleus contains sleep-active, galaninergic neurons in multiple mammalian species. Neuroscience 115, 285–294. doi: 10.1016/S0306-4522(02)00308-1.

      Gong, H., McGinty, D., Guzman-Marin, R., Chew, K.-T., Stewart, D., and Szymusiak, R. (2004). Activation of c-fos in GABAergic neurones in the preoptic area during sleep and in response to sleep deprivation. J. Physiol. 556, 935–946. doi: 10.1113/jphysiol.2003.056622.

      Iyer, S. M., Vesuna, S., Ramakrishnan, C., Huynh, K., Young, S., Berndt, A., et al. (2016). Optogenetic and chemogenetic strategies for sustained inhibition of pain. Sci. Rep. 6, 30570. doi: 10.1038/srep30570.

      Kim, H., Ährlund-Richter, S., Wang, X., Deisseroth, K., and Carlén, M. (2016). Prefrontal Parvalbumin Neurons in Control of Attention. Cell 164, 208–218. doi: 10.1016/j.cell.2015.11.038.

      Kroeger, D., Absi, G., Gagliardi, C., Bandaru, S. S., Madara, J. C., Ferrari, L. L., et al. (2018). Galanin neurons in the ventrolateral preoptic area promote sleep and heat loss in mice. Nat. Commun. 9, 4129. doi: 10.1038/s41467-018-06590-7.

      Ma, Y., Miracca, G., Yu, X., Harding, E. C., Miao, A., Yustos, R., et al. (2019). Galanin Neurons Unite Sleep Homeostasis and α2-Adrenergic Sedation. Curr. Biol. CB 29, 3315-3322.e3. doi: 10.1016/j.cub.2019.07.087.

      Mallick, B. N., and Singh, A. (2011). REM sleep loss increases brain excitability: role of noradrenaline and its mechanism of action. Sleep Med. Rev. 15, 165–178. doi: 10.1016/j.smrv.2010.11.001.

      McDermott, C. M., LaHoste, G. J., Chen, C., Musto, A., Bazan, N. G., and Magee, J. C. (2003). Sleep deprivation causes behavioral, synaptic, and membrane excitability alterations in hippocampal neurons. J. Neurosci. Off. J. Soc. Neurosci. 23, 9687–9695. doi: 10.1523/JNEUROSCI.23-29-09687.2003.

      Miracca, G., Anuncibay-Soto, B., Tossell, K., Yustos, R., Vyssotski, A. L., Franks, N. P., et al. (2022). NMDA Receptors in the Lateral Preoptic Hypothalamus Are Essential for Sustaining NREM and REM Sleep. J. Neurosci. 42, 5389–5409. doi: 10.1523/JNEUROSCI.0350-21.2022.

      Moffitt, J. R., Bambah-Mukku, D., Eichhorn, S. W., Vaughn, E., Shekhar, K., Perez, J. D., et al. (2018). Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science 362. doi: 10.1126/science.aau5324.

      Neckelmann, D., and Ursin, R. (1993). Sleep stages and EEG power spectrum in relation to acoustical stimulus arousal threshold in the rat. Sleep 16, 467–477.

      Park, S.-H., Baik, J., Hong, J., Antila, H., Kurland, B., Chung, S., et al. (2021). A probabilistic model for the ultradian timing of REM sleep in mice. PLOS Comput. Biol. 17, e1009316. doi: 10.1371/journal.pcbi.1009316.

      Rosa, R. R., and Bonnet, M. H. (1985). Sleep stages, auditory arousal threshold, and body temperature as predictors of behavior upon awakening. Int. J. Neurosci. 27, 73–83. doi: 10.3109/00207458509149136.

      Sherin, J. E., Elmquist, J. K., Torrealba, F., and Saper, C. B. (1998). Innervation of histaminergic tuberomammillary neurons by GABAergic and galaninergic neurons in the ventrolateral preoptic nucleus of the rat. J. Neurosci. Off. J. Soc. Neurosci. 18, 4705–4721.

      Smith, J., Honig-Frand, A., Antila, H., Choi, A., Kim, H., Beier, K. T., et al. (2023). Regulation of stress-induced sleep fragmentation by preoptic glutamatergic neurons. Curr. Biol. CB, S0960-9822(23)01585–3. doi: 10.1016/j.cub.2023.11.035.

      Stucynski, J. A., Schott, A. L., Baik, J., Chung, S., and Weber, F. (2022). Regulation of REM sleep by inhibitory neurons in the dorsomedial medulla. Curr. Biol. CB 32, 37-50.e6. doi: 10.1016/j.cub.2021.10.030.

      Takahashi, K., Lin, J.-S., and Sakai, K. (2009). Characterization and mapping of sleep-waking specific neurons in the basal forebrain and preoptic hypothalamus in mice. Neuroscience 161, 269–292. doi: 10.1016/j.neuroscience.2009.02.075.

      Weber, F., Hoang Do, J. P., Chung, S., Beier, K. T., Bikov, M., Saffari Doost, M., et al. (2018). Regulation of REM and Non-REM sleep by periaqueductal GABAergic neurons. Nat. Commun. 9, 1–13. doi: 10.1038/s41467-017-02765-w.

      Wiegert, J. S., Mahn, M., Prigge, M., Printz, Y., and Yizhar, O. (2017). Silencing Neurons: Tools, Applications, and Experimental Constraints. Neuron 95, 504–529. doi: 10.1016/j.neuron.2017.06.050.

      Williams, H. L., Hammack, J. T., Daly, R. L., Dement, W. C., and Lubin, A. (1964). RESPONSES TO AUDITORY STIMULATION, SLEEP LOSS AND THE EEG STAGES OF SLEEP. Electroencephalogr. Clin. Neurophysiol. 16, 269–279. doi: 10.1016/0013-4694(64)90109-9.

      Wurts, S. W., and Edgar, D. M. (2000). Circadian and homeostatic control of rapid eye movement (REM) sleep: promotion of REM tendency by the suprachiasmatic nucleus. J. Neurosci. Off. J. Soc. Neurosci. 20, 4300–4310. doi: 10.1523/JNEUROSCI.20-11-04300.2000.

      Zhou, Y., Lai, C. S. W., Bai, Y., Li, W., Zhao, R., Yang, G., et al. (2020). REM sleep promotes experience-dependent dendritic spine elimination in the mouse cortex. Nat. Commun. 11, 4819. doi: 10.1038/s41467-020-18592-5.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      Dormancy/diapause/hibernation (depending on how the terms are defined) is a key life history strategy that allows the temporal escape from unfavorable conditions. Although environmental conditions do play a major role in inducing and terminating dormancy (authors call this energy limitation hypothesis), the authors test a mutually non-exclusive hypothesis (life-history hypothesis) that sex-specific selection pressures, at least to some extent, would further shape the timing of these life-history events. Authors use a metanalytic approach to collect data (mainly on rodents) on various life-history traits to test trade-offs among these traits between sexes and how they affect entry and termination of dormancy.

      Strengths:

      I found the theoretical background in the Introduction quite interesting, to the point and the arguments were well-placed. How sex-specific selection pressures would drive entry and termination of diapause in insects (e.g. protandry), especially in temperate butterflies, is very well investigated. Authors attempt to extend these ideas to endotherms and trying to find general patterns across ectotherms and endotherms is particularly exciting. This work and similar evidence could make a great contribution to the life-history theory, specifically understanding factors that drive the regulation of life cycle timing.

      Weaknesses:

      (1) I felt that including 'ectotherms' in the title is a bit misleading as there is hardly (in fact any?) any data presented on ectotherms. Also, most of the focus of the discussion is heavily mammal (rodent) focussed. I believe saying endotherms in the title as well is a bit misleading as the data is mammal-focused.

      We changed the title to: "Evolutionary trade-offs in dormancy phenology". This is a hybrid article comprising both a meta-analysis and a literature review. Each of these parts brings new elements to the hypotheses presented. The statistical analyses only concern mammals and especially rodent species. But the literature review highlighted links between the evolution of dormancy in ectotherms and endotherms that have not been linked in previous studies. We feel it is important for readers to know that much of the discussion will focus on the comparison of these two groups. But we understand that placing the term ectotherms in the title might suggest a meta-analysis including these two groups.

      In addition, we indicated more specifically in the abstract and at the end of the introduction that the article includes two approaches associated with different groups of animals.

      We also specified in the "review criteria" section that:

      Only one bird species is considered to be a hibernator, and no information is available on sex differences in hibernation phenology (Woods and Brigham 2004, Woods et al. 2019).

      We have also added a "study limitations" section, which explains that although the meta-analysis is limited by the data available in the literature, the information available for the species groups not studied seems to support our results.

      (2) I think more information needs to be provided early on to make readers aware of the diversity of animals included in the study and their geographic distribution. Are they mostly temperate or tropical? What is the span of the latitude as day length can have a major influence on dormancy timings? I think it is important to point out that data is more rodent-centric. Along the line of this point, is there a reason why the extensively studied species like the Red Deer or Soay Sheep and other well-studied temperate mammals did not make it into the list?

      We specified in the abstract and at the end of the introduction that the species studied in the meta-analysis are mainly Holarctic species. We have also added a map showing all the study sites used in the meta-analysis. Finally, in the methods and in a "study limitations" section added at the end of the discussion, we explain why some species were not included in the meta-analysis and the consequences for the interpretation of results.

      The hypotheses developed in this article are based on the survival benefits of seasonal dormancy thanks to a period of complete inactivity lasting several months. The Red Deer or Soay Sheep remain active above ground throughout the year.

      The effect of photoperiod on phenology is one of the mechanisms that has evolved to match an activity with the favorable condition. In this study, we are not interested in the mechanisms but in the evolutionary pressures that explain the observed phenology. Interspecific variation in the effect of photoperiod results from different evolutionary pressures, which we are trying to highlight. It is therefore not necessary to review mechanisms and effects of photoperiod, themselves requiring a lengthy review.

      We also tested the “physiological constraint hypothesis” on several variables. Temperature and precipitation are factors correlated with sex differences in phenology of hibernation. These factors allow consideration of the geographical differences that influence hibernation phenology.

      (3) Isn't the term 'energy limitation hypothesis' which is used throughout the manuscript a bit endotherm-centric? Especially if the goal is to draw generalities across ectotherms and endotherms. Moreover, climate (e.g. interaction of photoperiod and temperature in temperatures) most often induces or terminates diapause/dormancy in ectotherms so I am not sure if saying 'energy limitation hypothesis' is general enough.

      We renamed this hypothesis the "physiological constraint hypothesis" and we have made appropriate changes in the text so as not to focus physiological constraints solely on energy aspects.

      (4) Since for some species, the data is averaged across studies to get species-level trait estimates, is there a scope to examine within population differences (e.g. across latitudes)? This may further strengthen the evidence and rule out the possibility of the environment, especially the length of the breeding season, affecting the timing of emergence and immergence.

      For a given species, data on hibernation phenology are averaged for different populations, but also for the same population when measurements are taken over several years. To test these hypotheses on a population scale, precise data on reproductive effort would be needed for each population tested, but this concerns very few species (less than 5).

      Testing the effects of temperature and precipitation allows us to take into account the effects of climate on phenology.

      (5) Although the authors are looking at the broader patterns, I felt like the overall ecology of the species (habitat, tropical or temperate, number of broods, etc.) is overlooked and could act as confounding factors.

      Yes, that's why we also tested the physiological constraints hypothesis, including the effect of temperature and precipitation. For the life-history hypothesis, we also tested reproductive effort, which takes into account the number of offspring per year.

      (6) I strongly think the data analysis part needs more clarity. As of now, it is difficult for me to visualize all the fitted models (despite Table 1), and the large number of life-history traits adds to this complexity. I would recommend explicitly writing down all the models in the text. Also, the Table doesn't make it clear whether interaction was allowed between the predictors or not. More information on how PGLS were fitted needs to be provided in the main text which is in the supplementary right now. I kept wondering if the authors have fit multiple models, for example, with different correlation structures or by choosing different values of lambda parameter. And, in addition to PGLS, authors are also fitting linear regressions. Can you explain clearly in the text why was this done?

      To simplify the results, we reduced the number of models to just three: one for emergence and two for immergence. In place of Table 1, we have written the structure of the models used. We have added a sentence to the statistics section: “each PGLS model produces a λ parameter representing the effect of phylogeny ranging between 0 (no phylogeny effect) and 1 (covariance entirely explained by co-ancestry)”. We have tested only three PGLS models and the estimated lambda value for these models is 0.
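
      To illustrate what the λ parameter does, here is a minimal sketch assuming Pagel's λ, a common choice in PGLS (the response does not name the implementation): λ multiplies the off-diagonal entries of the phylogenetic covariance matrix, which encode shared ancestry between species, while leaving the variances on the diagonal unchanged. The toy covariance matrix and the function name are hypothetical.

```python
def lambda_transform(vcv, lam):
    """Return a copy of vcv with off-diagonal entries multiplied by lam."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda is expected to lie between 0 and 1")
    n = len(vcv)
    return [
        [vcv[i][j] if i == j else lam * vcv[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy covariance for three species: the first two share 60% of their history,
# the third diverged at the root (illustrative values only).
toy_vcv = [
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

no_phylo = lambda_transform(toy_vcv, 0.0)    # species treated as independent
full_phylo = lambda_transform(toy_vcv, 1.0)  # covariance fully set by the tree
```

      With λ = 0 the matrix becomes diagonal; when the variances are also equal, as in this toy case, a generalized least squares fit gives the same coefficients as an ordinary linear regression.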

      (7) Figure 2 is unclear, and I do not understand how these three regression lines were computed. Please provide more details.

      We tested new models and modified existing figures.

      Reviewer #2 (Public Review):

      Summary:

      An article with lots of interesting ideas and questions regarding the evolution of timing of dormancy, emphasizing mammalian hibernation but also including ectotherms. The authors compare selective forces of constraints due to energy availability versus predator avoidance and requirements and consequences of reproduction in a review of between and within species (sex) differences in the seasonal timing of entry and exit from dormancy.

      Strengths:

      The multispecies approach including endotherms and ectotherms is ambitious. This review is rich with ideas if not in convincing conclusions.

      Weaknesses:

      The differences between physiological requirements for gametogenesis between sexes that affect the timing of heterothermy and the need for euthermy during mammalian hibernation are significant issues that underlie, but are under-discussed in, this contrast of selective pressures that determine seasonal timing of dormancy. Some additional discussion of the effects of rapid climate change on between and within species phenologies of dormancy would have been interesting.

      Reviewer #2 (Recommendations For The Authors):

      This review provides a very interesting and ambitious among and within-species comparison of the seasonal timing of entry and exit from dormancy, emphasizing literature from hibernating mammals (sans bats and bears) and with attention to ectotherms. The authors test hypotheses related to the timing of food availability (energy) versus life history considerations (requirements for reproduction, avoiding predation) while acknowledging that these are not mutually exclusive. I offer advice for clarifications and description of the limitations of the data (accuracy of emergence and immergence times), but mainly seek more emphasis for small mammalian hibernators on the contrast for requirements for significant periods of euthermy prior to the emergence in males versus females, a contrast that has energetic and timing consequences in both the active and hibernation seasons.

      A consideration alluded to but not fully explained or discussed is the differences in mammals between species and sexes in the timing of what can be called ecological hibernation, which is the seasonal duration that an animal remains sequestered in its burrow or den, and heterothermic hibernation, between the beginning and end of the use of torpor. The two are not synonymous. When "emergence" is the first appearance above ground, there is a significant missing observation key to the energetic contrasts discussed in this review, that of this costly pre-emergence behavior.

      To explain the difference between heterothermic hibernation and ecological hibernation, we have added a section to the Review Criteria in the Materials and Methods:

      “In this study, we addressed what can be called ecological hibernation, i.e. the seasonal duration that an animal remains sequestered in its burrow or den, which is assumed to be directly linked to the reduced risk of predation. In contrast, we did not consider heterothermic hibernation, which corresponds to the time between the beginning and end of the use of torpor. So when we mention hibernation, emergence or immergence, the specific reference is to ecological hibernation.”

      In arctic and other ground squirrel species, males remain at high body temperatures after immerging and remaining in their burrows in the fall for several days to a week, and more consistently and importantly, males that will attempt to breed in the spring end torpor but remain constantly in their burrows for as much as one month at great expense whilst undergoing testicular growth, spermatogenesis, spermiation, and sperm capacitation, processes that require continuous euthermy. Female arctic ground squirrels and non-breeding males do not, and typically enter their first torpor bout 1-2 days after immergence and first appear above ground 1-3 days after their last arousal in spring.

      The weeks spent euthermic in a cold burrow in spring by males while undergoing reproductive maturation require a significant energetic investment (can equate to the cost of the previous heterothermic period) that contrasts profoundly with the pre-mating energetic investment by females.

      Males cache food in their hibernacula and extend their active season in late summer/fall in order to do so and feed from these caches in spring after resuming euthermy, often emerging at body weights similar to that at immergence. Similar between-sex differences in the timing of hibernation and heterothermy occur in golden-mantled and Columbian ground squirrels and likely most other Urocitellus spp., though less well described in other species. These differences are related to life histories and requirements for male vs. female gametogenesis and, at the same time, energetic considerations in the costs to males for remaining euthermic while undergoing spermatogenesis and the cost related to whether males undergo gonadal development being dependent on individual body mass and cache size. These issues should be better discussed in this review.

      It is the time required to complete spermatogenesis, spermiation, and maturation of sperm not the time for growth of different sizes of testes that drives the preparation time for males. This is relatively constant among rodents. I challenge the assumption that larger testes take longer to grow than smaller ones.

      We took this comment into account. As we found little evidence of an increase in testicular maturation time with relative testicular size (apart from table 4 in Kenagy and Trombulak, 1986), we no longer tested the effect of relative testicular size on protandry.

      We examined whether the ability to store food before hibernation might reduce protandry. Although food storage in the burrow may be favored for overcoming harsh environments or predation, model selection did not retain the food-storing factor. Thus, the ability to accumulate food in the burrow was not by itself likely to keep males of some species from emerging earlier (e.g. Cricetus cricetus, protandry: 20 days, Siutz et al., 2016). Early emerging males may benefit from consuming higher quality food or in competition with other males (e.g., dominance assertion or territory establishment, Manno and Dobson 2008).

      We developed these aspects in the discussion.

      While it is admirable to include ectotherms in such a broad review and modelling, I can't tell what data from how many ectothermic species contributed to the models and summary data included in the figures.

      Too few data on ectotherms were available to include them in the meta-analysis.

      Some consideration should be made to the limitations of the data extracted from the literature of the accuracy of emergence and immergence dates when derived from only observations or trapping data. The most accurate results come from the use of telemetry for location and data logging reporting below vs. above ground positioning and body temperature.

      We added a "study limits" section to the discussion to address all the limits in this commentary.

      L64 "favor reproduction", better to say "allow reproduction", since there is strong evolutionary pressure to initiate reproduction early, often anticipating favorable conditions for reproduction, to maximize the time available for young to grow and prepare for overwintering themselves.

      Also, generally, it is not how "harsh" an environment is but rather how short the growing season is.

      We took this comment into account.

      L80 More simply, individuals that have amassed sufficient energy reserves as fat and caches to survive through winter may opt to initiate dormancy. This may decrease but not obviate predation, since hibernating animals are dug from their burrows and eaten by predators such as bears and ermine.

      In this sentence, we indicated a gap between dormancy phenology and the growing season, which suggests survival benefits of dormancy other than from a physiological point of view. We've changed the sentence to make it clearer: “However, some animals immerge in dormancy while environmental conditions would allow them (from a physiological point of view) to continue their activity, suggesting other survival benefits than coping with a short growing season.”

      L88 other physiological or ecological factors.... (gametogenesis).

      In this study, we examine possible evolutionary pressures and therefore the environmental factors that may influence hibernation phenology. We focus on reproductive effort because, assuming predation pressure, we would expect a trade-off between survival and reproduction.

      L113 beginning early to afford long active seasons to offspring while not compromising the survival of parents.

      We added to the sentence:

      “For females, emergence phenology may promote breeding and/or care of offspring during the most favorable annual period (e.g., a match of the peak in lactational energy demand and maximum food availability, Fig. 1) or beginning early to afford long active seasons to offspring while not compromising the survival of parents.”

      L117 based on adequate preparation for overwintering and enter dormancy....

      We modified the sentence as follows:

      “recovering from reproduction, and after acquiring adequate energy stores for overwintering”

      L123 given that males outwardly invest the least time in reproduction yet generally have shorter hibernation seasons would seem to reject this hypothesis. This changes if you overtly include the time and energy that males expend while remaining euthermic preparing for hibernation, a cost that can be similar to energy expended during heterothermy.

      Males invest a lot of time in reproduction before females emerge (whether for competition or physiological maturation) and some males seem to be subject to long-term negative effects linked to reproductive stress (see Millesi, E., Huber, S., Dittami, J., Hoffmann, I., & Daan, S. (1998). Parameters of mating effort and success in male European ground squirrels, Spermophilus citellus. Ethology, 104(4), 298-313). Both processes may contribute to reducing the duration of male hibernation.

      L125 again, costs to support euthermy in males undergoing reproductive development is an investment in reproduction.

      You're right, but it's difficult to quantify. We tested a model that takes into account the reproductive effort during reproduction and prior to reproduction. We also considered the hypothesis that species living in a cold climate might have a low protandry while having a high reproductive effort due to their ability to feed in the burrow (interaction effect between reproductive effort and temperature). We think these changes answer your comment.

      L134 It isn't growing large testes that takes time, but instead completing spermatogenesis and maturation of sperm in the epididymides.

      We removed this part.

      L140 Later immergence in male ground squirrels is related to accumulation and defense of cached food, activities that are related to reproduction the next spring. An experimental analysis that would be revealing is to compare immergence times in females that completed lactation to the independence of their litters vs. females that did not breed or lost their litters. Who immerges first?

      Body mass variation from emergence to the end of mating in males seems to explain the delayed immergence of males in species that don't hide food in their burrows for hibernation. For example, in Spermophilus citellus, males immerge on average more than 3 weeks after females, yet they do not hide food in their burrows for the winter.

      Such a study already exists and shows that non-breeding females immerge earlier than breeding females. We refer to it:

      L386: “In mammals, males and females that invest little or not at all in reproduction exhibit advances in energy reserve accumulation and earlier immergence for up to several weeks, while reproductive congeners continue activity (Neuhaus 2000, Millesi et al. 2008a).”

      L164 So you examined literature from 152 species but included data from only 29 species? Did you include data from social hibernators (marmots) that mate before emergence?

      With the current models, we have 28 different species. We have few species because very few have data on both sex differences and reproductive effort (especially for males).

      Data on sex differences in hibernation were not available for social hibernating species.

      L169 Were these data from trapping or observation results? How reliable are these versus the use of information from implanted data loggers or collars that definitively document when euthermy is resumed and/or when immergence and first emergence occurs (through light loggers)?

      We did not focus on heterothermic hibernation, but on ecological hibernation. We have no idea of the margin of error for these types of data, but we have discussed these limitations in the "Study limitations" section.

      L180, again, it is the time required to complete spermatogenesis and spermiation not the time for the growth of different sizes of testes that drives the preparation time for males. This is relatively constant among rodents. I challenge the assumption that larger testes take longer to grow than smaller ones.

      We removed this part.

      L200 Males that accumulate caches in fall and then feed from those during the spring pre-emergence euthermic interval and after will often be at their seasonal maximum in body mass. Declining from that peak may not be stressful.

      It has been suggested that reproductive effort in Spermophilus citellus might induce long-term negative effects that delay male immergence.

      Millesi, E., Huber, S., Dittami, J., Hoffmann, I., & Daan, S. (1998). Parameters of mating effort and success in male European ground squirrels, Spermophilus citellus. Ethology, 104(4), 298-313.

      L210 How about altitude, which affects the length of the growing season at similar latitudes?

      We extracted the location of each study site to determine the temperature and precipitation at that precise location (based on interpolated climate surface). We therefore take into account differences in growing season (based on temperature) in altitude between sites.

      L267 How did whether males cache food or not figure into these comparisons? Refeeding before mating occurs during the pre-emergence euthermic interval.

      We removed this part.

      L332, 344 not a "proxy" but functionally related to advantages in mating systems with multiple mating males.

      We removed this part.

      L353 The need for a pre-emergence euthermic interval in male ground squirrels requires costs in the previous active season in accumulating and defending a cache and the proximal costs in spring while remaining at high body temperatures prior to emergence with resulting loss in body mass or devouring of the cache.

      You're right, but in this section, we quickly explain the benefits of food caching compared with other species that don't do so.

      L385 This review should discuss why females are not known to cache and contrast as "income breeders" from "capital breeder" males. What advantages of caches are females indifferent to (no need for a prolonged pre-emergence period) and what costs of accumulating caches do they avoid (prolonged activity period and defense of caches).

      We clarified the case of female emergence.

      L321 : “Thus, an early emergence of males may have evolved in response to sexual selection to accumulate energy reserve in anticipation of reproductive effort. Females, on the contrary, are not subject to intraspecific competition for reproduction and may have sufficient time before (generally one week after emergence) and during the breeding period to improve their body condition.”

      L388 I don't understand the logic of the conclusion that "did not ...adequately explain the late male immergence" in this section. The greater mass loss in males over the mating period is afforded by the presence of a cache that requires later immergence.

      We removed this part.

      L412 Not just congeners that invest less in reproduction, but within species individuals that do not attempt to breed in one or more years and thus have no reproductive costs should be an interesting comparison for differences in phenology from individuals that do breed. Non-breeders are often yearlings but can be a significant overall proportion of males that fail to fatten or cache enough to afford a pre-emergence euthermic period.

      L385: “In mammals, males and females that invest little or not at all in reproduction exhibit advances in energy reserve accumulation and earlier immergence for up to several weeks, while reproductive congeners continue activity (Neuhaus 2000, Millesi et al. 2008a).”

      The sentence refers to individuals who reproduce little or not at all.

      L445 Males that gain weight between emergence and mating may do so by feeding from a cache regardless of how "harsh" an environment is.

      We observe this phenomenon even in species that are not known to hoard food:

      “Gains in body mass observed for some individuals, even in species not known to hoard food, may indicate that the environment allows a positive energy balance for other individuals with comparable energy demands.”

      L492 Some insects retreat to refugia in mid-summer to avoid parasitism (Gynaephora).

      Escape from parasites is also a benefit of dormancy.

      Fig 1 - It is difficult to see the differences in black and green colors, esp if color blind. Maternal effort is front-loaded within the active season (line for "optimal period" shown in midseason).

      Add "energy" underneath c) Prediction (H1) and "reproduction" underneath d) Prediction (H2). Explain the orange vs black, green colors of triangles.

      We made the necessary changes.

      Fig 2 - I don't buy the regression lines as significant in this figure. The red line, cannot have a regression with two sample points and without the left-hand most dot, nothing is significant.

      We deleted this graph.

      Fig 3 - females only?

      We deleted this graph.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      Below I summarize points that should be addressed in a revised version of the manuscript.

      • Page 6, first paragraph: I don't understand why the signals average out to a single state. If the distribution is indeed randomly distributed, a broad signal with low intensity should be present.

      We agree that this statement may cause confusion. We changed the text (marked in bold) to clarify the statement: The mobility of the undocked SBDs will be higher than the diffusion of the whole complex, allowing the sampling of varying interdomain distances within a single burst. However, these dynamic variations are subsequently averaged to a singular FRET value during FRET calculations for each burst, and may appear as a single low FRET state in the histograms.

      • Page 6, third paragraph: how can the donor only be detected in the acceptor channel? Is this tailing out?

      Donor-only signal is not detected in the acceptor channel. As described on page 5 and in the Materials & Methods section, the dye stoichiometry value is defined for each burst/dwell using three types of photon counts: donor-based donor emission (FDD), donor-based acceptor emission (FDA) and acceptor-based acceptor emission (FAA).

      When no acceptor fluorophore is present FAA=0 and S=1.

      Some donor photons bleed through into the acceptor channel, but we correct for this by calculating the leakage and crosstalk factors as described in the Materials and Methods (page 20).
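
      The stoichiometry argument above can be sketched numerically. The response names the three photon counts but does not spell out the formula, so the sketch below assumes the conventional alternating-excitation definition S = (FDD + FDA) / (FDD + FDA + FAA). The function names, the example counts, and the subtraction form of the leakage and crosstalk correction are illustrative assumptions, not the authors' analysis code.

```python
def stoichiometry(f_dd: float, f_da: float, f_aa: float) -> float:
    """Dye stoichiometry S for one burst or dwell.

    Assumes the conventional definition
    S = (FDD + FDA) / (FDD + FDA + FAA); the response does not
    state the formula explicitly.
    """
    total = f_dd + f_da + f_aa
    if total <= 0:
        raise ValueError("burst contains no photons")
    return (f_dd + f_da) / total


def corrected_fda(f_da_raw: float, f_dd: float, f_aa: float,
                  leakage: float, crosstalk: float) -> float:
    """Remove donor bleed-through and direct acceptor excitation.

    `leakage` and `crosstalk` are hypothetical per-setup factors;
    the linear subtraction is the usual form of this correction and
    is assumed here, not taken from the manuscript.
    """
    return f_da_raw - leakage * f_dd - crosstalk * f_aa


# Donor-only molecule: no acceptor, so FAA = 0 and S = 1.
print(stoichiometry(f_dd=100, f_da=0, f_aa=0))
# Acceptor-only molecule: no donor-excited photons, so S = 0.
print(stoichiometry(f_dd=0, f_da=0, f_aa=80))
# Doubly labeled molecule with balanced excitation: S near 0.5.
print(stoichiometry(f_dd=30, f_da=30, f_aa=60))
```

      Under this definition, donor-only and acceptor-only dwells sit at the two extremes of S, which is why they can be separated from genuine FRET dwells before further analysis.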

      We changed the text (marked in bold) in the manuscript to address the question: The FRET data of both OpuA variants is best explained by a four-state model (Figure 2A,B; fourth and fifth panel) (Supplementary File 3). Two of the four states represent donor-only (S≈1) or acceptor-only (S≈0) dwells. The full bursts belonging to donor-only and acceptor-only molecules were excluded prior to mpH2MM. This means that some molecules transit to a donor-only or acceptor-only state within the burst period, which most likely reflects blinking or bleaching of one of the fluorophores. These donor-only and acceptor-only states were also excluded during further analysis. The other two states reflect genuine FRET dwells that were analyzed by mpH2MM. They represent different conformations of the SBDs.

      • Page 7, "SBD dynamics ..": why was the V149Q mutant only analyzed in the K521C background and not also in the N414C background?

      The two FRET states were best distinguished in OpuA-K521C. Therefore, we decided to focus on OpuA-K521C and not OpuA-N414C. OpuA-V149Q was used to show that reduced docking efficiency does not affect the transition rate constants and relative abundances of the two FRET states, and we regarded it sufficient to test the SBD dynamics in OpuA-K521C only.

      • Page 8, second paragraph: why was the N414C mutant analyzed only from 0 - 600 mM and not also up to 1000 mM?

      In line with the previous answer, our main focus was on OpuA-K521C, since the two FRET states were best distinguished in OpuA-K521C. OpuA-N414C was used to prove that similar states are observed when measuring with fluorophores on the opposite side of the SBD. We studied how the FRET states change in response to different conditions that correspond to different stages of the transport cycle and how they change in response to different ionic strengths. Initially, 600 mM KCl was used to study the dynamics of the SBD at high ionic strength. Later in this study, we tested a very wide range of different salt concentrations for OpuA-K521C to get detailed insights into the dynamics of the SBDs over a wide ionic strength range. Note that 1 M KCl is a very high, non-physiological ionic strength for the typical habitat of L. lactis and was only used to show that the high FRET state occurs even under very extreme conditions.

      • Page 8, third paragraph: why was the dimer (if it is the source of the FRET signal) only partially disrupted?

      We acknowledge that this is a very good point. However, we purposely did not speculate on this point in the manuscript, because we have limited information on the molecular details of the interaction. As we highlight on page 8, the SBDs experience each other in a very high apparent concentration (millimolar range). This means that the interactions are most likely very weak (low affinity) and not very specific. Such interactions are in the literature referred to as the quinary structure of proteins and they occur at the high macromolecular crowding in the cell and in proteins with tethered domains, and thus at high local concentrations. Such interactions can be screened by high ionic strength. In the revised manuscript, we now present the partially disrupted dimer structure in the context of the quinary structure of a protein (page 11):

      In other words, the high FRET state may comprise an ensemble of weakly interacting states rather than a singular stable conformation, resembling the quinary structure of proteins. The quinary structure of proteins is typically revealed in highly crowded cellular environments and describes the weak interactions between protein surfaces that contribute to their stability, function, and spatial organization (Guin & Gruebele, 2019). Despite the current study being conducted under dilute conditions, the local concentration of SBDs (~4 mM) mimics a densely populated environment and reveals quinary structure.
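
      To give a sense of scale for the ~4 mM local concentration quoted above, the following back-of-the-envelope sketch converts a molar concentration into the volume available per molecule and the radius of a sphere with that volume. The spherical geometry is a simplification introduced here for illustration; the response does not say how the ~4 mM figure was derived.

```python
import math

AVOGADRO = 6.02214076e23   # molecules per mole
NM3_PER_LITRE = 1e24       # 1 L = 1e-3 m^3 = 1e24 nm^3


def volume_per_molecule_nm3(conc_molar: float) -> float:
    """Volume (nm^3) that holds one molecule at the given molarity."""
    return NM3_PER_LITRE / (AVOGADRO * conc_molar)


def equivalent_sphere_radius_nm(volume_nm3: float) -> float:
    """Radius of a sphere with the given volume (illustrative geometry)."""
    return (3.0 * volume_nm3 / (4.0 * math.pi)) ** (1.0 / 3.0)


v = volume_per_molecule_nm3(4e-3)   # ~4 mM, as quoted in the response
r = equivalent_sphere_radius_nm(v)
print(f"volume per SBD: {v:.0f} nm^3, equivalent radius: {r:.1f} nm")
```

      Under these assumptions, the radius comes out at a few nanometres, the same order as the ~4 nm linker length and the SBD dimensions mentioned elsewhere in the response.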

      • Page 9, second paragraph: according to the EM data processing, only 20% of the particles were used for 3D reconstruction. Why? Does it mean that the remaining 80% were physiologically not relevant? If so, why were the 20% used relevant?

      We note that it is a fundamental part of image processing of single particle cryo-EM data to remove false positives or low-resolution particles throughout the processing workflow. In particular when using a very low and therefore generous threshold during automated particle picking, as we did (t=0.01 and t=0.05 for the 50 mM KCl and 100 mM KCl datasets, respectively), the initial set of particles includes a significant amount of false positives – a tradeoff to avoid excluding particles belonging to low populated classes/orientations. It is thus common that more than 50% of ‘particles’ are excluded in the first rounds of 2D classification. In our case, only 30% and 52% of particles were retained after such first clean-up steps. Subsequently, the particle set is further refined, and additional false positives and low-resolution particles are excluded during extensive rounds of 3D classification. We also note that during the final steps, most of the data excluded represents particles of lower quality that do not contribute to a high-resolution reconstruction, or belong to low population protein conformations. This does not mean that such a population is not physiologically relevant. In conclusion, having only 5-20% of the initial automated picked particles contributing to the reconstruction of the final cryo-EM map is common, with the vast majority of excluded particles being false positives.

      • Page 11, third paragraph: the way the proposed model is selected is also my main criticism. All alternative models do not fit the data. Therefore, the proposed model is suggested. However, I do not grasp any direct support for this model. Either I missed it or it is not presented.

      Concerning the specific model in Figure 5, the reviewer is correct. We do not provide direct evidence for a side-ways interaction. However, we have evidence of transient interactions and our data rule out several scenarios of interaction, leaving 5C as the most likely model. This is also the main conclusion of this paper: In conclusion, the SBDs of OpuA transiently interact in a docking competent conformation, explaining the cooperativity between the SBDs during transport. The conformation of this interaction is not fixed but differs substantially between different conditions.

      Because the interaction is very short-lived it was not possible to visualize molecular details of this interaction. We present Figure 5 to hypothesize the most likely type of interaction, since many possibilities can be excluded with the vast amount of presented data. To make clearer that we discuss models and rule out several possibilities, but do not demonstrate a specific interaction between the SBDs, we now write on page 10 (changes marked in bold): We have shown that the SBDs of OpuA come close together in a short-lived state, which is responsive to the addition of glycine betaine (Figure 4A). Although the occurrence of the state varies between different conditions, it was not possible to negate the high-FRET state completely, not even under very high or low KCl concentrations, or in the presence of 50 mM arginine plus 50 mM glutamate (Figure 4A,B). To evaluate possible interdomain interaction scenarios we consider the following: (1) The SBDs of OpuA are connected to the TMDs with very short linkers of approximately 4 nm, which limit their movement and allow the receptor to sample a relatively small volume near its docking site. (2) In low ionic strength conditions, OpuA-K521C displays a high FRET state with mean FRET values of 0.7-0.8, which correspond to inter-dye distances of approximately 4 nm. (3) The high FRET state is responsive to glycine betaine, which points toward direct communication between the two SBDs. (4) The distance between the density centers of the SBDs in the cryo-EM reconstructions (based on particles with a low and high FRET state) is 6 nm, which aligns with the dimensions of an SBD (length: ~6 nm, maximal width: ~4 nm). These findings collectively indicate that two SBDs interact but not necessarily in a singular conformation but possibly as an ensemble of weakly interacting states. Hence, we discuss three possible SBD-SBD interaction models to explain the high-FRET state:

      Reviewer #2 (Recommendations For The Authors):

      In the abstract and elsewhere the authors suggest that the SBDs physically interact with one another, and that this interaction is important for the transport mechanism, specifically for its cooperativity.

      I feel that this main claim is not well established. The authors convincingly demonstrate that the SBDs largely occupy two states relative to one another and that in one of these states, they are closer than in the other. Unless I have missed (or failed to understand) some major details of the results, I did not find any evidence of a physical interaction. Have the authors established that the high FRET state indeed corresponds to the physical engagement of the SBDs? I feel that a direct demonstration of an interaction is much missing.

      Along the same lines, in the low-salt cryo-EM structure, where the SBDs are relatively closer together, the SBDs are still separated and do not interact.

      See also our response to the final comment of reviewer 1. Furthermore, please carefully consider the following: (1) FRET values of 0.7-0.8 correspond to inter-dye distances of approximately 4 nm. (2) The high FRET state is responsive to glycine betaine, which points toward direct communication between the two SBDs. (3) The cryo-EM reconstruction is the average of all the particles in the final dataset, including both the particles with a low and high FRET state. Further, the local resolution of the SBDs in the cryo-EM map is low, indicative of a high degree of flexibility. Thus, a potential interaction is possible within the observed range of flexibility. (4) The distance between the density centers is 6 nm, aligning with the dimensions of an SBD (length: 6 nm, maximal width: 4 nm). These factors collectively indicate SBD interactions, and we present these points now more explicitly in Figure 4 and the last part of the results section (page 9).
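
      The link between FRET values of 0.7-0.8 and an inter-dye distance of about 4 nm follows from the Förster relation E = 1 / (1 + (r/R0)^6). The sketch below inverts that relation. The Förster radius of 5 nm is an assumed round value in the range typical of common smFRET dye pairs; the value actually used by the authors is not given in this response, so the printed distances are only indicative.

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Förster relation: efficiency at inter-dye distance r."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def distance_from_fret(efficiency: float, r0_nm: float) -> float:
    """Invert the Förster relation to get the inter-dye distance."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


R0_ASSUMED_NM = 5.0  # hypothetical Förster radius, not taken from the manuscript

for e in (0.7, 0.8):
    r = distance_from_fret(e, R0_ASSUMED_NM)
    print(f"E = {e:.1f} -> r = {r:.2f} nm")
```

      Because of the sixth-power dependence, the inferred distance changes only modestly with the assumed R0 or with E in this range, so the "approximately 4 nm" estimate is fairly robust to the choice of Förster radius.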

      Once the authors successfully demonstrate that direct physical interaction indeed occurs, they will need to provide data that places it in the context of the transport cycle. Do the SBDs swap ligand molecules between them? Do they bind the ligand and/or the transporter cooperatively? What is the role of this interaction?

      We acknowledge the intriguing nature of the posed questions, but they extend beyond the scope of this study. It is extremely challenging to obtain high-resolution structures of highly dynamic multidomain proteins, like OpuA, and to probe transient interactions as we do here for the SBDs of OpuA. We therefore combined cryo-TEM with smFRET studies and applied the most advanced, state-of-the-art analysis tools, as acknowledged by reviewer 1. We link our observations on the structural dynamics and interactions of the SBDs to a previous study, where we showed that the two SBDs of OpuA interact cooperatively. We do not have further evidence that connects the physical interactions to the transport cycle. In our view, the collective datasets indicate that the physical interactions between the SBDs reported here increase the transport efficiency.

      As far as I understand, the smFRET data have been interpreted on the basis of a negative observation, i.e., that it is "likely" that none of the FRET states corresponds to a docked SBD. To convincingly show this, a positive observation is required, i.e., observation of a docked state.

      The aim of this study was to study interdomain dynamics and not specifically docking. We have previously shown that docking can be visualized via cryo-EM (Sikkema et al., 2020), however the SBDs of OpuA appear to only dock in specific turnover conditions. We now show that the high FRET state of OpuA cannot represent a docked state, but that the SBDs transiently interact (see our response to the first comment). Importantly, a docked state was also not found in the cryo-EM reconstructions at low ionic strength, representing the smFRET conditions where we observe the interactions between the SBDs. The high FRET state occupies 30% of the dwells in this condition, and such a high percentage of molecules would have become apparent during cryo-EM 3D classification in case they would form a docked state. Therefore, we conclude that docking does not occur in low ionic strength apo condition. We discuss this point and our reasoning on page 11 of the revised manuscript.

      In this respect, I find it troubling that in none of the tested conditions, the authors observed a FRET state which corresponds to the docked state. Such a state, which must exist for transport to occur (as mentioned in the authors' previous publications), needs to be demonstrated. This brings me to my next question: why have the authors not measured FRET between the SBDs and the transporter? Isn't this a very important piece that is missing from their puzzle?

      We agree that investigating docking behavior under varied turnover conditions requires focused experiments on FRET dynamics between the SBDs and the transporter. As noted on page 5, OpuA exists as a homodimer, implying that a single cysteine mutation introduces two cysteines in a single functional transporter. To specifically implement a cysteine mutation in only one SBD and one transmembrane domain, it is necessary to artificially construct a heterodimer. We recently published initial attempts in this direction, and this will be a subject for future research but still requires years of work.

      Additionally, I feel that important controls are missing. For example, how will the data presented in Fig1 look if the transporter is labeled with acceptor or donor only? How do soluble SBDs behave?

      In the employed labeling method, donor and acceptor dyes are mixed in a 1:1 ratio and randomly attached to the two cysteines in the transporter. This automatically yields significant fractions of donor-only and acceptor-only transporters, which are always present during the smFRET recordings. We can visualize those molecules on the basis of the dye stoichiometry, which we calculate by using three types of photon counts: donor-based donor emission (FDD), donor-based acceptor emission (FDA) and acceptor-based acceptor emission (FAA).

      Unfiltered plots look as follows (a dataset of OpuA-K521C at 600 mM KCl):

      Author response image 1.

      Donor only and acceptor only molecules have a very well discernible stoichiometry of 1 and 0, respectively. The filtering procedure is described in the materials and methods section, and these plots can be found in the supplementary database. We did not add them to the main text or supplementary materials of the original manuscript, as this is a very common procedure in the field of smFRET. We now include such a dataset in the revised manuscript.
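      As an illustration of how the three photon counts separate the populations, the sketch below uses the commonly used uncorrected stoichiometry S = (FDD + FDA) / (FDD + FDA + FAA). This is a minimal example under stated assumptions, not the authors' filtering pipeline: the function names and the cut-off values are hypothetical, and no background, crosstalk or gamma corrections are applied.

```python
def stoichiometry(f_dd, f_da, f_aa):
    """Uncorrected dye stoichiometry from three photon counts.

    f_dd: donor emission after donor excitation (FDD)
    f_da: acceptor emission after donor excitation (FDA)
    f_aa: acceptor emission after acceptor excitation (FAA)
    """
    donor_excited = f_dd + f_da
    total = donor_excited + f_aa
    if total == 0:
        raise ValueError("no photons detected in this burst")
    return donor_excited / total


def classify(s, low=0.25, high=0.75):
    """Label a burst by stoichiometry; the cut-offs are illustrative assumptions."""
    if s > high:
        return "donor-only"
    if s < low:
        return "acceptor-only"
    return "donor+acceptor"


if __name__ == "__main__":
    # Hypothetical photon counts for three bursts
    for counts in [(100, 0, 0), (0, 0, 80), (40, 40, 80)]:
        s = stoichiometry(*counts)
        print(counts, round(s, 2), classify(s))
```

      A molecule without an acceptor produces no FAA signal, so S approaches 1; a molecule without a donor produces only FAA, so S approaches 0. Doubly labeled molecules fall in between.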

      Soluble SBDs of OpuA have been studied previously (e.g. Wolters et al., 2010 & De Boer et al., 2019). For example, we have shown by SEC-MALLS that soluble SBDs do not form dimers, which is consistent with our notion that the SBDs interact with low affinity. It is not possible to study interdomain dynamics between soluble SBDs by smFRET, because the measurements are carried out at picomolar concentrations (monomeric conditions). We emphasize that smFRET measurements with native complexes, with SBDs near each other at apparent millimolar concentrations, are physiologically more relevant.

      Additional comments:

      (1) "It could well be that cooperativity and transient interactions between SBDs is more common than previously anticipated" and a similar statement in the abstract. What evidence is there to suggest that the transient interactions between SBDs are a common phenomenon?

      On page 11, we write: Dimer formation of SBPs has been described for a variety of proteins from different structural clusters of substrate-binding proteins [33–38,51–53]. We cite 9 papers that report SBD/SBP dimers. This suggests to us that the phenomenon of interacting substrate-binding proteins could be more common. Moreover, the concentration of maltose-binding protein and other SBPs in the periplasm of Gram-negative bacteria can reach (sub)millimolar concentrations, and low-affinity interactions may play a role not only in membrane protein-tethered SBDs (like in OpuA) but also be important in soluble substrate-receptors. Such low-affinity interactions are rarely studied in biochemical experiments.

      (2) I think that the data presented in 1B-C better suits the supplementary information.

      Figure 1B-D is already a summary of the supplementary information that describes the optimization of OpuA purification. We think it is valuable to show this part of the figure in the main text. A very clean and highly pure OpuA sample is essential for smFRET experiments. Quality of protein preparations and data analysis are key for the type of measurements we report in this paper.

      (3) "the first peak in the SEC profile corresponds...." The peaks should be numbered in the figure to facilitate their identification.

      We have changed the figure as suggested.

      (4) "smFRET is a powerful tool for studying protein dynamics, but it has only been used for a handful of membrane proteins". With the growing list of membrane proteins studied by smFRET I find this an overstatement.

      We removed this sentence in the new version of the manuscript.

      (5) "We rationalized that docking of one SBD could induce a distance shift between the two SBDs in the FRET range of 3-10 nm (Figure 1E)" How and why was this assumed?

      We realize that this is one of the sentences that caused confusion about the aim of this study. In this part of the manuscript, we should not have used docking as an example and we apologize for that. We replaced the sentence with: These variants are used to study inter-SBD dynamics in the FRET range of 3-10 nm (Figure 1E).

      Also Figure 1E was adjusted to prevent confusion:

      Author response image 2.

      In addition, to avoid any confusion we changed the following sentence on page 4 (changes marked in bold): We designed cysteine mutations in the SBD of OpuA to study interdomain dynamics in the full length transporter.

      (6) "However, the FRET distributions are broader than would be expected from a single FRET state, especially for OpuA-K521C" Have the authors established how a single state FRET of OpuA looks? Is there a control that supports this claim?

      Below we compare two datasets from OpuA-K521C in 600 mM KCl with a typical smFRET dataset from the well-studied substrate-binding protein MBP from E. coli, which resides in a single state. Left: OpuA-K521C; Right: MBP

      Author response image 3.

      We agree that this cannot be assumed from the presented data. Therefore we rewrote this sentence: However, the FRET distributions tail towards higher FRET values, especially OpuA-K521C.

      (7) "V149Q was designed as a mild mutation that would reduce docking efficiency and thereby substrate loading, but leave the intrinsic transport and ATP hydrolysis efficiency intact." I find this statement confusing: How can a mutation reduce docking efficiency yet leave the transport activity unchanged?

      We rewrote the sentences (changes marked in bold): V149Q was designed as a mild mutation that would reduce docking efficiency and thereby substrate loading, but leave the ionic strength sensing in the NBD and the binding of glycine betaine and ATP intact. Accordingly, a reduced docking efficiency should result in a lower absolute glycine betaine-dependent ATPase activity. At the same time the responsiveness of the system to varying KCl, glycine betaine, or Mg-ATP concentrations should not change.

      (8) Along the same lines: "whereas the glycine betaine-, Mg-ATP-, or KCl-dependent activity profiles remain unchanged" vs. "OpuA-V149Q-K521C exhibited a 2- to 3-fold reduction in glycine betaine-dependent ATPase activity".

      See comment at point 7.

      (9) In general, I find the writing wanting at places, not on par with the high standards set by previous publications of this group.

      We recognize the potential ambiguity in our phrasing. We hope that after incorporating the feedback provided by the reviewers our manuscript will convey our findings in a clearer manner.

      Extra changes to the text:

      (1) Title changed: The substrate-binding domains of the osmoregulatory ABC importer OpuA physically transiently interact

      (2) Second part of the abstract changed: We now show, by means of solution-based single-molecule FRET and analysis with multi-parameter photon-by-photon hidden Markov modeling, that the SBDs transiently interact in an ionic strength-dependent manner. The smFRET data are in accordance with the apparent cooperativity in transport and supported by new cryo-EM data of OpuA. We propose that the physical interactions between SBDs and cooperativity in substrate delivery are part of the transport mechanism.

      (3) Page 6, third paragraph and Figure 2B: the wrong rate number was extracted from Table 1. Changed this in the text and figure: 112 s⁻¹ → 173 s⁻¹. It did not affect any of the interpretations or conclusions.

      (4) Page 8, last paragraph, changed: smFRET was also performed in the absence of KCl and with a saturating concentration of glycine betaine (100 µM). The mean FRET efficiency of the high-FRET state of OpuA-K521C increased to 0.78, which corresponds to an inter-dye distance of about 4 nm. This indicates that the dyes at the two SBDs move very close towards each other (Figure 4A) (Table 1) (Supplementary File 34).

      (5) Page 9, second paragraph changed: Due to the inherent flexibility of the SBDs, with respect to both the MSP protein of the nanodisc and the TMDs of OpuA, their resolution is limited. Furthermore, the cryo-EM reconstructions average all the particles in the final dataset, including those with a low and high FRET state. Nevertheless, in both conditions, the densities that correspond to the SBDs can be observed in close proximity (Figure 4D). The distance between the density centers is 6 nm and aligns with the dimensions of an SBD, providing further evidence for physical interactions between the SBDs.
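      The conversion in item (4) from a FRET efficiency of 0.78 to an inter-dye distance of about 4 nm can be illustrated with the standard Förster relation E = 1 / (1 + (r/R0)^6). The sketch below is only a worked example: the Förster radius R0 of the dye pair is not given in this response, so the value of 5.0 nm is an assumption chosen for illustration, and the function name is hypothetical.

```python
def distance_from_fret(efficiency, r0_nm):
    """Invert E = 1 / (1 + (r / R0)**6) to obtain the inter-dye distance r in nm."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


if __name__ == "__main__":
    ASSUMED_R0_NM = 5.0  # assumption: not stated in the response; depends on the dye pair
    r = distance_from_fret(0.78, ASSUMED_R0_NM)
    print(f"E = 0.78 -> r = {r:.2f} nm (with assumed R0 = {ASSUMED_R0_NM} nm)")
```

      Because of the sixth-power dependence, the result is only weakly sensitive to the exact efficiency near this range, but it scales linearly with the assumed R0.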

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are grateful to the Editors for overseeing the review of our manuscript, and to the two reviewers for their thoughtful comments and suggestions for how it can be improved.

      I submit at this time a revision, as well as a detailed response (below) to each of the points raised in the first round of review.

      We feel the manuscript has been significantly improved by taking the reviewers' comments to heart. In a nutshell, we added new key pieces of data (impact of WIN site inhibition on global translation, rRNA production, as well as the requested cell biology analyses showing nucleolar stress), new analyses of the proteomics to counter potential concerns with normalization, and expanded/revised verbiage in key areas to clarify parts of the text that were confusing or problematic. The main figures have not changed; all new material is included in supplements to figures 2 and 3.

      Public Reviews

      Reviewer #1 (Public Review):

      Building on previous work from the Tansey lab, here Howard et al. characterize transcriptional and translational changes upon WIN site inhibition of WDR5 in MLL-rearranged cancer cells. They first analyze whether C16, a newer generation compound, has the same cellular effects as C6, an early generation compound. Both compounds reduce the expression of WDR5-bound RPGs in addition to the unbound RPG RPL22L1. They then investigate differential translation by ribo-seq and observe that WIN site inhibition reduces the translation of RPGs and other proteins related to biomass accumulation (spliceosome, proteasome, mitochondrial ribosome). Interestingly, this reduction adds to the transcriptional changes and is not limited to RPGs whose promoters are bound by WDR5. Quantitative proteomics at two time points confirmed the downregulation of RPGs. Interestingly, the overall effects are modest, but RPL22L1 is strongly affected. Unexpectedly, most differentially abundant proteins seem to be upregulated 24 h after C6 (see below). A genetic screen showed that loss of p53 rescues the effect of C6 and C16 and helped the authors to identify pathways that can be targeted by compounds together with WIN site inhibitors in a synergistic way. Finally, the authors elucidated the underlying mechanisms and analyzed the functional relevance of the RPL22, RPL22L1, p53, and MDM4 axis.

      While this work is not conceptually new, it is an important extension of the observations of Aho et al. The results are clearly described and, in my view, very meaningful overall.

      Major points:

      (1) The authors make statements about the globality/selectivity of the responses in RNA-seq, ribo-seq, and quantitative proteomics. However, as far as I can see, none of these analyses have spike-in controls. I recommend either repeating the experiments with a spike-in control or carefully measuring transcription and translation rates upon WIN site inhibition and normalizing the omics experiments with this factor.

      The reviewer is correct that we did not include spike-in controls in our omics experiments. We would like to emphasize that none of the omics data in this manuscript have been processed in unorthodox ways, and that the major conclusions each have independent corroborating data.

      The selectivity in RPG suppression observed in RNA-Seq, for example, is supported by results from our target engagement (QuantiGene) assays; suppression of RPL22L1 mRNA levels is supported by quantitative and semi-quantitative RT-PCR, by western blotting, and by the results of our proteomic profiling; alternative splicing (and expression) of MDM4—and its dependency on RPL22—is also backed up by similar RT-PCR and western blotting data. The same applies for alternative splicing of RPL22L1.

      That said, we do appreciate the point the reviewer is making here, and have done our best to respond. We do not think it is a prudent investment in resources to repeat the numerous omics assays in the manuscript. We also considered normalizing for bulk transcription and translation rates as suggested, but it is not clear in practice how this would be done, and it could introduce additional variables and uncertainties that may skew the interpretation of results. Instead, to respond to this comment, we made the following changes to the manuscript:

      (1) We now explicitly state, for all omics assays, that spike-in controls were not included. These statements will prompt the reader to make their own assessment of the robustness of each of our findings and interpretations.

      (2) We have added new data to the manuscript (Figure 2—figure supplement 1A–B) measuring the impact of C6 and C16 on bulk translation using the OPP labeling method. These new data demonstrate that WIN site inhibitors induce a progressive yet modest decline in protein synthesis capacity. At 24 hours, there is no significant effect of either agent on protein synthesis levels. By 48 hours, a small but significant effect is observed, and by 96 hours translation levels are ~60% of what they are in vehicle-treated control cells. These new data are important because they support the idea that normalization has not blunted the responses we observe—the magnitude of the effects are consistent between the different assays and tend to cap out at two-fold in terms of RPG suppression, translation efficiency, ribosomal protein levels, and protein synthesis capacity.

      (3) We have included additional analysis regarding the LFQMS, as described below, that specifically addresses the issue of normalization in our proteomics experiments.

      (2) Why are the majority of proteins upregulated in the proteomics experiment after 24 h in C6 (if really true after normalization with general protein amount per cell)? This is surprising and needs further explanation.

      The reviewer is correct in noting that (by LFQMS) ~700 proteins are induced after 24 hours of treatment of MV4;11 cells with C16 (not C6, as stated). The reviewer would like us to examine whether this apparent increase in proteins is a normalization artifact. In response to this comment, we have made the following changes to the manuscript:

      (1) Our new OPP labeling experiments (Figure 2—figure supplement 1A–B) show that there is no significant reduction in overall protein synthesis following 24 hours of C16 treatment. In light of this finding, it is unlikely that normalization artifacts, resulting from diminution of the pool of highly abundant proteins, create the appearance of these 700 proteins being induced. We now explicitly make this point in the text.

      (2) We now clarify in the methods how we seeded identical numbers of cells for DMSO and C16-treated cultures in these experiments, and—consistent with our finding that WIN site inhibitors have little if any effect on protein synthesis or proliferation at the 24 hour timepoint— extracted comparable amounts of proteins from these two treatment conditions (DMSO: 344.75 ± 21.7 µg; C16: 366.50 ± 15.8 µg; [Mean ± SEM]).

      (3) We now include in Figure 3—figure supplement 1A a plot showing the distribution of peptide intensities for each protein detected in each run of LFQMS before and after equal median normalization. This new analysis reveals that the distribution of intensities is not appreciably changed via normalization. Specifically, there is not a reduction in peptide intensities in the unnormalized data from 24 hours of C16 treatment that is reversed or tempered by normalization. This analysis provides further support for the notion that the increase we observe is not a normalization artifact.

      (4) We now include in Figure 3—figure supplement 1B–D a set of new analyses examining the relationship between the initial intensity of proteins in DMSO control samples (a crude proxy for abundance) versus the fold change in response to WIN site inhibitor. This analysis shows that we have as many "highly abundant" (10th decile) proteins increasing as we do decreasing in response to WINi. Thus, it appears as though the wholesale clearance of highly abundant proteins from the cell is not occurring at this early treatment timepoint. In addition, this analysis also shows that ribosomal proteins (RP) are generally the most abundant, most suppressed, proteins and that their fold-change at the protein level at 24 hours is less than two-fold, consistent again with the magnitude of transcriptional effects of C16, as measured by RNA-Seq and QuantiGene. The fact that the drop in RP levels is consistent with expectations based on other analyses provides further empirical support for the notion that protein levels inferred from LFQMS are authentic and not skewed by global changes in the proteome.
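      To make the normalization step in item (3) concrete, the sketch below shows one generic way to perform equal median normalization: intensities are log2-transformed and each run is shifted so that all runs share the same median. This is a minimal illustration under stated assumptions, not the authors' LFQMS pipeline. The function name, the choice of the median of run medians as the common target, and the example intensities are all hypothetical.

```python
import math
from statistics import median


def equal_median_normalize(runs):
    """Shift log2 intensities so that every run has the same median.

    runs: dict mapping a run name to a list of positive raw intensities.
    Returns a dict of log2 intensities with equalized medians.
    """
    log_runs = {name: [math.log2(v) for v in values] for name, values in runs.items()}
    run_medians = {name: median(values) for name, values in log_runs.items()}
    target = median(run_medians.values())  # assumed common reference point
    return {
        name: [v - run_medians[name] + target for v in values]
        for name, values in log_runs.items()
    }


if __name__ == "__main__":
    # Hypothetical intensities; the second run is uniformly twice as intense
    raw = {"DMSO_rep1": [100.0, 200.0, 400.0], "C16_rep1": [200.0, 400.0, 800.0]}
    for name, values in equal_median_normalize(raw).items():
        print(name, [round(v, 3) for v in values])
```

      Because each run is shifted by a single constant, differences between proteins within a run are unchanged. Comparing the distributions before and after the shift, as in the new supplementary plot, shows how much of an apparent change could be attributed to the normalization itself.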

      The increase in proteins at this time point, we argue, is thus most likely genuine. It is not surprising that—at a timepoint at which protein synthesis is unaffected—several hundred proteins are induced by a factor of two. How this occurs, we do not know. It may be a transient compensatory mechanism, or it may be an early part of the active response to WIN site inhibitors. Lest the reader be confused by this finding, we have now added text to this section of the manuscript discussing and explaining the phenomenon in more detail.

      (3) The description of the two CRISPR screens (GECKO and targeted) is a bit confusing. Do I understand correctly that in the GECKO screen, the treated cells are not compared with nontreated cells of the same time point, but with a time point 0? If so, this screen is not very meaningful and perhaps should be omitted. Also, it is unclear to me what the advantages of the targeted screen are since the targets were not covered with more sgRNAs (data contradictory: 4 or 10 sgRNAs per target?) than in Gecko. Also, genome-wide screens are feasible in culture for multiple conditions. Overall, I find the presentation of the screening results not favorable.

      In essence, this is a single screen performed in two tiers. In Tier 1, we screened a complete GECKO library (six sgRNA/gene) with the earliest generation (less potent) inhibitor C6, and compared sgRNA representation against the time zero population. This screen would reveal sgRNAs that are specifically associated with response to C6, as well as those that are associated with general cell fitness and viability. We then identified genes connected to these sgRNAs, removed those that are pan essential, and built a custom library for the second tier using sgRNAs from the Brunello library (four sgRNA/gene). We then screened this custom library with both C6 and the more potent inhibitor C16, this time against DMSO-treated cells from the same timepoint.

      We acknowledge that this is not the most streamlined setup for a screen. But our intention was to compare two inhibitors (C6 and C16) and identify high confidence 'hits' that are disconnected from general cell viability, rather than generate an exhaustive list of all genes that, when disrupted, skew the response to WIN site inhibitor. The final result of this screen (Figure 4E) is a gene list that has been validated with two chemically distinct WIN site inhibitors and up to 10 unique sgRNAs per gene. We may not have captured every gene that can modulate response to WIN site inhibitor, but those appearing in Figure 4E are highly validated.

      To answer the reviewer's specific questions: (i) we cannot omit the Tier 1 screen because then there would be no rationale for what was screened in the second Tier; and (ii) the advantage of the custom Tier 2 library is that it allowed us to screen hits from the Tier 1 screen with four completely independent sgRNAs. Although there are not more sgRNAs for each gene in the Tier 2 versus the Tier 1 library, these sgRNAs are different and thus, for C6 at least, hits surviving both screens were validated with up to 10 unique sgRNAs.

      We apologize that the description of the CRISPR screens was not clearer, and have reworked this section of the manuscript to make our intent and our actions clearer.

      (4) Can re-expression of RPL22 rescue the growth arrest induced by C6?

      We have not attempted to complement the RPL22 knock out. But we do note that evidence supporting the idea that loss of RPL22 confers resistance to WIN site inhibitor is strong—six (out of six) sgRNAs against RPL22 were significantly enriched in the Tier 1 screen, and independent knock out of RPL22 with the Synthego multi-guide system in MV4;11 and MOLM13 cells increases the GI50 for C16.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Howard et al. reports the development of high-affinity WDR5-interaction site inhibitors (WINi) that engage the protein to block the arginine-dependent engagement with its partners. Treatment of MLL-rearranged leukemia cells with high-affinity WINi (C16) decreases the expression of genes encoding most ribosomal proteins and other proteins required for translation. Notably, although these targets are enriched for WDR5-ChIP-seq peaks, such peaks are not universally present in the target genes. High concordance was found between the alterations in gene expression due to C16 treatment and the changes resulting from treatment with an earlier, lower affinity WINi (C6). Besides protein synthesis, genes involved in DNA replication or MYC responses are downregulated, while p53 targets and apoptosis genes are upregulated. Ribosome profiling reveals a global decrease in translational efficiency due to WINi with overall ribosome occupancies of mRNAs ~50% of control samples. The magnitude of the decrements of translation for most individual mRNAs exceeds the respective changes in mRNA levels genome-wide. From these results and other considerations, the authors hypothesize that WINi results in ribosome depletion. Quantitative mass spec documents the decrement in ribosomal proteins following WINi treatment along with increases in p53 targets and proteins involved in apoptosis occurring over 3 days. Notably, RPL22L1 is essentially completely lost upon WINi treatment. The investigators next conduct a CRISPR screen to find moderators and cooperators with WINi. They identify components of p53 and DNA repair pathways as mediators of WINi-inflicted cell death (so gRNAs against these genes permit cell survival). Next, WINi are tested in combination with a variety of other agents to explore synergistic killing to improve their expected therapeutic efficacy. The authors document the loss of the p53 antagonist MDM4 (in combination with splicing alterations of RPL22L1), an observation that supports the notion that WINi killing is p53-mediated.

      Strengths:

      This is a scientifically very strong and well-written manuscript that applies a variety of state-of-the-art molecular approaches to interrogate the role of the WDR5 interaction site and WINi. They reveal that the effects of WINi seem to be focused on the overall synthesis of protein components of the translation apparatus, especially ribosomal proteins, even those that do not bind WDR5 by ChIP (a question left unanswered is how much the WDR5-less genes are nevertheless WINi targeted). They convincingly show that disruption of the synthesis of these proteins is accompanied by DNA damage inferred by H2AX-activation, activation of the p53 pathway, and apoptosis. Pathways of possible WINi resistance and synergies with other antineoplastic approaches are explored. These experiments are all well-executed and strongly invite more extensive pre-clinical and translational studies of WINi in animal studies. The studies also may anticipate the use of WINi as probes of nucleolar function and ribosome synthesis though this was not really explored in the current manuscript.

      Weaknesses:

      A mild deficiency in the current manuscript is the absence of cell biological methods to complement the molecular biological and biochemical approaches so ably employed. Some microscopic observations and confirmation of nucleolar dysfunction and DNA damage would be reassuring.

      We thank the reviewer for their comments. We agree that an absence of cell biological methods was a deficiency in the original manuscript. In response to this comment, we have now added immunofluorescence (IF) analyses, examining the impact of C16 on nucleolar integrity and nucleophosmin (NPM1) distribution (Figure 3—figure supplement 4). These new data clearly show that C16 induces nucleolar stress at 72 hours—as measured by the redistribution of NPM1 from the nucleolus to the nucleoplasm. These new data fill an important gap in the story, and we are grateful to the reviewer for prompting us to perform these experiments.

      As part of the above study, we also probed for gamma-H2AX, expecting that we may see some signs of accumulation in the nucleoli (see comment #4 from Reviewer #2, below). We did not observe this response. Importantly, however, we did see that gamma-H2AX staining occurs only in what are overtly apoptotic cells. This is an important finding, because we had previously speculated that the induction of gamma-H2AX observed by Western blotting reflected part of a bona-fide response to DNA damage elicited by WIN site inhibitors. Instead, the IF data now leads us to conclude that this signal simply reflects the established fact that WIN site inhibitors induce apoptosis in this cell line (Aho et al., 2019). In response to this new finding, we have added additional discussion to the text and have removed or de-emphasized the potential contribution of DNA damage to the mechanism of action of WDR5 WIN site inhibitors. Again, we are grateful for this comment as it has prevented us from continuing to report/pursue erroneous observations.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      There is a typo in "but are are linked to mRNA instability when translation is inhibited".

      Thank you for catching this typo. It has now been corrected.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors report that WINi initially (at 24 hrs) increases the expression of most proteins while decreasing ribosomal proteins, but at 72 hours all proteins are depressed. The transient bump-up of non-translation-related proteins seems odd. A simple resolution to this somewhat strange observation is that there is no real increase in the other proteins, but because of the loss of a large fraction of the most abundant cellular proteins (the ribosomal proteins), the relative fraction of all other proteins is increased; that is, the increase of non-ribosomal proteins may be an artifact of normalization to a lower total protein content. Can this be explored?

      We are grateful to the reviewer for this comment. We have tried our best to respond, as detailed above in response to Reviewer #1 Public Comment #2.

      (2) It would be really nice to assess nucleolar status microscopically. Do nucleoli get bigger? Smaller? Do they have abnormal morphology? Is there nucleolar stress? What happens to rRNA synthesis and processing?

      We agree and thank the reviewer for raising this point. As noted in our response to Reviewer #2, above, we have included new IF that shows: (i) no obvious effect on nucleolar integrity, (ii) redistribution of NPM1 to the nucleoplasm (indicative of nucleolar stress), and (iii) induction of gamma-H2AX staining in apoptotic cells (indicative of apoptosis).

      Additionally, in response to this comment, we also looked at the impact of WIN site inhibitors on rRNA synthesis, using AzCyd labeling. These new data appear in Figure 3—figure supplement 3. Interestingly, these new data show that there is a progressive decline in rRNA synthesis, and that by 96 hours of treatment levels of both 18S and 28S rRNAs are reduced— again by about a factor of two. Our interpretation of this finding is that in response to the progressive decline in RPG transcription there is a secondary decrease in rRNA synthesis. This result is perhaps not surprising, but it does again add an important missing piece to our characterization of WIN site inhibitors and is further support for the concept that inhibition of ribosome production is a dominant part of the response to these agents.

      (3) The WINi elicited DNA damage is incompletely characterized, rather it is inferred from H2AX activation. Comet assays would help to confirm such damage.

      As noted in our response to Reviewer #2, our original inference of DNA damage, prompted by gamma-H2AX activation, is erroneous, and due instead to the ability of WIN site inhibitors to induce apoptosis. We thus did not pursue comet assays, etc., and removed discussion of potential DNA damage from the manuscript.

      (4) Staining and microscopic observation of H2AX would be very useful. Is the WINi provoked DNA damage nucleolar-localized? Does the deficiency of ribosomal proteins lead to localized genotoxic nucleolar stress - or alternatively does the paucity of ribosomes and decreased translation lead to imbalances in other cellular pathways, perhaps including some involved in overall genome maintenance which would provoke more global DNA damage and H2AX staining, not limited to the nucleolus.

      Again, please see our response to the Public Comment from Reviewer #2.

      (5) It would be important to assess the influence and effects of WINi on some p53 mutant, p53-/- and p53 wild-type cell lines. Given their prevalence, p53 status may be expected to alter WINi efficacy.

      The issue of how p53 status impacts the response to WINi is interesting and important, but we feel this is beyond the scope of the current manuscript. It is likely that many factors contribute to the response of cancer cells to these agents, and thus simply surveying some cancer lines for their response and linking this to their p53 status is unlikely to be very informative. Making definitive statements about the contribution of p53, and the differences between wild-type, loss-of-function mutants, gain-of-function mutants, and null mutants will require more extensive analyses and is fertile territory for future studies, in our opinion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a useful study examining the determinants and mechanisms of LRMP inhibition of cAMP regulation of HCN4 channel gating. The evidence provided to support the main conclusions is unfortunately incomplete, with discrepancies in the work that reduce the strength of mechanistic insights.

      Thank you for the reviews of our manuscript. We have made a number of changes to clarify our hypotheses in the manuscript and addressed all of the potential discrepancies by revising some of our interpretation. In addition, we have provided additional experimental evidence to support our conclusions. Please see below for a detailed response to each reviewer comment.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      The authors use truncations, fragments, and HCN2/4 chimeras to narrow down the interaction and regulatory domains for LRMP inhibition of cAMP-dependent shifts in the voltage dependence of activation of HCN4 channels. They identify the N-terminal domain of HCN4 as a binding domain for LRMP, and highlight two residues in the C-linker as critical for the regulatory effect. Notably, whereas HCN2 is normally insensitive to LRMP, putting the N-terminus and 5 additional C-linker and S5 residues from HCN4 into HCN2 confers LRMP regulation in HCN2.

      Strengths:

      The work is excellent, the paper well written, and the data convincingly support the conclusions which shed new light on the interaction and mechanism for LRMP regulation of HCN4, as well as identifying critical differences that explain why LRMP does not regulate other isoforms such as HCN2.

      Thank you.

      Reviewer #2 (Public Review):

      Summary:

      HCN-4 isoform is found primarily in the sino-atrial node where it contributes to the pacemaking activity. LRMP is an accessory subunit that prevents cAMP-dependent potentiation of HCN4 isoform but does not have any effect on HCN2 regulation. In this study, the authors combine electrophysiology, FRET with standard molecular genetics to determine the molecular mechanism of LRMP action on HCN4 activity. Their study shows that parts of N- and C-termini along with specific residues in C-linker and S5 of HCN4 are crucial for mediating LRMP action on these channels. Furthermore, they show that the initial 224 residues of LRMP are sufficient to account for most of the activity. In my view, the highlight of this study is Fig. 7 which recapitulates LRMP modulation on HCN2-HCN4 chimera. Overall, this study is an excellent example of using time-tested methods to probe the molecular mechanisms of regulation of channel function by an accessory subunit.

      Weaknesses:

      (1) Figure 5A- I am a bit confused with this figure and perhaps it needs better labeling. When it states Citrine, does it mean just free Citrine, and "LRMP 1-230" means LRMP fused to Citrine which is an "LF" construct? Why not simply call it "LF"? If there is no Citrine fused to "LRMP 1-230", this figure would not make sense to me.

      We have clarified the labelling of this figure and specifically defined all abbreviations used for HCN4 and LRMP fragments in the results section on page 14.

      (2) Related to the above point- Why is there very little FRET between NF and LRMP 1-230? The FRET distance range is 2-8 nm which is quite large. To observe baseline FRET for this construct more explanation is required. Even if one assumes that about 100 amino acids are completely disordered (not extended) polymers, I think you would still expect significant FRET.

      FRET is extremely sensitive to distance: efficiency falls off with the sixth power of the donor-acceptor separation. The difference in contour length (the maximum length of a fully extended peptide) between our ~260 aa fragment and our ~130 aa fragments is on the order of 450 Å (45 nm). So, even if the fragments are not extended, it is not hard to imagine that the larger fragments show a weaker FRET signal. In fact, we do see slightly more FRET than in the control (not significant), which is consistent with the idea that the larger fragments simply do not produce a large FRET signal.

      Moreover, this hybridization assay is sensitive to a number of other factors including the affinity between the two fragments, the expression of each fragment, and the orientation of the fluorophores. Any of these factors could also result in reduced FRET.

      We have added a section on the limitations of the FRET 2-hybrid assay in the discussion section on page 20. Our goal with the FRET assay was to provide complementary evidence that shows some of the regions that are important for direct association, and we have edited the text to make sure we are not over-interpreting our results.
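
      The distance argument above can be illustrated with the standard Förster relation, E = 1 / (1 + (r/R0)^6). The following is a minimal sketch, not part of the authors' analysis: the function name and the Förster radius of 5 nm are illustrative assumptions rather than values taken from the manuscript, and the relation ignores the orientation, affinity, and expression effects mentioned above.

```python
def fret_efficiency(r_nm: float, r0_nm: float = 5.0) -> float:
    """Return the FRET efficiency E = 1 / (1 + (r / R0)^6).

    r_nm  : donor-acceptor separation in nanometres.
    r0_nm : Foerster radius (separation giving 50% efficiency).
            The default of 5 nm is an assumed, illustrative value.
    """
    if r_nm < 0 or r0_nm <= 0:
        raise ValueError("r_nm must be >= 0 and r0_nm must be > 0")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


if __name__ == "__main__":
    # Efficiency drops steeply once the separation exceeds R0.
    for r in (2.5, 5.0, 7.5, 10.0):
        print(f"r = {r:4.1f} nm  ->  E = {fret_efficiency(r):.3f}")
```

      Under these assumptions, doubling the separation from R0 to 2·R0 lowers the efficiency from 1/2 to 1/65, which is why a modest increase in the average fluorophore distance can bring the signal close to baseline.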

      (3) Unless I missed this, have all the Cerulean and Citrine constructs been tested for functional activity?

      All Citrine-tagged LRMP constructs (or close derivatives) were tested functionally by coexpression with HCN (see Table 1 and pages 10-11). Cerulean-tagged HCN4 fragments are of course intrinsically non-functional as they do not include the ion-conducting pore.

      Reviewer #3 (Public Review):

      Summary:

      Using patch clamp electrophysiology and Förster resonance energy transfer (FRET), Peters and co-workers showed that the disordered N-terminus of both LRMP and HCN4 are necessary for LRMP to interact with HCN4 and inhibit the cAMP-dependent potentiation of channel opening. Strikingly, they identified two HCN4-specific residues, P545 and T547 in the C-linker of HCN4, that are close in proximity to the cAMP transduction centre (elbow C-linker, S4/S5-linker, HCND) and account for the LRMP effect.

      Strengths:

      Based on these data, the authors propose a mechanism in which LRMP specifically binds to HCN4 via its isotype-specific N-terminal sequence and thus prevents the cAMP transduction mechanism by acting at the interface between the elbow C-linker, the S4S5-linker, the HCND.

      Weaknesses:

      Although the work is interesting, there are some discrepancies between data that need to be addressed.

      (1) I suggest inserting in Table 1 and in the text, the Δ shift values (+cAMP; + LRMP; +cAMP/LRMP). This will help readers.

      Thank you, Δ shift values have been added to Tables 1 and 2 as suggested.

      (2) Figure 1 is not clear, the distribution of values is anomalously high. For instance, in 1B the distribution of values of V1/2 in the presence of cAMP goes from -85 to -115. I agree that in the absence of cAMP, HCN4 in HEK293 cells shows some variability in V1/2 values, that nonetheless cannot be so wide (here the variability spans sometimes even 30 mV) and usually disappears with cAMP (here not).

      With a large N, this is an expected distribution. In 5 previous reports from 4 different groups of HCN4 with cAMP in HEK 293 (Fenske et al., 2020; Liao et al., 2012; Peters et al., 2020; Saponaro et al., 2021; Schweizer et al., 2010), the average expected range of the data is 26.6 mV and 39.9 mV for 95% (mean ± 2SD) and 99% (mean ± 3SD) of the data, respectively. As the reviewer mentions, the expected range from these papers is slightly larger in the absence of cAMP. The average SDs of HCN4 (with/without cAMP) in these papers are 9.9 mV (Schweizer et al., 2010), 4.4 mV (Saponaro et al., 2021), 7.6 mV (Fenske et al., 2020), 10.0 mV (Liao et al., 2012), and 5.9 mV (Peters et al., 2020). Our SD in this paper is roughly in the middle at 7.6 mV. This is likely because we used an inclusive approach to data so as not to bias our results (see the statistics section of the revised manuscript on page 9). We have removed 2 data points that meet the statistical classification as outliers; no measures of statistical significance were altered by this.
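
      The relationship between a standard deviation and the span of data it implies can be made explicit with a short calculation. This is a minimal sketch that assumes approximately normally distributed midpoints; the function name is hypothetical, and the only input taken from the response above is the 7.6 mV SD reported for this study.

```python
def expected_range_mv(sd_mv: float, k: float) -> float:
    """Width of the interval mean - k*SD to mean + k*SD, in mV."""
    if sd_mv < 0 or k < 0:
        raise ValueError("sd_mv and k must be non-negative")
    return 2.0 * k * sd_mv


if __name__ == "__main__":
    SD_THIS_STUDY_MV = 7.6  # SD quoted in the response for this paper
    # k = 2 covers roughly 95% of a normal distribution, k = 3 roughly 99%.
    for k in (2, 3):
        width = expected_range_mv(SD_THIS_STUDY_MV, k)
        print(f"mean +/- {k} SD spans {width:.1f} mV")
```

      With an SD of 7.6 mV, the mean ± 2SD interval is about 30 mV wide, which is comparable to the spread the reviewer noted in the individual V1/2 values.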

      This problem is spread throughout the manuscript, and the measured mean effects are indeed always at the limit of statistical significance. Why so? Is this a problem with the analysis, or with the recordings?

      The exact P-values are NOT typically at the limit of statistical significance; about two-thirds would meet the stringent P < 0.0001 cut-off. We have clarified in the statistics section (page 10) that any comparison meeting our significance threshold (P < 0.05) or a stricter criterion is treated equally in the figure labelling. Exact P-values are provided in Tables 1-3.

      There are several other problems with Figure 1 and in all figures of the manuscript: the Y scale is very narrow while the mean values are marked with large square boxes. Moreover, the exemplary activation curve of Figure 1A is not representative of the mean values reported in Figure 1B, and the values of 1B are different from those reported in Table 1.

      Y-axis values for mean plots were picked such that all data points are included and are consistent across all figures. They have been expanded slightly (-75 to -145 mV for all HCN4 channels and -65 to -135 mV for all HCN2 channels). The size of the mean value marker has been reduced slightly. Exact midpoints for all data are also found in Tables 1-3.

      The GV curves in Figure 1B (previously Fig. 1A) are averages with the ±SEM error bars smaller than the symbols in many cases owing to relatively high n’s for these datasets. These curves match the midpoints in panel 1C (previously 1B). E.g., the midpoint of the average curve for HCN4 control in panel A is -117.9 mV, essentially the same as the -117.8 mV average for the individual fits in panel B.

      The text contained an error in the ordering of the tables, carried over from a previous version of the manuscript. This has now been fixed, so these values should now be aligned.

      On this ground, it is difficult to judge the conclusions and it would also greatly help if exemplary current traces would be also shown.

      Exemplary current traces have been added to all figures in the revised manuscript.

      (3) "....HCN4-P545A/T547F was insensitive to LRMP (Figs. 6B and 6C; Table 1), indicating that the unique HCN4 C-linker is necessary for regulation by LRMP. Thus, LRMP appears to regulate HCN4 by altering the interactions between the C-linker, S4-S5 linker, and N-terminus at the cAMP transduction centre."

      Although this is an interesting theory, there are no data supporting it. Indeed, P545 and T547 at the tip of the C-linker elbow (fig 6A) are crucial for LRMP effect, but these two residues are not involved in the cAMP transduction centre (interface between HCND, S4S5 linker, and C-linker elbow), at least for the data accumulated till now in the literature. Indeed, the hypothesis that LRMP somehow inhibits the cAMP transduction mechanism of HCN4 given the fact that the two necessary residues P545 and T547 are close to the cAMP transduction centre, remains to be proven.

      Moreover, I suggest analysing the putative role of P545 and T547 in light of the available HCN4 structures. In particular, T547 (elbow) points towards the underlying shoulder of the adjacent subunit and, therefore, is in a key position for the cAMP transduction mechanism. The presence of bulky hydrophobic residues (very different nature compared to T) in the equivalent position of HCN1 and HCN2 also favours this hypothesis. In this light, it will be also interesting to see whether a single T547F mutation is sufficient to prevent the LRMP effect.

      We agree that testing this hypothesis would be very interesting. However, it is challenging. Any mutation we make that is involved in cAMP transduction makes measuring the LRMP effect on cAMP shifts difficult or impossible.

      Our simple idea, now clarified in the discussion, is that if you look at the regions involved in cAMP transduction (HCND, C-linker, S4-S5), there are very few residues that differ between HCN4 and HCN2. When we mutate the 5 non-conserved residues in the S5 segment and the C-linker, along with the NT, we are able to render HCN2 sensitive to LRMP. Therefore, something about the small sequence differences in this region confers isoform specificity to LRMP. We speculate that this happens because of small structural differences that result from those 5 mutations. If you compare the solved structures of HCN1 and HCN4 (there is no HCN2 structure available), you can see small differences in the distances between key interacting residues in the transduction centre. Also, there is a kink at the bottom of the S4 helix in HCN4 but not HCN1. This points a putatively important residue for cAMP dependence in a different direction in HCN4. We hypothesize in the discussion that this may be how LRMP is isoform specific.

      Moreover, previous work has shown that the HCN4 C-linker is uniquely sensitive to di-cyclic nucleotides and magnesium ions. We are hypothesizing that it is the subtle change in structure that makes this region more prone to regulation in HCN4.

      Reviewing Editor (recommendations for the Authors):

      (1) Exemplar recordings need to be shown and some explanation for the wide variability in the V-half of activation.

      Exemplar currents are now shown for each channel. See the response to Reviewer 3’s public comment 2.

      (2) The rationale for cut sites in LRMP for the investigation of which parts of the protein are important for blocking the effect of cAMP is not logically presented in light of the modular schematics of domains in the protein (N-term, CCD, post-CCD, etc).

      There is limited structural data on LRMP and the HCN4 N-terminus. The cut sites in this paper were determined empirically. We made fragments that were small enough to work for our FRET hybridization approach and that expressed well in our HEK cell system. The residue numbering of the LRMP modules is based on updated structural predictions using AlphaFold, which was released after our fragments were designed. This has been clarified in the methods section on pages 5-6 and the Figure 2 legend of the revised manuscript.

      (3) Role of the HCN4 C-terminus. Truncation of the unstructured HCN4 C-terminus distal to the CNBD (Fig. 4 A, B) partially reverses the impact of LRMP (i.e. there is now a significant increase in cAMP effect compared to full-length HCN4). The manuscript is written in a manner that minimizes the potential role of the C-terminus and it is, therefore, eliminated from consideration in subsequent experiments (e.g. FRET) and the discussion. The model is incomplete without considering the impact of the C-terminus.

      We thank the reviewer for this comment as it was a result that we too readily dismissed. We have added discussion around this point and revised our model to suggest that not only can we not eliminate a role for the distal C-terminus, our data is consistent with it having a modest role. Our HCN4-2 chimera and HCN4-S719x data both suggest the possibility that the distal C-terminus might be having some effect on LRMP regulation. We have clarified this in the results (pages 12-13) and discussion (page 19).

      (4) For FRET experiments, it is not clear why LF should show an interaction with N2 (residues 125-160) but not NF (residues 1-160). N2 is contained within NF, and given that Citrine and Cerulean are present on the C-terminus of LF and N2/NF, respectively, residues 1-124 in NF should not impact the detection of FRET because of greater separation between the fluorophores as suggested by the authors.

      This is a fair point but FRET is somewhat more complicated. We do not know the structure of these fragments and it’s hard to speculate where the fluorophores are oriented in this type of assay. Moreover, this hybridization assay is sensitive to affinity and expression as well. There are a number of reasons why the larger 1-260 fragment might show reduced FRET compared to 125-260. As mentioned in our response to reviewer 2’s public comment 2, we have added a limitation section that outlines the various caveats of FRET that could explain this.

      (5) For FRET experiments, the choice of using pieces of the channel that do not correlate with the truncations studied in functional electrophysiological experiments limits the holistic interpretation of the data. Also, no explanation or discussion is provided for why LRMP fragments that are capable of binding to the HCN4 N-terminus as determined by FRET (e.g. residues 1-108 and 110-230, respectively) do not have a functional impact on the channel.

      As mentioned in the response to comment 2, the exact fragment design is a function of which fragments expressed well in HEK cells. Importantly, because FRET experiments do not provide atomic resolution, for the reasons listed in the revised limitations section on pages 20-21, small differences in the cut sites do not change the interpretation of these results. For example, the N-terminal 1-125 construct is analogous to experiments with the Δ1-130 HCN4 channel.

      We suspect that residues in both fragments are required and that the interaction involves multiple parts. This is stated in the results “Thus, the first 227 residues of LRMP are sufficient to regulate HCN4, with residues in both halves of the LRMP N-terminus necessary for the regulation” (page 11). We have also added discussion on this on page 21.

      (6) A striking result was that mutating two residues in the C-linker of HCN4 to amino acids found in HCN channels not affected by LRMP (P545A, T547F), completely eliminated the impact of LRMP on preventing cAMP regulation of channel activation. However, a chimeric channel, (HCN4-2) in which the C-linker, the CNBD, and the C-terminus of HCN4 were replaced by that of HCN2 was found to be partially responsive to LRMP. These two results appear inconsistent and not reconciled in the model proposed by the authors for how LRMP may be working.

      As stated in our answer to your question #3, we have revised our interpretation of these data. If the more distal C-terminus plays some role in the orientation of the C-linker and the transduction centre as a whole, these data can still be viewed as consistent with our model. We have added some discussion of this idea in our discussion section.

      (7) Replacing the HCN2 N-terminus with that from HCN4, along with mutations in the S5 (MCS/VVG) and C-linker (AF/PT) recapitulated LRMP regulation on the HCN2 background. The functional importance of the S5 mutations is not clear as no other experiments are shown to indicate whether they are necessary for the observed effect.

      We have added our experiments on a midpoint HCN2 clone that includes the S5 mutants and the C-linker mutants in the absence of the HCN4 N-terminus (i.e., HCN2 MCSAF/VVGPT) (Fig. 7). We have also discussed our rationale for the S5 mutations, as we believe they may be responsible for the different orientations of the S4-S5 linker in HCN1 and HCN4 structures that are known to impact cAMP regulation.

      Reviewer #1 (Recommendations For The Authors):

      A) Comments:

      (1) Figure 1: Please show some representative current traces.

      Exemplar currents are now shown for each channel in the manuscript.

      (2) Figure 1: There appears to be a huge number of recordings for HCN4 +/- cAMP as compared to those with LRMP 1-479Cit. How was the number of recordings needed for sufficient statistical power decided? This is particularly important because the observed slowing of deactivation by cAMP in Fig. 1C seems like it may be fairly subtle. Perhaps a swarm plot would make the shift more apparent? Also, LRMP 1-479Cit distributions in Fig. 1B-C look like they are more uniform than normal, so please double-check the appropriateness of the statistical test employed.

      We have revised the methods section (page 7) to discuss this. Briefly, we performed regular control experiments throughout this project to ensure that a normal cAMP response was occurring. Our minimum target for sufficient power was 8-10 recordings. We have expanded the statistics section (page 9) to discuss tests of normality and the use of a log scale for deactivation time constants, which is why the shifts in Fig. 1D (revised) are less apparent.

      (3) It would be helpful if the authors could better introduce their logic for the M338V/C341V/S345G mutations in the HCN4-2 VVGPT mutant.

      See response to the reviewing editor’s comment 7.

      B) Minor Comments:

      (1) pg. 9: "We found that LRMP 1-479Cit inhibited HCN4 to an even greater degree than the full-length LRMP, likely because expression of this tagged construct was improved compared to the untagged full-length LRMP, which was detected by co-transfection with GFP." Co-transfection with GFP seems like an extremely poor and a risky measure for LRMP expression.

      We agree that the exact efficiency of co-transfection is contentious although some papers and manufacturer protocols indicate high co-transfection efficiency (Xie et al., 2011). In this paper we used both co-transfection and tagged proteins with similar results.

      (2) pg 9: "LRMP 1-227 construct contains the N-terminus of LRMP with a cut-site near the N-terminus of the predicted coiled-coil sequence". In Figure 2 the graphic shows the coiled-coil domain starting at 191. What was the logic for splitting at 227 which appears to be the middle of the coiled-coil?

      See response to the reviewing editor’s comment 2.

      (3) Figure 5C: Please align the various schematics for HCN4 as was done for LRMP. It makes it much easier to decipher what is what.

      Fig. 5 has been revised as suggested.

      (4) pg 12: I assume that the HCN2 fragment chosen aligns with the HCN4 N2 fragment which shows binding, but this logic should be stated if that is the case. If not, then how was the HCN2 fragment chosen?

      This is correct. This has been explicitly stated in the revised manuscript (page 14).

      (5) Figure 7: Add legend indicating black/gray = HCN4 and blue = HCN2.

      This has been stated in the revised figure legend.

      (6) pg 17: Conservation of P545 and T547 across mammalian species is not shown or cited.

      This sentence is not included in the revised manuscript. However, for the interest of the reviewer, we have provided an alignment of this region across species here.

      Author response image 1.

      Reviewer #2 (Recommendations For The Authors):

      (1) It is not clear whether in the absence of cAMP, LRMP also modestly shifts the voltage-dependent activity of the channels. Please clarify.

      We have clarified that LRMP does not shift the voltage-dependence in the absence of cAMP (page 10). In the absence of cAMP, LRMP does not significantly shift the voltage-dependence of activation in any of the channels we have tested in this paper (or in our prior 2020 paper).

      (2) Resolution of Fig. 8b is low.

      We ultimately decided that the cartoon did not provide any important information for understanding our model and it was removed.

      (3) Please add a supplementary figure showing the amino acid sequence of LRMP to show where the demarcations are made for each fragment as well as where the truncations were made as noted in Fig 3 and Fig 4.

      A new supplementary figure showing the LRMP sequence has been added and cited in the methods section (page 5). Truncation sites have been added to the schematic in Fig. 2A.

      (4) In the cartoon schematic illustration for Fig. 3 and Fig.4, the legend should include that the thick bold lines in the C-Terminal domain represent the CNBD, while the thick bold lines in the N-Terminal domain represent the HCN domain. This was mentioned in Liao 2012, as you referenced when you defined the construct S719X, but it would be nice for the reader to know that the thick bold lines you have drawn in your cartoon indicate that it also highlights the CNBD or the HCN domain.

      This has been added to figure legends for the relevant figures in the revised manuscript.

      (5) On page 12, missing a space between "residues" and "1" in the parenthesis "...LRMP L1 (residues1-108)...".

      Fixed. Thank you.

      (6) Which isoform of LRMP was used? What is the NCBI accession number? Is it the same one from Peters 2020 ("MC228229")?

      This information has been added to the methods (page 5). It is the same as Peters 2020.

      Reviewer #3 (Recommendations For The Authors):

      (1) "Truncation of residues 1-62 led to a partial LRMP effect where cAMP caused a significant depolarizing shift in the presence of LRMP, but the activation in the presence of LRMP and cAMP was hyperpolarized compared to cAMP alone (Fig. 3B, C and 3E; Table 1). In the HCN4Δ1-130 construct, cAMP caused a significant depolarizing shift in the presence of LRMP; however, the midpoint of activation in the presence of LRMP and cAMP showed a non-significant trend towards hyperpolarization compared to cAMP alone (Fig. 3C and 3E; Table 1)".

      This means that sequence 62-185 is necessary and sufficient for the LRMP effect. I suggest a competition assay with this peptide (synthetic, or co-expressed with HCN4 full-length and LRMP to see whether the peptide inhibits the LRMP effect).

      We respectfully disagree with the reviewer’s interpretation. Our results strongly suggest that other regions such as residues 25-65 (Fig. 3C) and C-terminal residues (Fig. 6) are also necessary. The use of a peptide could be an interesting future experiment; however, it would be very difficult to control relative expression of a co-expressed peptide. We think that our results in Fig. 7E-F where this fragment is added to HCN2 are a better controlled way of validating the importance of this region.

      (2) "Truncation of the distal C-terminus (of HCN4) did not prevent LRMP regulation. In the presence of both LRMP and cAMP the activation of HCN4-S719X was still significantly hyperpolarized compared to the presence of cAMP alone (Figs. 4A and 4B; Table 1). And the cAMP-induced shift in HCN4-S719X in the presence of LRMP (~7mV) was less than half the shift in the absence of LRMP (~18 mV)."

      On the basis of the partial effects reported for the truncations of the N-terminus of HCN4 1-62 and 1-130 (Fig 3B and C), I do not think it is possible to conclude that "truncation of the distal C-terminus (of HCN4) did not prevent LRMP regulation". Indeed, cAMP-induced shift in HCN4 Δ1-62 and Δ1-130 in the presence of LRMP were 10.9 and 10.5 mV, respectively, way more than the ~7mV measured for the HCN4-S719X mutant.

      As you rightly stated at the end of the paragraph: "Together, these results show significant LRMP regulation of HCN4 even when the distal C-terminus is truncated, consistent with a minimal role for the C-terminus in the regulatory pathway". I would better discuss this minimal role of the C-terminus. It is true that deletion of the first 185 aa of HCN4 N-terminus abolishes the LRMP effect, but it is also true that removal of the very C-term of HCN4 does affect LRMP. This unstructured C-terminal region of HCN4 contains isotype-specific sequences. Maybe they also play a role in recognizing LRMP. Thus, I would suggest further investigation via truncations, even internal deletions of HCN4-specific sequences.

      Please see the response to the reviewing editor’s comment 3.

      (3) Figure 5: The N-terminus of LRMP FRETs with the N-terminus of HCN4.

      Why didn't you test the same truncations used in Fig. 3? Indeed, based on Fig 3, sequences 1-25 can be removed. I would have considered peptides 26-62 and 63-130 and 131-185 and a fourth (26-185). This set of peptides will help you connect binding with the functional effects of the truncations tested in Fig 3.

      Please see the response to the reviewing editor’s comment 2 and 5.

      Why didn't you test the C-terminus (from 719 till the end) of HCN4? This can help with understanding why truncation of HCN4 C-terminus does affect LRMP, though partially (Fig. 4A).

      Please see the response to the reviewing editor’s comment 3.

      (4) "We found that a previously described HCN4-2 chimera containing the HCN4 N-terminus and transmembrane domains (residues 1-518) with the HCN2 C-terminus (442-863) (Liao et al., 2012) was partially regulated by LRMP (Fig. 7A and 7B)".

      I do not understand this partial LRMP effect on the HCN4-2 chimera. In Fig. 6 you have shown that the "HCN4-P545A/T547F was insensitive to LRMP (Figs. 6B and 6C; Table 1), indicating that the unique HCN4 C-linker is necessary for regulation by LRMP". How can this be reconciled with the HCN4-2 chimera? HCN4-2, "containing" P545A/T547F mutations, should not perceive LRMP.

      Please see the response to the reviewing editor’s comment 6.

      (5) "we next made a targeted chimera of HCN2 that contains the distal HCN4 N-terminus (residues 1-212) and the HCN2 transmembrane and C-terminal domains with 5 point mutants in non-conserved residues of the S5 segment and C-linker elbow (M338V/C341V/S345G/A467P/F469T)......Importantly, the HCN4-2 VVGPT channel is insensitive to cAMP in the presence of LRMP (Fig. 7C and 7D), indicating that the HCN4 N-terminus and cAMP-transduction centre residues are sufficient to confer LRMP regulation to HCN2".

      Why did you insert also the 3 mutations of S5? Are these mutations somehow involved in the cAMP transduction mechanism?

      You have already shown that in HCN4 only P545 and T547 (C-linker) are necessary for LRMP effect. I suggest to try, at least, the chimera of HCN2 with only A467P/F469T. They should work without the 3 mutations in S5.

      Please see the response to the reviewing editor’s comment 7.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, Pan DY et al. discovered that the clearance of senescent osteoclasts can lead to a reduction in sensory nerve innervation. This reduction is achieved through the attenuation of Netrin-1 and NGF levels, as well as the regulation of H-type vessels, resulting in a decrease in pain-related behavior. The experiments are well-designed. The results are clearly presented, and the legends are also clear and informative. Their findings represent a potential treatment for spine pain utilizing senolytic drugs.

      Strengths:

      Rigorous data, well-designed experiments as well as significant innovation make this manuscript stand out.

      Weaknesses:

      Quantification of histology and detailed statistical analysis will further strengthen this manuscript.

      I have the following specific comments.

      (1) Since defining senescent cells solely based on one or two markers (SA-β-gal and p16) may not provide a robust characterization, it would be advisable to employ another well-established senescence marker, such as γ-H2AX or HMGB1, to corroborate the observed increase in senescent osteoclasts following LSI and aging.

      We value the comments provided by the reviewer. In accordance with your suggestion, we have performed co-staining of HMGB1 with TRAP in Supplementary Figure 1 to corroborate the observed increase in senescent osteoclasts following LSI and aging.

      Author response image 1.

      (2) The connection between heightened Netrin-1 secretion by senescent osteoclasts following LSI or aging and its relevance to pain warrants thorough discussion within the manuscript to provide a comprehensive understanding of the entire narrative.

      We appreciate the reviewer's insightful comments. We have thoroughly addressed the entire narrative in the revised manuscript, as outlined below:

      During lumbar spine instability (LSI) or aging, endplates undergo ossification, leading to elevated osteoclast activity and increased porosity [1-4]. The progressive porous transformation of endplates, accompanied by a narrowed intervertebral disc (IVD) space, is a hallmark of spinal degeneration [4,5]. Considering that pain arises from nociceptors, it is plausible that low back pain (LBP) may be attributed to sensory innervation within endplates. Additionally, porous endplates exhibit higher nerve density compared to normal endplates or degenerative nucleus pulposus [6]. Netrin-1, a crucial axon guidance factor facilitating nerve protrusion, has been implicated in this process [7-9]. The receptor mediating Netrin-1-induced neuronal sprouting, deleted in colorectal cancer (DCC), was found to co-localize with CGRP+ sensory nerve fibers in endplates after LSI surgery [10,11]. In summary, during LSI or aging, osteoclastic lineage cells secrete Netrin-1, inducing extrusion and innervation of CGRP+ sensory nerve fibers within the spaces created by osteoclast resorption. This Netrin-1/DCC-mediated pain signal is subsequently transmitted to the dorsal root ganglion (DRG) or higher brain levels.

      (3) It appears that the quantitative data for TRAP staining in Figure 1j is missing.

      We appreciate the reviewer's comments. We have added the statistical data of TRAP staining (Figure 1p) to Figure 1 in the revised manuscript.

      Author response image 2.

      (4) Regarding Figure 6, could you please specify which panels were analyzed using a t-test and which ones were subjected to ANOVA? Alternatively, were all the panels in Figure 6 analyzed using ANOVA?

      We appreciate the reviewer’s comments here. Upon careful review, we have ensured that quantitative data in panels b, c, and f are analyzed using t-tests, while panels d, e, and g are subjected to one-way ANOVA. These updates have been reflected in the revised figure legend.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript examined the underlying mechanisms between senescent osteoclasts (SnOCs) and lumbar spine instability (LSI) or aging. They first showed that greater numbers of SnOCs are observed in mouse models of LSI or aging, and these SnOCs are associated with induced sensory nerve innervation, as well as the growth of H-type vessels, in the porous endplate. Then, the deletion of senescent cells by administration of the senolytic drug Navitoclax (ABT263) results in significantly less spinal hypersensitivity, spinal degeneration, porosity of the endplate, sensory nerve innervation, and H-type vessel growth in the endplate. Finally, they also found that there is greater SnOC-mediated secretion of Netrin-1 and NGF, two well-established sensory nerve growth factors, compared to non-senescent OCs. The study is well conducted and data strongly support the idea. However, some minor issues need to be addressed.

      (1) In Figure 2C, "Number of SnCs/mm2", SnCs should be SnOCs.

      We apologize for the oversight. This has been rectified in the revised manuscript.

      Author response image 3.

      (2) In Figure 3A-E, is there any statistical difference between groups Young and Aged+PBS?

      We appreciate the reviewer's comments. Following your recommendation, we conducted additional statistical analyses to compare the young and PBS-treated aged mice, and we have incorporated these findings into the revised manuscript. The data reveal a significantly increased paw withdrawal frequency (PWF) in aged mice treated with PBS compared with young mice, particularly at 0.4 g rather than 0.07 g (Figure 3a, 3b). Moreover, aged mice treated with PBS exhibited a significant reduction in both distance traveled and active time when compared to young mice (Figure 3d, 3e). Additionally, PBS-treated aged mice demonstrated a significantly shortened heat response time relative to young mice (Figure 3c).

      Author response image 4.

      (3) Again, is there any statistical difference between the Young and Aged+PBS groups in Figure 4F-K?

      We appreciate the reviewer's comments. As per your suggestion, we conducted a thorough analysis to determine the statistical differences between the young and aged+PBS groups, and these statistical results have been incorporated into the revised manuscript. The caudal endplates of L4/5 in PBS-treated aged mice exhibited a significant increase in endplate porosity (Figure 4f) and trabecular separation (Tb.Sp) (Figure 4g) compared to young mice.

      Additionally, PBS-treated aged mice showed a significant elevation in endplate score (Figure 4h), as well as an increased distribution of MMP13 and ColX within the endplates when compared to young mice (Figure 4i, 4j). Furthermore, TRAP staining revealed a significant increase in TRAP+ osteoclasts within the endplates of PBS-treated aged mice as compared to young mice (Figure 4k).

      Author response image 5.

      (4) What is the figure legend of Figure 7?

      The legend for Figure 7 (as below) is included in a separate PDF file labeled 'Figures and Legends.' We have carefully checked the revised manuscript and made sure all the legends are included.

      “Fig. 7. (a) Representative images of immunofluorescent analysis of CD31, an angiogenesis marker (green), Emcn, an endothelial cell marker (red) and nuclei (DAPI; blue) of adult sham, LSI and aged mice injected with PBS or ABT263. (b) Quantitative analysis of the intensity mean value of CD31 per mm2 in sham, LSI mice treated with PBS or ABT263. (c) Quantitative analysis of the intensity mean value of CD31 per mm2 in aged mice treated with PBS or ABT263. (d) Quantitative analysis of the intensity mean value of Emcn per mm2 in sham, LSI mice treated with PBS or ABT263. (e) Quantitative analysis of the intensity mean value of Emcn per mm2 in aged mice treated with PBS or ABT263. n ≥ 4 per group. Statistical significance was determined by one-way ANOVA, and all data are shown as means ± standard deviations.”

      (5) In "Mice" section, an Ethical code is suggested to be added.

      We appreciate the reviewer's comments. In accordance with your suggestion, we have included the Johns Hopkins University animal protocol number in the revised manuscript. The relevant paragraph has been updated to read: “All mice were maintained at the animal facility of The Johns Hopkins University School of Medicine (protocol number: MO21M276).”

      (6) In "Methods" section, please indicate the primers of GAPDH.

      We apologize for the absence of the GAPDH primers. Upon review, the GAPDH primers used were as follows: forward primer 5'-ATGTGTCCGTCGTGGATCTGA-3' and reverse primer 5'-ATGCCTGCTTCACCACCTTCTT-3'. These primer sequences have been included in the revised manuscript.

      (7) Preosteoclasts are regarded to be closely related to H-type vessel growth, so do the authors have any comments on this? Any difference or correlation between SnCs and preosteoclasts?

      The pre-osteoclast plays a crucial role in secreting anabolic growth factors that facilitate H-type vessel formation, osteoblast chemotaxis, proliferation, differentiation, and mineralization. The osteoclast represents the terminal differentiation phase, ultimately leading to the induction of resorption.

      Senescent cells, including senescent osteoclasts, are characterized by permanent cell cycle arrest and changes in their secretory profile, which can impact their function. In the context of osteoclasts, senescence can lead to a reduction in bone resorption capacity and impaired bone remodeling. Senescent osteoclasts are believed to contribute to age-related bone loss and bone-related diseases, such as osteoporosis.

      Reviewer #3 (Public Review):

      Summary:

      This research article reports that a greater number of senescent osteoclasts (SnOCs), which produce Netrin-1 and NGF, are responsible for innervation in the LSI and aging animal models.

      Strengths:

      The research is based on previous findings in the authors' lab and the fact that the IVD structure was restored by treatment with ABT263. The logic is clear and clarifies the pathological role of SnOCs, suggesting the potential utilization of senolytic drugs for the treatment of LBP. Generally, the study is of good quality and the data is convincing.

      Weaknesses:

      There are some points that can be improved:

      (1) Since this work primarily focuses on ABT263, it resembles a pharmacological study for this drug. It is preferable to provide references for the ABT263 concentration and explain how the administration was determined.

      Thank you for your comment. ABT263 has been extensively employed in diverse research studies12-15. The concentration and administration of ABT263 followed the protocol outlined in the published paper13. The reference on how to use ABT263 is cited in the method section: “ABT263 was administered to mice by gavage at a dosage of 50 mg per kg body weight per day (mg/kg/d) for a total of 7 days per cycle, with two cycles conducted and a 2-week interval between them39”.

      (2) It would strengthen the study to include at least 6 mice per group for each experiment and analysis, which would provide a more robust foundation.

      Thank you for your comment here. In response, we conducted a new set of experiments, increasing the sample size to six for most groups, and updated the corresponding statistical data in the revised manuscript.

      (3) In Figure 4, either use "adult" or "young" consistently, but not both. Additionally, it's important to define "sham," "young," and "adult" explicitly in the methods section.

      Thank you for your comment. We have addressed the inconsistency in the labeling of Figure 4. Additionally, we have explicitly defined "sham," "young," and "adult" in the methods section as follows: The control group (sham group) for the LSI group refers to C57BL/6J mice that did not undergo LSI surgery, while the control group (young group) for the Aged group refers to 4-month-old C57BL/6J mice.

      Author response image 6.

      (4) Assess the protein expression of Netrin 1 and NGF.

      Thank you for your comment here. We employed ELISA to assess the protein expression of Netrin-1 and NGF in the L3 to L5 endplates. The data revealed that compared to the young sham mice, LSI was associated with significantly greater protein expression of Netrin-1 and NGF, which was substantially attenuated by ABT263 treatment in LSI mice (Supplementary Fig. 2a, 2b).

      Author response image 7.

      References

      (1) Bian, Q. et al. Excessive Activation of TGFbeta by Spinal Instability Causes Vertebral Endplate Sclerosis. Sci Rep 6, 27093, doi:10.1038/srep27093 (2016).

      (2) Bian, Q. et al. Mechanosignaling activation of TGFbeta maintains intervertebral disc homeostasis. Bone Res 5, 17008, doi:10.1038/boneres.2017.8 (2017).

      (3) Papadakis, M., Sapkas, G., Papadopoulos, E. C. & Katonis, P. Pathophysiology and biomechanics of the aging spine. Open Orthop J 5, 335-342, doi:10.2174/1874325001105010335 (2011).

      (4) Rodriguez, A. G. et al. Morphology of the human vertebral endplate. J Orthop Res 30, 280-287, doi:10.1002/jor.21513 (2012).

      (5) Taher, F. et al. Lumbar degenerative disc disease: current and future concepts of diagnosis and management. Adv Orthop 2012, 970752, doi:10.1155/2012/970752 (2012).

      (6) Fields, A. J., Liebenberg, E. C. & Lotz, J. C. Innervation of pathologies in the lumbar vertebral end plate and intervertebral disc. Spine J 14, 513-521, doi:10.1016/j.spinee.2013.06.075 (2014).

      (7) Hand, R. A. & Kolodkin, A. L. Netrin-Mediated Axon Guidance to the CNS Midline Revisited. Neuron 94, 691-693, doi:10.1016/j.neuron.2017.05.012 (2017).

      (8) Moore, S. W., Zhang, X., Lynch, C. D. & Sheetz, M. P. Netrin-1 attracts axons through FAK-dependent mechanotransduction. J Neurosci 32, 11574-11585, doi:10.1523/JNEUROSCI.0999-12.2012 (2012).

      (9) Serafini, T. et al. Netrin-1 is required for commissural axon guidance in the developing vertebrate nervous system. Cell 87, 1001-1014, doi:10.1016/s0092-8674(00)81795-x (1996).

      (10) Forcet, C. et al. Netrin-1-mediated axon outgrowth requires deleted in colorectal cancer-dependent MAPK activation. Nature 417, 443-447, doi:10.1038/nature748 (2002).

      (11) Shu, T., Valentino, K. M., Seaman, C., Cooper, H. M. & Richards, L. J. Expression of the netrin-1 receptor, deleted in colorectal cancer (DCC), is largely confined to projecting neurons in the developing forebrain. J Comp Neurol 416, 201-212, doi:10.1002/(sici)1096-9861(20000110)416:2<201::aid-cne6>3.0.co;2-z (2000).

      (12) Born, E. et al. Eliminating Senescent Cells Can Promote Pulmonary Hypertension Development and Progression. Circulation 147, 650-666, doi:10.1161/CIRCULATIONAHA.122.058794 (2023).

      (13) Chang, J. et al. Clearance of senescent cells by ABT263 rejuvenates aged hematopoietic stem cells in mice. Nat Med 22, 78-83, doi:10.1038/nm.4010 (2016).

      (14) Lim, S. et al. Local Delivery of Senolytic Drug Inhibits Intervertebral Disc Degeneration and Restores Intervertebral Disc Structure. Adv Healthc Mater 11, e2101483, doi:10.1002/adhm.202101483 (2022).

      (15) Yang, H. et al. Navitoclax (ABT263) reduces inflammation and promotes chondrogenic phenotype by clearing senescent osteoarthritic chondrocytes in osteoarthritis. Aging (Albany NY) 12, 12750-12770, doi:10.18632/aging.103177 (2020).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1

      The authors should include experiments such as Cryo-EM and genetically modified animals to demonstrate the physiological importance of the TMEM81 complex.

      While we intend to pursue cryo-EM studies of the putative complex (or subcomplexes thereof), this is clearly not a straightforward endeavor and goes beyond the scope of the present manuscript. Concerning the generation of genetically modified animals, we would like to underline that the majority of the proteins that we used for AlphaFold-Multimer complex predictions were precisely chosen based on the fact that - as detailed in the publications referenced in the Introduction - ablation of the respective genes caused sex-specific infertility due to defects in gamete fusion (the other criterion used for inclusion being structural similarity to IZUMO1 coupled with expression in the testis (IZUMO2-4 and TMEM81), or evidence from other kinds of experiments in the case of human-specific MAIA). Concerning TMEM81, experimental evidence for a direct involvement in gamete fusion is described in the referenced preprint by Daneke et al., which was submitted to bioRxiv concomitantly with the present work.

      Reviewer #2

      I believe that the manuscript would benefit from the authors providing more information about the systematic search (Figure 4). For example, by indicating for each pair tested the average pDock score in a 2D plot (or table) and as raw data in the supplementary information.

      Figure 4 has been modified to report both the top and the mean ranking scores for every interaction. Furthermore, additional metrics for the systematic search summarized in Figure 4, including pDockQ scores, are provided in this manuscript revision as supplementary Table S1.
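As a rough illustration of what "top" and "mean" ranking scores per interaction could look like, the sketch below summarises several predicted models for one protein pair. It assumes the ranking score is the standard AlphaFold-Multimer ranking confidence (0.8 × ipTM + 0.2 × pTM); the response does not state which score was used, and the function names and example values are hypothetical.

```python
from statistics import mean

def ranking_confidence(iptm, ptm):
    """AlphaFold-Multimer style ranking confidence (assumed score definition)."""
    return 0.8 * iptm + 0.2 * ptm

def summarise_pair(models):
    """Return (top, mean) ranking confidence over all models of one protein pair.

    `models` is a list of (ipTM, pTM) tuples, one per predicted model.
    """
    scores = [ranking_confidence(iptm, ptm) for iptm, ptm in models]
    return max(scores), mean(scores)

# Hypothetical (ipTM, pTM) values for three models of a single pair
example_models = [(0.80, 0.70), (0.60, 0.65), (0.40, 0.50)]
top_score, mean_score = summarise_pair(example_models)
```

Reporting both values distinguishes pairs where one model scores highly by chance from pairs where all models agree.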

      A global search, such as including all membrane proteins expressed in eggs or sperm, could not only be more informative but could also allow the reader to understand the pDock score discrimination power for this particular subset.

      The possibility of carrying out a global search was evaluated by performing preliminary computational experiments on an extended ensemble of sperm and egg proteins. In order to do so, we compiled a list of sperm membrane proteins by referring to 4 proteomic datasets (PMIDs 36384108, 36896575, 31824947, 24082039) and identifying ~600 proteins that were found in at least two of them; among these, 250 were single-pass type I or type II membrane proteins, or GPI-anchored proteins. Similarly, a list of 160 egg surface membrane proteins, excluding multipass and secreted ones, was obtained by comparing oocyte cDNA library NIH_MGC_257_N (Express Genomics, USA) with 4 proteomic datasets (PMIDs 35809850, 36042231, 29025019, 27215607). As we briefly commented at the beginning of the section “Prediction of interactions between human proteins associated with gamete fusion” of the revised manuscript, the tests carried out using the resulting list of sperm and egg proteins suggested that interpreting the results of a global search would be severely complicated by a relatively large number of putative false positives. Moreover, the tests showed that performing a complete systematic search would be beyond our current access to computing power. Based on these observations, we preferred to maintain the present study limited to proteins that had been previously clearly implicated in gamete fusion and/or matched specific structural features of IZUMO1.
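The shortlisting step described above (keep proteins found in at least two of four proteomic datasets) and the size of the resulting search can be sketched as follows. This is a minimal illustration only: the dataset contents and the `P1`–`P4` names are hypothetical placeholders, not the actual proteomic data, and the pair count considers sperm-egg pairs only.

```python
from collections import Counter

def found_in_at_least(datasets, min_hits=2):
    """Return proteins that occur in at least `min_hits` of the given datasets."""
    hits = Counter(protein for dataset in datasets for protein in set(dataset))
    return sorted(protein for protein, n in hits.items() if n >= min_hits)

# Hypothetical toy datasets standing in for the four proteomic studies
datasets = [
    {"IZUMO1", "SPACA6", "P1"},
    {"IZUMO1", "TMEM81", "P2"},
    {"SPACA6", "TMEM81", "P3"},
    {"IZUMO1", "P4"},
]
shortlist = found_in_at_least(datasets)

# Counts quoted in the response: 250 sperm and 160 egg candidate proteins
n_sperm, n_egg = 250, 160
n_pairs = n_sperm * n_egg  # one prediction job per sperm-egg pair
```

With the counts quoted in the response, the sperm-egg pairs alone amount to 40,000 complex predictions, which gives a sense of why a complete systematic search was considered out of reach.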

      Figure 5 could be improved in clarity by schematically indicating to which cell each protein is anchored.

      This has been done in the revised version of the manuscript.

      Reviewer #3

      Major comments

      (1) In Figure 1, how the protein of mouse/human IZUMO1 and JUNO is purified is not mentioned in the main text nor in the Methods. Are the mouse IZUMO1-His and mouse JUNO-His transfected together or separately? Are human JUNO-His and human IZUMO1-Myc transfected together into HEK293 cells? And purified by IMAC?

      Transfection information has been included in the Methods section “Protein expression, purification and analysis” (previously “Protein expression and purification”). Concerning the purification procedure, we had already stated in the legend of Figure 1 that human JUNOE-His/IZUMO1E-Myc had been purified by IMAC before SEC, and have now done the same for mouse JUNOE-His and IZUMO1E-His.

      (2) It would be easier to understand the figure if the author could run a WB to indicate which band above JUNO is specifically IZUMO1-Myc in Figure 1.

      This has been done and reported in a new Figure S1 (with the original Figure S1 having now become Figure S2). Details about the antibodies used for immunoblot have been included in both Methods section “Protein expression, purification and analysis” and the Key Resources Table.

      (3) Figure 4: Analysis of more proteins that have been suggested as possible candidates for sperm-egg interaction will help to highlight the following results. Also, providing a score for the possibility of interaction might help in selecting those proteins in Figures 5 and 6.

      Please refer to the answer to the first question of Reviewer #2.

      (4) Figure 7: The authors take advantage of the latest developments in protein structure and interaction prediction to model protein complex formation. However, some experiments, such as Co-IP or pull-down assays, are necessary to verify at least some of the predicted interactions.

      We agree with the reviewer; however, for the reasons we discussed during our comparison of the biochemical properties of the JUNO/IZUMO1 interaction between mouse and human, pursuing this line of inquiry will likely necessitate an extensive set of parallel experiments using proteins from different species. This work is being planned and will be the focus of future studies. However, as we mentioned at the end of the Abstract, one should also consider that some of these complexes are likely to be highly transient. Because of this, while they may have important regulated roles in vivo (function at a specific time and place), they could be very challenging to detect using standard approaches in vitro. We thus see this as a significant advance that structural modeling could contribute to the identification of such functionally important but transient interactions.

      Minor points

      (1) In the abstract, "three sperm (IZUMO1, SPACA6 and TMEM81) "should be "three sperm proteins."

      The Abstract has been condensed to fit within the suggested 200-word limit and, as part of this, the sentence has been changed to “complex involving sperm IZUMO1, SPACA6, TMEM81 and egg JUNO, CD9”.

      (2) How do the predictions of the binary complex IZUMO1/CD9 (Figure S1B) or IZUMO1/CD81 (Figure S1C) suggest "the two egg tetraspanins are interchangeable"? Was it because they are quite similar? Please provide more explanation for this speculation. Interchangeable by function or for complex formation? To support the conclusion, biochemical data is required. Otherwise, it needs to be toned down.

      This is because, in the AlphaFold-Multimer predictions of the pentameric complex, CD9 and CD81 are placed in essentially the same way relative to the other subunits.

      We have now clarified this at the end of page 6:

      “(...) suggest that the two egg tetraspanins are interchangeable because they are predicted to bind to the same region of IZUMO1; (...)”

      (3) It would be more reader-friendly if the author could label the name of each protein in the figure in Figure S1, especially when the name is not written in the figure legend.

      This has been done in Figure S2 of the revised manuscript (corresponding to original Figure S1).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study examined a universal fractal primate brain shape. However, the paper does not seem well structured and is not well written. It is not clear what the purpose of the paper is. And there is a lack of explanation for why the proposed analysis is necessary. As a result, it is challenging to clearly understand what novelty in the paper is and what the main findings are.

      We have now restructured the paper, including a summary of the main purpose and findings as follows:

      “Compared to previous literature, we can summarise our main contribution and advance as follows:

      (i) We are showing for the first time that representative primate species follow the exact same fractal scaling – as opposed to previous work showing that they have a similar fractal dimension [Hofman1985, Hofman1991], i.e. slope, but not necessarily the same offset, as previous methods had no consistent way of comparing offsets.

      (ii) Previous work could also not show direct agreement in morphometrics between the coarse-grained brains of primate species and other non-primate mammalian species.

      (iii) Demonstrating in proof-of-principle that multiscale morphometrics, in practice, can have much larger effect sizes for classification applications. This moves beyond our previous work where we only showed the scaling law across [Mota2015] and within species [Wang2016], but all on one (native) scale with comparable effect sizes for classification applications [Wang2021].

      In simple terms: we know that objects can have the same fractal dimension, but differ greatly in a range of other shape properties. However, we demonstrate here, that representative primate brains and mammalian brain indeed share a range of other key shape properties, on top of agreeing in fractal dimension. This suggests a universal blueprint for mammalian brain shape and a common set of mechanisms governing cortical folding. As a practical additional outcome of our study, we could show that our novel method of deriving multiscale metrics of brain shape can differentiate subtle shape changes much better than the metrics we have been using so far at a single native scale.”

      We plan to use the second paragraph as a plain-language summary of our work.

      Additionally, several terms are introduced without adequate explanation and contextualization, further complicating comprehension.

      We have now made sure that potential jargon is introduced with context and explanation. For example in Introduction: “This scaling law, relating powers of cortical thickness and surface area metrics, […]”

      Does the second section, "2. Coarse-graining procedure", serve as an introduction or a method?

      We have now renamed this section to “Coarse-graining Method” to indicate that this is a section about methods. However, to describe the methods adequately, we also expanded this section with introductory texts around the history and motivation of the method to provide context and explanations, as the reviewer rightly requested.

      Moreover, the rationale behind the use of the coarse-graining procedure is not adequately elucidated. Overall, it is strongly recommended that the paper undergoes significant improvements in terms of its structure, explanatory depth, and overall clarity to enhance its comprehensibility.

      To specifically explain the rationale behind the coarse-graining method, we added several clarifications, including the following paragraph:

      “As a starting point for such a coarse-graining procedure, we suggest to turn to a well-established method that measures fractal dimension of objects: the so-called box-counting algorithm [Kochunov2007, Madan2019]. Briefly, this algorithm fills the object of interest (say the cortex in our case) with boxes, or voxels of increasingly larger sizes and counts the number of boxes in the object as a function of box size. As the box size increases, the number of boxes decreases; and in a log-log plot, the slope of this relationship indicates the fractal dimension of the object. In our case, this method would not only provide us with the fractal dimension of the cortex, but, with increasing box size, the filled cortex would also contain less and less detail of the folded shape of the cortex. Intuitively, with increasing box size, the smaller details, below the resolution of a single box, would disappear first, and increasingly larger details will follow -- precisely what we require from a coarse-graining method. We therefore propose to expand the traditional box-counting method beyond its use to measure fractal dimension, but to also analyse the reconstructed cortices as different realisations of the original cortex at the specified spatial scale.”
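The box-counting idea in the quoted paragraph can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it works on a point cloud rather than on cortical surface meshes, and the function names and the toy object (a filled unit square) are assumptions made for the example.

```python
import math

def occupied_boxes(points, box_size):
    """Return the set of grid boxes (as integer index tuples) that contain a point.

    The set itself is a coarse-grained version of the object at this scale.
    """
    return {tuple(math.floor(c / box_size) for c in p) for p in points}

def box_count(points, box_size):
    """Number of boxes of edge length `box_size` needed to cover the points."""
    return len(occupied_boxes(points, box_size))

def fractal_dimension(points, box_sizes):
    """Least-squares slope of log N(s) against log(1/s) over the given box sizes."""
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(box_count(points, s)) for s in box_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy object: a filled unit square sampled on a 64 x 64 grid
square = [(i / 64, j / 64) for i in range(64) for j in range(64)]
sizes = [1 / 2, 1 / 4, 1 / 8, 1 / 16]
counts = [box_count(square, s) for s in sizes]
dimension = fractal_dimension(square, sizes)
```

For the filled square the box count grows as the inverse square of the box size, so the fitted slope is 2, as expected for a plain two-dimensional object. The set returned by `occupied_boxes` plays the role of the coarse-grained realisation at each scale: details smaller than one box are no longer represented.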

      Reviewer #2 (Public Review):

      In this manuscript, Wang and colleagues analyze the shapes of cerebral cortices from several primate species, including subgroups of young and old humans, to characterize commonalities in patterns of gyrification, cortical thickness, and cortical surface area. The work builds on the scaling law introduced previously by co-author Mota, and Herculano-Houzel. The authors state that the observed scaling law shares properties with fractals, where shape properties are similar across several spatial scales. One way the authors assess this is to perform a "cortical melting" operation that they have devised on surface models obtained from several primate species. The authors also explore differences in shape properties between the brains of young (~20 year old) and old (~80) humans. My main criticism of this manuscript is that the findings are presented in too abstract a manner for the scientific contribution to be recognized.

      We recognise that our work is at the intersection of complex mathematical concepts and a perplexing biological phenomenon. Therefore, our paper has to strike a balance between scientifically accurate and succinct descriptions whilst giving sufficient space to provide context and explanations.

      Throughout, we have now added text to provide more context, and we also repeat key statements in plain-English terms.

      For example, we added the following text to highlight our key contributions.

      “In simple terms: we know that objects can have the same fractal dimension, but differ greatly in a range of other shape properties. However, we demonstrate here, that representative primate brains and mammalian brain indeed share a range of other key shape properties, on top of agreeing in fractal dimension. This suggests a universal blueprint for mammalian brain shape and a common set of mechanisms governing cortical folding. As a practical additional outcome of our study, we could show that our novel method of deriving multiscale metrics of brain shape can differentiate subtle shape changes much better than the metrics we have been using so far at a single native scale.”

      (1) The series of operations to coarse-grain the cortex illustrated in Figure 1, constitute a novel procedure, but it is not strongly motivated, and it produces image segmentations that do not resemble real brains.

      To specifically explain the rationale behind the coarse-graining method, we added several clarifications, including the following paragraph:

      “As a starting point for such a coarse-graining procedure, we suggest to turn to a well-established method that measures fractal dimension of objects: the so-called box-counting algorithm [Kochunov2007, Madan2019]. Briefly, this algorithm fills the object of interest (say the cortex in our case) with boxes, or voxels of increasingly larger sizes and counts the number of boxes in the object as a function of box size. As the box size increases, the number of boxes decreases; and in a log-log plot, the slope of this relationship indicates the fractal dimension of the object. In our case, this method would not only provide us with the fractal dimension of the cortex, but, with increasing box size, the filled cortex would also contain less and less detail of the folded shape of the cortex. Intuitively, with increasing box size, the smaller details, below the resolution of a single box, would disappear first, and increasingly larger details will follow -- precisely what we require from a coarse-graining method. We therefore propose to expand the traditional box-counting method beyond its use to measure fractal dimension, but to also analyse the reconstructed cortices as different realisations of the original cortex at the specified spatial scale.”

      We also note in several places in the text that the coarse-grained brains are not to be understood as exact reconstructions of actual brains, but serve the purpose of a model:

      “[…] nor are the coarse-grained versions of human brains supposed to exactly resemble the location/pattern/features of gyri and sulci of other primates. The similarity we highlighted here are on the level of summary metrics, and our goal was to highlight the universality in such metrics to point towards highly conserved quantities and mechanisms.”

      “Note, of course, that the coarse-grained brain surfaces are an output of our algorithm alone and not to be directly/naively likened to actual brain surfaces, e.g. in terms of the location or shape of the folds. Our comparisons here between coarse-grained brains and actual brains is purely on the level of morphometrics across the whole cortex.”

      The process to assign voxels in downsampled images to cortex and white matter is biased towards the former, as only 4 corners of a given voxel are needed to intersect the original pial surface, but all 8 corners are needed to be assigned a white matter voxel (section S2). This causes the cortical segmentation, such as the bottom row of Figure 1B, to increase in thickness with successive melting steps, to unrealistic values. For the rightmost figure panel, the cortex consists of several 4.9-sided voxels and thus a >2 cm thick cortex. A structure with these morphological properties is not consistent with the anatomical organization of a typical mammalian neocortex.

      Specifically on the point on increasing cortical thickness with increased level of coarse-graining, we have now added the following paragraph:

      “The observation that with increasing voxel sizes, the coarse-grained cortices tend to be smoother and thicker is particularly interesting: the scaling law in Eq. 1 can be understood as thicker cortices (T) form larger folds (or are smoother i.e. less surface area At) when brain size is kept constant (Ae). This way of understanding has also been vividly illustrated by using the analogy of forming paper balls with papers of varying thickness in [Mota2015]: to achieve the same size of a paper ball (Ae), the one that uses thicker paper (T) will show larger folds (or is smoother i.e. less surface area At) than the one using thinner paper. The scaling law can therefore be understood as a physically and biologically plausible statement, and here, we are encouraged that our algorithm yields results in line with the scaling law.”
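Eq. 1 is not reproduced in this response. Assuming it is the scaling law of [Mota2015], with total surface area $A_t$, exposed surface area $A_e$, average cortical thickness $T$ and a constant $k$, the paper-ball argument can be written out as:

```latex
A_t \, T^{1/2} = k \, A_e^{5/4}
\quad\Longrightarrow\quad
A_t = k \, A_e^{5/4} \, T^{-1/2}
```

At fixed $A_e$, the total area $A_t$ therefore falls as $T^{-1/2}$: a thicker cortex of the same overall size has less total surface area, that is, fewer or larger folds.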

      (2) For the comparison between 20-year-old and 80-year-old brains, a well-documented difference is that the older age group possesses more cerebrospinal fluid due to tissue atrophy, and the distances between the walls of gyri become greater. This difference is borne out in the left column of Figure 4c. It seems this additional spacing between gyri in 80-year-olds requires more extensive down-sampling (larger scale values in Figure 4a) to achieve a similar shape parameter K as for the 20-year-olds. A case could be made that the familiar way of describing brain tissue - cortical volume, white matter volume, thickness, etc. - is a more direct and intuitive way to describe differences between young and old adult brains than the obscure shape metric described in this manuscript. At a minimum, a demonstration of an advantage of the Figure 4a and 4b analyses over current methods for interpreting age-related differences would be valuable.

      We have demonstrated the utility of our new shape metrics in a separate paper [Wang2021]. However, we agree with the reviewer that, in this specific instance, it is much easier to understand the key message without considering the less traditional metrics. We have therefore completely revised this part of the Results section to highlight the advantage of multiscale morphometrics, and used the traditional metric of surface area to illustrate the point. The reasoning in surface area is much easier to follow, both visually and conceptually, exactly as the reviewer described.

      (3) In Discussion lines 199-203, it is stated that self-similarity, operating on all length scales, should be used as a test for existing and future models of gyrification mechanisms. First, the authors do not show, (and it would be surprising if it were true) that self-similarity is observed for length scales smaller than the acquired MRI data for any of the datasets analyzed. The analysis is restricted to coarse (but not fine)-graining.

      To clarify this point, we have added a supplementary section and the following sentence: “Note this method has also no direct dependency on the original MR image resolution, as the inputs are smooth grey and white matter surface meshes reconstructed from the images using strong (bio-)physical assumptions and therefore containing more fine-grained spatial information than the raw images (also see Suppl. Text 3).”

      We are indeed sampling at resolutions down to 0.2mm, which is below MR image resolution. The reviewer is, however, correct that we are only coarse-graining, not “fine-graining”. Note that coarse-graining here is relative to the smooth surface meshes, not to the MR image.

      Therefore, self-similarity on all length scales would seem to be too strong a constraint. Second, it is hard to imagine how this test could be used in practice. Specific examples of how gyrification mechanisms support or fail to support the generation of self-similarity across any length scale, would strengthen the authors' argument.

      We agree that spatial scales much below 0.2mm resolution may not be of interest, as these scales are only measuring the fractal properties, or “bumpiness”, of the surface meshes at the vertex level. We have therefore revised our statement in Discussion and clarified it with an example: “Finally, this dual universality is also a more stringent test for existing and future models of cortical gyrification mechanisms at relevant scales, and one that moreover is applicable to individual cortices. For example, any models that explicitly simulate a cortical surface could be directly coarse-grained with our method and compared to actual human and primate data provided here.”

      Some additional, specific comments are as follows:

      (4) The definition of the term A_e as the "exposed surface" was difficult to follow at first. It might be helpful to state that this parameter is operationally defined as the convex hull surface area.

      We agree and introduced this term now at first use: “The exposed surface area can be thought of as the surface area of a piece of cling film wrapped around the brain. Mathematically, for the remaining paper it is the convex hull of the brain surface.”

      Also, for the pial surface, A_t, there are several who advocate instead for the analysis of a cortical mid-thickness surface area, as the pial surface area is subject to bias depending on the gyrification index and the shape of the gyri. It would be helpful to understand if the same results are obtained from mid-thickness surfaces.

      This point is indeed being investigated independently of this study. Our provisional understanding is that in healthy human brains, at native scale, using the mid (or the white matter) surface introduces a systematic offset shift in the scaling law, but does not affect the scaling slope of 1.25. However, this requires a more in-depth investigation in a range of other conditions, and in the context of the coarse-grained shapes, which is on-going. Nevertheless, the scaling law, at first introduction already, has been using the pial surface area [Mota2015] and all subsequent follow-up studies followed this convention. To make our paper here accessible and directly comparable, we therefore used the same metric. Future work will investigate the utility of other metrics.

      (5) In Figure 2c, the surfaces get smaller as the coarse-graining increases, making it impossible to visually assess the effects of coarse-graining on the shapes. Why aren't all cortical models shown at the same scale?

      The purpose of rescaling the surfaces is to match the scaling plot (Fig 2A) directly, which shows shrinking surface areas Ae and At with increasing coarse-graining. Here, we are effectively keeping the size of the box constant and resizing the cortical surface instead, which is mathematically equivalent to changing the box size and keeping the cortical surface constant.

      An alternative interpretation of the “shrinking” is, therefore, that with increasingly smaller cortical surfaces, the folding details disappear, as we require from our coarse-graining method. This is also visually apparent, as the reviewer points out. We have added this to the explanation in the text.

      If we, however, changed the box size instead, the scaling law plot would be meaningless: for example, Ae would barely change with coarse-graining. We would therefore have needed to introduce more complexity in our analysis in terms of how we can measure the scaling law. Thus, we opted to present the simpler method and interpretation here.
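
      The equivalence between resizing the surface and resizing the box can be illustrated with a toy box count. The sketch below is not the coarse-graining algorithm of the paper: it only counts occupied boxes for a made-up wavy sheet of points, and the names `occupied_boxes` and `shrink` are hypothetical.

```python
import math


def occupied_boxes(points, box_size):
    """Return the set of integer box indices that contain at least one point."""
    return {tuple(math.floor(c / box_size) for c in p) for p in points}


def shrink(points, scale):
    """Divide every coordinate by `scale`, i.e. shrink the whole shape."""
    return [tuple(c / scale for c in p) for p in points]


# Toy "surface" (an assumption for illustration): a wavy sheet sampled on a grid.
points = [
    (0.25 * i, 0.25 * j, math.sin(0.25 * i) * math.cos(0.25 * j))
    for i in range(40)
    for j in range(40)
]

scale = 2.0
# Option 1: keep the shape, enlarge the box.
count_big_box = len(occupied_boxes(points, scale))
# Option 2: keep a unit box, shrink the shape by the same factor.
count_shrunk_shape = len(occupied_boxes(shrink(points, scale), 1.0))
```

      Both options assign every point to the same box index, so the two counts agree; what differs is only whether measured areas are reported in original or rescaled units.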

      Nevertheless, we agree that a direct comparison would be beneficial and have thus added the videos for each species in supplementary under this link: https://bit.ly/3CDoqZQ Upon completed peer-review, we hope to integrate these directly into eLife’s interactive displays for this figure.

      (6) Text in Section 3.2 emphasizes that K is invariant with scale (horizontal lines in Figure 3), and asserts this is important for the formation of all cortices. However, I might be mistaken, but it appears that K varies with scale in Figure 4a, and the text indicates that differences in the S dependence are of importance for distinguishing young vs. old brains. Is this an inconsistency?

      We agree that it may be confusing to emphasise a “constant K” in the first set of results across species, and then later highlight a changing K in the human ageing results. To clarify, in the first set of results, we find a constant K relative to a changing S: the range in K across melted primate brains is less than 0.1, whereas in S it is over 1.2. In other words, S changes are an order of magnitude higher than K changes. Hence, we described K as “constant” relative to S.

      Nevertheless, K shows subtle changes within individuals, which is what we were describing in the human ageing results. These changes are within the range of K values described in the across species results.

      However, in the interest of clarity, we followed the reviewer’s suggestion of simplifying the last set of results on human ageing and therefore the variable K in human ageing now only appears in Supplementary. We have now added clarifications to the supplementary on this point.
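
      For readers unfamiliar with K and S, the sketch below computes both from total area, exposed area and thickness. The formulas are an assumption here: they follow the definitions we recall from [Wang2021] (base-10 logarithms) and are not restated in this response. The input values are illustrative only, not data from the study.

```python
import math


def shape_terms(total_area, exposed_area, thickness):
    """Return (K, S) under the assumed definitions from [Wang2021].

    K = log At - 5/4 log Ae + 1/4 log T^2   (offset of the scaling law)
    S = 3/2 log At + 3/4 log Ae - 9/4 log T^2
    """
    log_at = math.log10(total_area)
    log_ae = math.log10(exposed_area)
    log_t2 = math.log10(thickness ** 2)
    k = log_at - 1.25 * log_ae + 0.25 * log_t2
    s = 1.5 * log_at + 0.75 * log_ae - 2.25 * log_t2
    return k, s


# Illustrative values (mm^2, mm^2, mm), chosen only to exercise the function.
k_example, s_example = shape_terms(1.0e5, 4.0e4, 2.5)
```

      With these definitions, any two cortices that lie on the same scaling law share the same K even when their S values differ, which is the sense in which K is "constant" relative to S.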

      Reviewer #3 (Public Review):

      Summary:

      Through a detailed methodology, the authors demonstrated that within 11 different primates, the shape of the brain matched a fractal of dimension 2.5. They enhanced the universality of this result by showing the concordance of their results with a previous study investigating 70 mammalian brains, and the discordance of their results with other folded objects that are not brains. They incidentally illustrated potential applications of this fractal property of the brain by observing a scale-dependent effect of aging on the human brain.

      Strengths:

      • New hierarchical way of expressing cortical shapes at different scales derived from the previous report through the implementation of a coarse-graining procedure.

      • Positioning of results in comparison to previous works reinforcing the validity of the observation.

      • Illustration of scale-dependence of effects of brain aging in the human.

      Weaknesses:

      • The impact of the contribution should be clarified compared to previous studies (implementation of new coarse graining procedure, dimensionality of primate brain vs previous studies, and brain aging observations).

      We have now made these changes, particularly by adding two paragraphs to the start of Discussion. One summarising the main contributions above previous work, and one paraphrasing the former in plain English for accessibility.

      • The rather small sample sizes, counterbalanced by the strength of the effect demonstrated.

      We have now increased the sample size of the human ageing analysis substantially to over 100 subjects and observe the same trends, but with an even stronger effect. We therefore believe that this revision serves as an additional internal validation of our data and methods.

      • The use of either averaged or individual brains for the different sub-studies could be made clearer.

      We have now added this to our Suppl methods: with the exception of the Marmoset, all brain surface data were derived from healthy individual brains.

      • The model discussed hypothetically in the discussion is not very clear, and may not be state-of-the-art (axonal tension driving cortical folding? cf. https://doi.org/10.1115/1.4001683).

      We have now added this citation to our Discussion and given it context:

      “Indeed, our previously proposed model [Mota2015] for cortical gyrification is very simple, assuming only a self-avoiding cortex of finite thickness experiencing pressures (e.g. exerted by white matter pulling, or by CSF pressure). The offset K, or 'tension term', precisely relates to these pressures, leading us to speculate that subtle changes in K correlate with changes in white matter property [Wang2016, Wang2021]. In the same vein of speculation, the scale-dependence of K shown in this work might therefore be related to different types of white matter that span different length scales, such as superficial vs. deep white matter, or U-fibres vs. major tracts. However, there are also challenges to the axonal tension hypothesis [Xu2010]. Indeed, white matter tension differentials in the developed brain may not explain location of folds, but instead white matter tension may contribute to a whole-brain scale 'pressure' during development that drives the folding process overall.”

      Reviewer #3 (Recommendations For The Authors):

      Many thanks to the authors for this elegant article. I will only report here on the cosmetics of the article.

      We thank the reviewer for their kind words and attention to detail and have made all the suggested changes and revised the paper generally for readability, grammar and spelling.

      p2: last line of abstract: 'for a range of conditions in the future'.

      p3 l.37: I would not self-describe this method as elegant as this is a subjective property.

      p3 l.38: 'that will render' -> I wouldn't use the future here.

      p.4 l.59: double spacing before ref [9]?

      p.6 l.99: 'approximate a fractal' -> why is 'a' italicized?

      p.7 fig.2: I would expect the colours to be detailed in the legend. Are there two data points per species because both hemispheres are treated separately?

      p.9 l.134-135: 'similar to and in terms of the universal law 'as valid as' -> please add commas for reading comfort: 'similar to, and, in terms of the universal law, 'as valid as'.

      p.9 l. 141: For all the cortices we analysed.

      p.9 Fig 3: I find the colours a bit confusing in Figs B and C. I find Fig C a bit confusing: what are all the lines representative of, and more specifically, the two lower lines with a different trajectory?

      p.10 l.155: '1̃500' -> '~1500'.

      p.13 l. 209: either 'speculate that' or 'wonder if'.

      p.14 l.232: 'neuron numbers' -> 'number of neurons'.

      p.26 S2 second paragraph: 'gryi' -> 'gyri'.

      p.30 l.3: please refrain from starting a sentence with I.e..

      p.30 last line before S3.2: 'The algorithmic implementation in MATLAB can be found on Zenodo: TBA' - I guess this is linked to you disclosing the code upon acceptance, but please complete before final submission.

      p.34 middle/bottom of page: 'The scheme described in Sec. S3.1' -> double spacing before S3.1?

      p.35 l.1: 'We simply replace' -> 'we simply replace' (no capital).

      p.36 Fig S5.1: explicit the same colouring of the points and boxes in legend

      p.38 Fig. S6.1: briefly describe the use of colours in the legend.

      p.39 Fig. S7.1: detail colours in the legend.

      p.41 Fig. S7.3: detail colours in the legend.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Figure 1

      • The "matched primary tumors" from TCGA include n=424 from cutaneous melanoma; but it is unclear where this is coming from; the PanCan Atlas for melanoma shows n=81 primary and 367 metastatic tumors. There are also additional large cohorts of ICI-treated metastatic tumors with RNAseq data (e.g. a metastatic melanoma cohort with 100+ patients https://doi.org/10.1038/s41591-019-0654-5) that would increase the numbers here.

      We thank the reviewer for their observation. We have replaced references to “primary” cancers as “TCGA” cancers as appropriate. While the TCGA analyses included metastatic samples, the majority of the TCGA tumors in most cohorts correspond to primary cancers or local metastases, a point which we added to the text. We retained Fig. 1D as the representative examples are actual primary samples. We have decided to defer analysis of additional melanoma cohorts for future inquiry.

      Figure 2

      • What is the basis for the split between high and low Dux4 expressing tumors at 1 TPM? Is it arbitrary, or based on some structure in the distribution? (e.g. bimodal distribution)

      Our previous analyses of RNA-seq datasets derived from early embryogenesis samples (PMID: 3132774, 28459457) showed that physiologic levels of DUX4 range from approximately 2 to 10 TPM. We added a description in the methods section, under “Genome annotations, gene expression, and Gene Ontology (GO) enrichment analyses,” of our conservative choice for the threshold: DUX4-positivity defined as expression levels > 1 TPM.

      Figure 3

      • Overall claim is that Dux4 expression is associated with worse survival in metastatic urothelial carcinomas treated with PD-L1 inhibitor. However, the rationale for the choice of split (Dux4 expression < 0.5 and > 1 TPM) to show is unclear (is this the 25th percentile? 75th percentiles?), and the rationale/interpretation of the "partial adjustment" for TMB by removing the bottom quartile of TMB feels non-rigorous and prone to bias. It doesn't feel like Fig 3bc contributes very much; Figure 4 really is the more rigorous analysis.

      We thank the reviewer for these comments and suggestions. We adjusted the analyses in Fig. 3C and Fig. S3 to be consistent with Fig. 1 and Fig. 2, in terms of the choice of split. We also clarified in the text how our initial, crude TMB adjustment served as an important indication for us to pursue more rigorous statistical approaches.

      Figure 4

      • Dux4 expression is independently associated with worse survival considering other clinical and molecular characteristics

      • I would include TGFB in the features considered in the table (in the supplementary but not the main table or forest plots, not sure why not?)

      • The choice of Dux4 expression split ( < 0.25 and > 1 TPM) feels arbitrary and is different than the split in Figure 3; what is the rationale for this? Also, how many patients does this exclude? (TPM between 0.25 and 1). What does the continuous value or median split for Dux4 expression give you for the CoxPH model?

      • Re: building a predictive model, excluding patients (e.g. between <0.25 and > 1 Dux4 TPM) makes the model difficult to apply (e.g. cannot apply to patients with Dux4 levels in the missing interval); a better predictive model would include all patients in the cohort.

      We thank the reviewer for their other suggestions. We have clarified in the text that our choice to define DUX4-negative samples as those with DUX4 expression levels < 0.25 TPM was made to preemptively address potential misclassifications due to decreased sensitivity of bulk RNA-seq at very low expression levels (PMID: 18516045). We believe our classifications with the new scheme are more reliable. We have also now specified in the text that our categorization excludes 126 patients. We have decided to not pursue the addition of TGFB or exploration of the use of an alternative split or continuous version of DUX4 expression in the Cox Proportional Hazards analyses but appreciate the suggestions, which we will keep in mind for future studies.
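
      The categorization described above can be sketched as follows. The two thresholds come from the response; the function name, the labels, and the treatment of values exactly equal to a threshold (excluded here) are assumptions made for illustration.

```python
from typing import Optional

POSITIVE_TPM = 1.0   # DUX4-positive: expression > 1 TPM (stated in the response)
NEGATIVE_TPM = 0.25  # DUX4-negative: expression < 0.25 TPM (stated in the response)


def classify_dux4(tpm: float) -> Optional[str]:
    """Return 'positive', 'negative', or None for the excluded middle band.

    Hypothetical helper: values from 0.25 to 1 TPM inclusive are treated
    as unclassified, which is an assumption about the boundary handling.
    """
    if tpm > POSITIVE_TPM:
        return "positive"
    if tpm < NEGATIVE_TPM:
        return "negative"
    return None


# Made-up expression values, used only to show the three outcomes.
labels = [classify_dux4(x) for x in (0.0, 0.6, 3.2)]
```

      Samples mapped to `None` correspond to the excluded interval, which is why a model trained on this split cannot be applied directly to patients whose DUX4 level falls between the two thresholds.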

      Figure 5

      • An RSF (randomized survival forest) model predicts survival in Dux4+ vs Dux4- patient, and the Shapley values for landmark time analyses show time-varying effects of different features.

      • In some sense, the authors have already demonstrated that Dux4+ is associated with survival differences in ICI treated patients; so a model that predicts survival applied to Dux4+ and Dux4- patients that shows a difference in survival is unsurprising (even in a training/test set setting given that there is a difference in survival across the entire cohort). The quantified marginal effect (from a predictive perspective) of different features is what is interesting here. In that light, I'd like to see more validation of the model up front, specifically how close the predicted survival is to the actual survival of patients (e.g. the survival curves in Fig 5a but with actual survival of the Dux4- and Dux4+ cohorts superimposed on the predicted probabilities).

      We thank the reviewer for this suggestion. We have added a plot showing the superimposed survival probability estimates over time for the RSF and KM models for patients assigned to either the test or training sets in Fig. 5.

      SFig 5

      • Unclear how the authors got estimates of the # of expected deaths associated with covariates (e.g. "...we measured an increase in the number of predicted deaths associated with DUX4-positivity by approximately 16, over DUX4-negative status (Fig S5F-G).") from Shapley values as shown in the indicated figure - is this 16 out of the entire cohort? At a given time point? Would recommend perhaps showing the inferred absolute change in mortality (e.g. 8% absolute increase in mortality)

      Mortality is the expected number of deaths for the cohort over the observation window, measured as the sum of the CHF over time. We have clarified this in the Methods section, under “Random Survival Forest, feature importance, and partial dependence.” We have also changed the quantification to show the absolute mortality differences comparing patients with DUX4-negative and -positive tumors; we thank the reviewer for this suggestion. We have also clarified in the text that adjusted mortality was estimated via partial dependence, which operates using the correct units, as opposed to Shapley values, where attribution is scaled. Finally, we changed the referenced figure when discussing changes in mortality associated with TMB and DUX4 status (Fig. S5H-I); we appreciate the reviewer pointing out this error.
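
      As a minimal illustration of mortality as a sum of the cumulative hazard function (CHF), the sketch below uses a plain Nelson-Aalen estimate in place of the random survival forest. This substitution is an assumption made to keep the example self-contained, and the function names are hypothetical.

```python
def nelson_aalen(times, events):
    """Return [(t, H(t))] at each distinct event time.

    `events` holds 1 for an observed death and 0 for a censored observation.
    This is a stand-in for the forest's ensemble CHF.
    """
    order = sorted(zip(times, events))
    n_at_risk = len(order)
    cumulative = 0.0
    chf = []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = 0
        removed = 0
        # Group all observations that share the same time.
        while i < len(order) and order[i][0] == t:
            deaths += order[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            cumulative += deaths / n_at_risk
            chf.append((t, cumulative))
        n_at_risk -= removed
    return chf


def mortality(chf):
    """Sum the CHF over the distinct event times."""
    return sum(h for _, h in chf)


# Toy cohort of three patients, all of whom die during follow-up.
toy_mortality = mortality(nelson_aalen([1, 2, 3], [1, 1, 1]))
```

      In this toy cohort the summed CHF equals the number of observed deaths, which matches the reading of mortality as an expected number of deaths over the observation window.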

      Figure S1B-C

      • The authors argue that Dux4 expression is not an artifact of FFPE tissue by analyzing a mixed tumor cohort sequenced with both poly-A and hybrid probe capture in matched flash-frozen and FFPE tumor samples, showing that it is 1) detectable in both FFPE and flash-frozen tissue and 2) higher levels are detected in polyA sequencing/frozen tissue. However, the reference for this section (D. Robinson et al 2015) is a study of a cohort of prostate cancers with polyA bulk RNAseq sequencing; is this correct/is the data coming from a different study?

      • Analysis of scRNAseq (if available) would strengthen their analyses by better delineating the expression and response of interferon-gamma and downstream (e.g. antigen presentation) pathways in specific cell compartments, and potential differences in cell-cell interactions (e.g. using CellPhoneDB) associated with Dux4+ vs Dux4- tumors.

      • Do the investigators find similar findings in primary and metastatic tumors sequenced the same way (e.g. tcga primary vs met melanoma, albeit most of the met melanoma are Stage III lymph nodes)?

      We thank the reviewer for finding the citation error. We have corrected the manuscript to reflect the correct study we analyzed (PMID: 28783718). We also thank the reviewer for their additional suggestions, which undoubtedly would strengthen the current study. However, we have respectfully decided to defer these additional analyses for future study.

      Reviewer #2:

      It is strange as a statistician to see BIC and AIC represented as barplots, e.g. Figure 4B. There is no knowledge to be gained through this visual representation that would not otherwise be conveyed by just giving the numbers.

      We thank the reviewer for this suggestion. We understand that simply stating the numbers would be equally informative. However, we respectfully decided to retain our current versions of Figures 4 and S4 so that the numbers can be illustrated in a visual manner in the figures, rather than just stated in the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Line 144, after eq. (1). Vectors d_i need to be defined. Are these the mapping of vectors e_i due to the active deformation? It would be useful to state then that d_3 is aligned with r'.

      Thank you for your suggestion, and the definition has been added to lines 146-149 for a better understanding of the model.

      • Line 144. Authors state a_i(0,0,Z)=0. Shouldn't this be true also for any angle, i.e., a_i(0,Theta, Z)=0?

      Thank you, we have revised it in line 144.

      • Line 156. G_0 is defined as Diag(1,g_0(t), 1), which seems to be using cylindrical coordinates. Previously, in line 147, vector argument X of \chi is defined with Cartesian coordinates (X,Y,Z). Shouldn't these be also cylindrical?

      We are very sorry for this error. Our initial configuration is defined with cylindrical coordinates, and we have revised this in line 151 of the manuscript.

      • Line 162. "where alpha and beta lie in the range [-pi/2, pi/2]" has already been indicated.

      Thank you for pointing this out; we have deleted the duplicate information in line 166.

      • Line 171. W is defined as the strain energy density, while in equation (2), symbol W is the total energy (which depends on the previous W). Letters for total elastic and strain energy must be distinguished.

      Thank you, we have changed the letter for total energy in Eq.(2).

      • Line 176. "we take advantage of the weakness of" -> "we take advantage of the small value of".

      We have revised it in line 179.

      • Line 177. Why is there a subscript i in p_i? If these do not correspond to penalty p, but to parameters in eqn (3), the latter should have been introduced before this line.

      We have revised this error in line 180.

      • Line 186. "as the overall elongation \zeta". This parameter, axial extension, has not been defined yet.

      Thank you for pointing this out; the definition of \zeta is now given in line 146.

      • Figure 4. Why are the values of g_0 from the elastic model and equations (30)-(32) so non-smooth? Clarify what is being fit and what is the input in the latter equations. Final external radius R_3? Final internal radius R_1'?

      (1) To mimic the embryo, we consider a multi-layered cylindrical body so that the shear modulus of each layer is different. The continuity of both deformations and stresses is imposed (see Eq.(26)-Eq.(30)). This is the usual treatment for complex morpho-elastic systems. Obviously, $g_0$ originates from the actomyosin cortex so it appears only in the corresponding layer. Finally, all physical quantities such as deformations and stresses must be continuous.

      (2) The final outer radius is R_3, which represents the outer radius of C. elegans embryos. In addition to R_3, what we need to consider in this model are R_1’=0.7, R_1’=0.768, R_2=0.8 and R_2’=0.96; these definitions have been added in the caption of Appendix 2—figure 1.

      • Line 663, equation (19). Parameter mu is multiplying penalisation term with p, while in equation (2) mu is only affecting the elastic part.

      These two different ways of expressing the energy function will ultimately affect the value of p, but the two p are not the same quantities, so they will not affect our results. To avoid misunderstandings, we will replace p in equation (19) with q.
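
      The relation between the two multipliers can be made explicit with a schematic example. The energy forms below are assumptions for illustration, not the manuscript's Eq. (2) and Eq. (19): \Psi stands for a generic elastic energy density per unit shear modulus and J for the volume ratio constrained to 1.

```latex
% Eq. (2)-style form (assumed): \mu multiplies only the elastic part
\mathcal{E}_{(2)} = \int \left[ \mu\, \Psi(\mathbf{F}) - p\,(J - 1) \right] dV
% Eq. (19)-style form (assumed): \mu multiplies the penalty term as well
\mathcal{E}_{(19)} = \int \mu \left[ \Psi(\mathbf{F}) - q\,(J - 1) \right] dV
% The two energies coincide when the multipliers are related by
p = \mu\, q
```

      Under these assumed forms, the two conventions rescale the multiplier by \mu and leave the deformation that minimises the energy unchanged.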

      Reviewer #2 (Recommendations For The Authors):

      As mentioned in my public summary, I find the writing really not adequate. I provide here a list of specific points that the authors should in my opinion address. As a general comment, I would delete many instances of 'the'.

      First, there are figures and whole paragraphs that do not seem to bring anything to the understanding of the phenomenon of C. elegans elongation, notably, Figs. 2, 3C-H, 5m, and 6. Figures 6G and 7 are the only figures containing results it seems. Some elements of the figures are repeated, for example, the illustration of the system's cross-section in Figs 3 and 5.

      Thank you for your suggestion, we have made some adjustments to our images to remove some of the duplicate information.

      Second, and this is my most important criticism: the mechanism of elongation by releasing elastic stress introduced by muscle contraction is not explained in clear terms anywhere in the text. At least, I was unable to understand it. On p 10 you write "This energy exchange causes the torsion-bending energy to convert into elongation energy, (...)" How this is done is not explained. I assume that the reference state is somehow changed through muscle contraction. The new reference state probably has a longer axis than the one before, but this would then be a plastic deformation and not purely elastic as claimed by the authors (ll 76: "This work aims to answer this paradox within the framework of finite elasticity without invoking cell plasticity (...)"). Is torsion important for this process or is it 'just' another way to store elastic energy in the system?

      We explain most of the exchange of energy between bending, torsion and elongation: indeed, we quantify all aspects of this transformation, namely the elastic elongation energy and the dissipation processes, which cost energy. The dissipation evaluated here concerns the rotation of the worm due to the muscle geometry and the viscous friction at the inner surface of the egg. Torsion seems to appear in the late stages and only in some cases. As we show, it comes from a torque induced by the muscles, which are not vertical. Finally, the quantitative predictions of our model recover most of the published experimental results.

      Third, there are a number of strange phrasings and the notation is not helpful in places.

      We apologize for this; the manuscript is now more precise.

      Fourth, the title promises to explain how cyclic muscle contractions reinforce acto-myosin motors. I can't see this done in this work.

      The fact that the acto-myosin is reorganized between two sequences of contraction justifies the title. The complete reorganization of the actomyosin network would require a chemico-mechanical model that is not achieved here, perhaps in future work as data become available.

      In addition:

      We have chosen to respond globally rather than point by point to the referee’s recommendations.

      Typographic errors and vocabulary

      All English corrections and typos are now included in the main text.

      Figures and captions:

      Figures and captions have been improved.

      • Figure 1: Make the caption and the illustration more coherent. For example, only two cell types are distinguished; in the caption, you mention lateral cells, in the sketch seam cells. What is the difference between acto-myosin and muscle contraction? Muscle contraction is also acto-myosin-based.

      (1) The caption for Fig.1 is revised.

      (2) From a mechanical point of view, actomyosin bundles in C. elegans are orthoradial, whereas muscles are essentially parallel to the main axis of the body, so the geometry is completely different and of extreme importance for deformation. Muscle contractions are quasi-periodic; we do not know the dynamics of the attached molecular motor of myosin. So of course, both contain actin and myosin (not exactly the same proteins), but our model is sensitive to more macroscopic properties.

      • Figure 2: I do not find this figure helpful. I might expect such a figure in a grant proposal, but much less in an article.

      Figure 2 shows the strategy of our work, we hope that readers can see at a glance what kind of analysis has been done through this figure: since our work is divided into several parts, readers can also unravel the logic through this scheme after reading the whole manuscript. So, this diagram is a guide, and it may be helpful and necessary.

      • Figure 3: Figure 3 A, right: What is the dashed line? B You indicate fibers, but your model does not contain fibers, does it? How do I get from the cube to the deformed object? What is the relation of C-H with the rest of the work? Furthermore, you mention seam cells in Fig. 1, but they are absent here. Why can you neglect them? Why introduce them in the first place? E What is a plant vine? F-H What rods are you referring to? Plants do not have muscles, right?

      We have modified this figure, and the original Figure 3 now corresponds to Figures 3 and 4.

      (1) The dashed line is the centerline after deformation.

      (2) The referee is wrong: our model represents the fibers by a higher shear modulus for the actomyosin cortex and for the muscles (see Table Appendix 1) and G_1 reflects the activities of the muscle and actin fibers.

      (3) The cube in Figure 3 is a mathematical 3D volume element that is subjected to stresses. Hyperelasticity modelling is based on such a representation.

      (4) C-H(new version: Fig.4 A-F): These images show similar deformations: bending and torsion as our C. elegans study. These figures indicate that such deformations are quite common in nature, even if the underlying mechanism is different.

      (5) This is a point we have already mentioned: we ignore the difference between the different types of epidermal cells and average their role in the early and second stages of elongation.

      (6) The plant vine is the 'botanical vine', see Goriely's article and book.

      (7) F-H(new version: Fig.4 D-F) do not have fixed rods, we set a curvature and torsion to fit the actual biological behavior.

      (8) Plants do not have muscles, but they grow, and our formalism for growth, pre-strain and material plasticity is very similar to the hyper-elasticity formalism.

      • Figure 4: Fig .4 A: "The central or inner part (0 < 𝑅 < 𝑅2, shear modulus 𝜇𝑖) except the muscles which are stiffer." I do not understand.

      In the new version, this figure corresponds to Fig.5. The shear modulus of the intrinsic part is very small, but the muscles are harder, so we have to consider them separately. We have revised this sentence to avoid misunderstanding.

      • Figure 5: Fig 5 A and D: The schematic of the cross-section has appeared already in the previous figure. No need to repeat it here. The same holds for the schematic of the cylindrical embryo. Caption: "But, the yellow region is not an actual tissue layer and it is simply to define the position of muscles." Why do you introduce the yellow region at all? I do not think that it clarifies anything. "Deformation diagram, when left side muscles M_1 and M_2." Something seems to be missing here. Similarly in the next sentence. "the actin fiber orientation changes from the 'loop' to the 'slope'" Do the rings break up and form a helix?

      In the new version, this figure corresponds to Fig.6.

      (1) We have made revisions to these figures.

      (2) The yellow part can show the accurate location of four muscles, which is important for our model and further calculations.

      (3) We have revised this sentence in the caption of Fig. 6.

      (4) Actin rings do not change into a helix pattern; they only become sloped.

      • Figure 6: Fig 6 A-C These panels do not go beyond Fig 5B. Fig 6D: what are these images supposed to show? They are not really graphs, but microscopy images. The caption is not helpful to understand, what the reader is supposed to see here. Fig 6F: do you really want to plot a linear curve?

      In the new version, Fig.5 and Fig.6 respectively correspond to Fig.6 and Fig.7.

      (1) Fig. 6 shows the simulated images, whereas Fig. 7 A-C shows the actual calculation results; they are different.

      (2) Fig. 7 D shows the real situation during C. elegans late elongation; here, we would like to show the torsion of the C. elegans embryo.

      (3) Yes, it is our result.

      Discussions concerning the biological referee questions:

      Ll 75: "how the muscle contractions couple to the acto-myosin activity" Again I find this misleading because muscle contraction relies on acto-myosin activity. Probably, you can find a better expression to refer to the activity of the actomyosin network in the epidermis. Do you propose any mechanism for how muscle contraction increases epidermal contractility? This does not seem to be the mechanism that you propose for elongation, is it?

      The actomyosin activity will not stop because of the muscle contraction. Obviously, these two processes cannot be independent. The energy released by a muscle contraction event can and must contribute to the reorganization of the actomyosin network that occurs during the elongation process. Indeed, despite the fact that the embryo elongates, the density of actin cables appears to be maintained, which automatically requires a redistribution of actin monomers. We propose a scenario in which muscle contraction increases actomyosin contractility via energy conversion. We show that after unilateral contraction there is an energy release for this once all dissipation factors are eliminated. We invite the reviewer to re-examine Figure 2 and invite biologists to seriously evaluate the density of molecular motors attached to the circumferential actin cable throughout the stretch process.

      Ll 133: "we decide to simplify the geometrical aspect because of the mechanical complexity" This is hardly a justification. Why is it appropriate?

      Yes, we would like to offer the reader the simplest model, with limited technical complexity and a limited number of unknown parameters.

      L 135: "active strains" Why not active stress?

      The two are equivalent; the choice is dictated by the simplicity of deriving quantitative results for comparison with experiments.

      L 170: "hyperelastic" Please, explain this term.

      It is the elasticity of very soft samples subjected to large deformations. For classic references, see the books of Ogden, Holzapfel and Goriely, all of which are mentioned in our paper.
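
      As an illustrative sketch only (the constitutive law actually used in the manuscript may differ), the simplest incompressible hyperelastic model is the neo-Hookean solid. Its strain-energy density depends on the deformation gradient $\mathbf{F}$ and on a single shear modulus $\mu$:

```latex
W(\mathbf{F}) = \frac{\mu}{2}\left(\operatorname{tr}\!\left(\mathbf{F}^{T}\mathbf{F}\right) - 3\right),
\qquad \det \mathbf{F} = 1 .
```

      For small strains this reduces to linear elasticity with shear modulus $\mu$; the purpose of the hyperelastic formalism is that it remains valid for the large deformations of very soft samples.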

      Major criticism

      Eq. 3 and Ll 227: "𝑝1 is the ratio between the free available myosin population and the attached ones divided by the time of recruitment" Why is the time of recruitment the same for all motors? "inverse of the debonding time" Is it the same as the unbinding rate? Why use the symbol p_2 for it? What is p_3?

      The model proposed to justify the increase in the activity of the actomyosin motors during the first phase is a mean-field model, so all quantities are averaged. We are not considering the theory of a single molecular motor but a collection of motors in a dynamic environment, so we do not need stochasticity here. Equation (3) concerns the compressive pre-strain, which by definition is a quantity varying between $0$ and $1$, and $X_g=1-G$. ... The debonding time is not the same as the debonding rate. The term $p_3$ indicates saturation and is derived from the law of mass action. The good agreement with the experimental data is shown in Fig. 5 (A) and (B). An equivalent model has been developed by Serra et al. (2023).

      Serra M, Serrano Nájera G, Chuai M, et al. A mechanochemical model recapitulates distinct vertebrate gastrulation modes. Science Advances, 2023, 9(49).

      Ll 275: "This energy exchange causes the torsion-bending energy to convert into elongation energy, leading to a length increase during the relaxation phase, as shown in Fig.1 of Appendix 5." You have posed the puzzle of how contraction leads to elongation, and now that you resolve the puzzle, you simply say that torsion and bending energy are converted into elongation. How? Usually, if I deform an elastic object, it will return to its original configuration after releasing the external forces. Why is this not the case here?

      Furthermore, the central result of your work is presented in an Appendix!?

      We agree with the referee that an elastic object will return to its initial configuration by releasing stress, i.e. by giving up its accumulated elastic energy to the environment. But the elastic energy has to go somewhere, such as heat. We do not dare to say that the temperature of the worm increases during the muscle contractions.

      In fact, the referee's comment also assumes that full relaxation of the stresses is possible, so the object is not a multi-layered specimen and/or it is not enclosed in a box. Most living species are under stress, usually called residual stress. Our skin is under stress. Our fingerprints result from an elastic instability of the epidermis occurring during foetal life, as do our brain convolutions and our villi. So, it is obvious that stresses are maintained in multilayered living systems. Closer to the case of C. elegans, the existence of stresses has been demonstrated by experiments with laser ablation fractures in the first stage. The fact that the fractures open proves the existence of stress: without stress, there would be no opening and only a straight line.

      Ll 379: "Although a special focus is made on late elongation, its quantitative treatment cannot avoid the influence of the first stage of elongation due to the acto-myosin network, which is responsible for a prestrain of the embryo." This statement is made repeatedly through the manuscript, but I do not understand, why you could not use an initial state without pre-strain.

      This is the basic concept of hyperelasticity. The reference state must be free of stress, so we cannot evaluate the first muscle contraction without treating the first elongation stage.

      Grammar, vocabulary and writing errors

      ll 31: "the influence of mechanical stresses (...) becomes more complex to be identified and quantified" Is the influence of mechanical stress too complex or too difficult to be identified/quantified?

      We have revised it in line 31, “The superposition of mechanical stresses, cellular processes (e.g., division, migration), and tissue organization is often too complex to identify and quantify.”

      Ll 41: "The embryonic elongation of C. elegans represents an attractive model of matter reorganization without a mass increase before hatching." Maybe "Embryonic elongation of C. elegans before hatching represents an attractive model of matter reorganization in the absence of growth.".

      We have revised it in line 41.

      L 42: "It happens after the ventral enclosure (...)" Maybe "It happens after ventral enclosure (...)".

      We have revised it in line 42.

      Ll 52: "The transition is well defined since the muscle participation makes the embryo rather motile impeding any physical experiments such as laser ablation (...)" Ablation of what?

      We have revised it in line 53: “The transition is well defined, because the muscle involvement makes the embryo rather motile, and any physical experiments such as laser fracture ablation of the epidermis, which could be performed and achieved in the first period (\cite{vuong2017interplay}), become difficult.”

      Ll 59: "a hollow cylinder composed of four parts (seam and dorso-ventral cells)" It is not clear, what the four parts are - in the parenthesis, two are mentioned.

      We have revised it in line 59. Fig. 1 shows the whole structure: dorsal, ventral and seam cells form the four parts of the epidermis.

      L 78: "several important issues at this stage remain unsettled" At which stage?

      It refers to the late elongation stage; we have added this information in line 78.

      Ll 85: "but how it works at small scales remains a challenge." Maybe "but how it works at small scales remains to be understood.".

      We have revised it in line 86.

      Ll 99: "the osmolarity of the interstitial fluid" This comes out of the blue. Before you only talked about mechanics, why now osmolarity? Also, the interstitial fluid is only mentioned now. It is important for the dissipative effects that you discuss later, right? If yes, then you should probably introduce it earlier.

      For a better understanding, we have changed osmolarity to viscosity in line 99.

      l 120: "The cortex is composed of three distinct cells" Maybe "distinct cell types".

      Thank you, and we have revised it in line 120.

      L 121: "cytoskeleton organization and actin network configurations" What is the difference between cytoskeleton organization and actin network configuration? Also, either both should be plural or both singular, I guess.

      (1) Cytoskeleton (which involves microtubules) forms the epidermis of C. elegans embryos, and the actin network surrounds the epidermis.

      (2) Thank you for your suggestion, we have revised it in line 121.

      L 130: "which will be introduced hereafter" Maybe "which will be used hereafter".

      We have revised it in line 130.

      Ll 148: "The geometric deformation gradient" You usually denote vectors in bold face, so \chi should be bold, right? Define d_i in Eq.(1).

      Yes, we have added this information in line 147.

      L 172: "auxiliary energy density" Please, explain this term.

      We have changed "auxiliary energy density" into "associated energy density" in line 175. Energy density is the amount of energy stored in a given system or region of space per unit volume; the associated energy density in our manuscript helps us to carry out some of the calculations.

      Ll 188: "Similar active matter can be found in biological systems, from animals to plants as illustrated in Fig.3(C)-(E), they have a structure that generates internal stress/strain when growing or activity. (...)" Why such a general statement during the presentation of the results? The second part of the sentence seems to be incomplete.

      We would like to show that our method is general and can be used in many situations. We have revised the incomplete sentence in line 192.

      Ll 243: "a bending deformation occurs on the left for active muscles localized on left" Maybe "bending to the left occurs if muscles on the left are activated".

      Thank you, we have revised it in line 247.

      L 250: "we assume them are perfectly synchronous" Maybe "we assume them to contract simultaneously".

      We have revised it in line 252.

      L 258: "the muscle and acto-myosin activities are assumed to work almost simultaneously." Before it was simultaneously, now only almost!? What does almost mean?

      Sorry, we intended the same meaning in these two sentences; we have deleted the word ‘almost’ in line 261.

      Ll 294: "one can hypothesize several scenarios" After that, only one scenario is described it seems.

      Thank you, we have revised this sentence in line 299.

      L 341: "and then is more viscous than water" Maybe "and that is more viscous than water".

      We have revised it in line 345.

      L 373: "before the egg hatch" Maybe "before the embryo (or larva) hatches"?

      We have revised the sentence in line 367.

      L 409: "elephant trunk elongated" maybe "elephant trunk elongation".

      We have revised it in line 412.

      Ll 417: "As one imagines, it is far from triviality (...)" Does this remark help in any way to understand better C. elegans elongation? Also maybe "it is far from trivial".

      We have revised it in line 423.

      Ll 428: "can map the initial stress-free state B_0 to a state B_1, which reflects early elongation process" Maybe: "maps the initial stress-free state B_0 to a state B_1, which describes early elongation".

      We have revised it in line 428.

      L 429: "After in the residually stressed (...)" Maybe "Subsequently, we impose an incremental strain field G_1 that maps the state B_1 to the state B_2, which represents late elongation".

      We have revised it in line 429.

      l 763: "Modelling details of without pre-strain case" Maybe "Case without pre-strain" or "Modelling in the absence of pre-strain" Similarly for l 784.

      We have revised them in line 763 and line 784.

      Some questions of definition and understanding

      Ll 71: "We can imagine that once the muscle is activated on one side, it can only contract, and then the contraction forces will be transmitted to the epidermis on this side." I do not understand the sentence. Muscle activation leads to contraction, there is nothing to imagine here. Maybe you hypothesize that the muscles are attached to the epidermis such that muscle contraction leads to epidermis deformation?

      Yes, four muscle bands are attached to the epidermis, as shown in Fig. 1. The deformation concerns not only the epidermis but the whole embryo during the bending events. We have modified the sentence to avoid misunderstanding; the sentence now reads “Once the muscle is activated on one side, it can only contract, and then the contraction forces will be transmitted to the epidermis on this side.” in line 71.

      Ll 110: "However, it is less widely known that its internal striated muscles share similarities with skeletal muscles found in vertebrates in terms of both function and structure" Is it important for what you report, whether this fact is widely known?

      Yes, it is our opinion.

      Ll 112: "the role of the four axial muscles (...) is nearly contra-intuitive" Is it or is it not? If yes, why?

      Yes, it is. Muscles exert contractions, and hence compressive deformations. They are located along the axis of symmetry (up to a small deviation), so they cannot mechanically produce the expected elongation, in contrast to the orthoradial actomyosin network.

      However, elongation of C. elegans is observed experimentally, so yes, we consider the result counterintuitive.

      L 116: "fully heterogeneous cylinder" What is this?

      It means that the C. elegans embryo does not have the same elastic properties in different parts (or layers).

      L 129: "will collaborate to facilitate further elongation" To facilitate or to drive? If the former, what drives elongation?

      Contraction of the muscles and of the actin bundles together drives elongation.

      Ll 141: "the deformation in each section can be quantified since the circular geometry is lost with the contractions" The deformation could also be quantified if the sections remained circular, right?

      Yes. However, circularity is lost during each bending event.

      Ll 151: "we need to evaluate the influence of the C. elegans actin network during the early elongation before studying the deformation at the late stage. So, the deformation gradient can be decomposed into: (...) where (...) is the muscle-actomyosin supplementary active strain in the late period" I thought you were now studying the early stage?

      In this part, we are outlining how we can study the whole elongation (early and late), not just the early elongation stage. To evaluate the deformation induced by the first contraction of the muscles, we need to know the state of stress of the worm prior to this event, so we also need to recover the early period using the same formalism for the same structure.

      L 160: "When considering a filamentary structure with different fiber directions" Which filamentary structure are you talking about?

      Fig.3 B shows this model and the filamentary structure, which contains the actin and muscle fibers.

      Ll 174: "When the cylinder involves several layers with different shear modulus 𝜇 and different active strains, the integral over 𝑆 covers each layer" I do not understand this sentence. Also, you should probably write 'moduli' instead of modulus.

      This implies that when integrating over the whole cross-section S, we need to take into account each layer independently with its own shear modulus and sum the results.
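
      The layer-by-layer integration described above can be sketched numerically as follows. This is a minimal illustration, assuming an axisymmetric energy density and concentric annular layers; the function name `layered_energy`, the layer radii and the shear moduli are hypothetical and are not taken from the manuscript.

```python
import math

def layered_energy(layers, density, n=2000):
    """Integrate mu * density(R) over a cross-section made of concentric annuli.

    layers  : list of (r_in, r_out, mu) tuples, one per layer (hypothetical values)
    density : function R -> energy density per unit shear modulus (assumed axisymmetric)
    n       : number of midpoint-rule subintervals per layer
    """
    total = 0.0
    for r_in, r_out, mu in layers:
        dr = (r_out - r_in) / n
        layer_integral = 0.0
        for i in range(n):
            r_mid = r_in + (i + 0.5) * dr
            # Area element of a thin ring is 2*pi*R*dR
            layer_integral += density(r_mid) * 2.0 * math.pi * r_mid * dr
        # Each layer contributes with its own shear modulus
        total += mu * layer_integral
    return total

# Two hypothetical layers: a very soft inner part and a stiffer outer shell
layers = [(0.0, 0.7, 0.1), (0.7, 1.0, 1.0)]
energy = layered_energy(layers, lambda R: 1.0)
```

      With a constant density the result is simply the sum of each layer's area weighted by its shear modulus, which gives a quick sanity check on the bookkeeping.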

      L 176: "weakness of 𝜀" Do you mean \epsilon << 1?

      Yes

      Ll 178: "Given that the Euler-Lagrange equations and the boundary conditions are satisfied at each order, we can obtain solutions for the elastic strains at zero order 𝐚(𝟎) and at first order 𝐚(𝟏)." Are you thinking about different orders in an \epsilon expansion or the early and the late stages of elongation?

      Different orders are considered only for the late elongation study; the early elongation is treated exactly and so does not need a correction in \epsilon.

      L 197: "fracture ablation" Please, define.

      This is an experiment in which a laser is used to make a cut in a small-scale object of study, and the internal stresses are then obtained from the morphology of the cut; please see the reference ‘Assessing the contribution of active and passive stresses in C. elegans elongation’. We have added this definition in line 200.

      Ll 203: What motivated your choice of notations for the radii R_2'? The inner part of the cylinder is fluid? But above you wrote about a solid cylinder. Why should the inner part be compressible?

      (1) We need to define the location of actin cables, which concentrate at the outer periphery.

      (2) Our model is a hollow cylinder, and the inner part of the cylinder contains internal organs, tissues, fluids, and so on, so we consider it to be a compressible extremely soft material (Line 213).

      Ll 212: "𝑟(𝑅) is the radius after early elongation." And during?

      R is variable, r(R) depends on R but also on time t, it represents the radius of C. elegans embryos after the onset of elongation, i.e., after acto-myosin and muscle activities begin.

      L 232: \tau_p is probably t_p?

      Yes.

      L 240: "quite simultaneously" Please, be precise.

      In practice, it is difficult to define the concept of simultaneous occurrence unless there is rigorous experimental data to show it, but all we can get in the Ref ‘Remodelage des jonctions sous stress mécanique’, is that it occurs almost simultaneously, which we define as quite simultaneously.

      Ll 246: "a short period" What does short mean? Why is it relevant?

      From the experimental observations and data, we know that each contraction occurs very rapidly, within a few seconds, so we define a short period for one contraction.

      L 263: "the bending of the model will be increased" Is it really the model that is bent?

      Yes, we mean the bending deformation predicted by the model; we have revised this in line 266.

      Ll 265: "we observed a consistent torsional deformation (Fig.6(E)) that agrees with the patterns seen in the video" In which sense do these configurations agree? I do not see any similarity between panels D and E.

      Both show a torsion deformation.

      L 267: "torsion as the default of symmetry of the muscle axis" I do not understand.

      We discuss two cases in this research, one where the muscle follows the axis of the C. elegans in the initial configuration, and the other where the muscle has a slight angle of deflection, and we have added more information in the manuscript (line 270).

      Ll 274: "Each contraction of a pair increases the energy of the system under investigation, which is then rapidly released to the body." Do you mean the elastic energy stored in the epidermis and central part of the embryo?

      Yes, the whole body.

      Ll 284: "The activation of actin fibers 𝑔𝑎1 after muscle relaxation can be calculated and determined by our model." Have you done it?

      Yes, we can obtain the value of g_a1, and then calculate the elongation.

      Ll 286 I do not understand, why you write about mutants at this place. Am I supposed to have already understood the basic mechanism of elongation? Why do you now write about the first stage?

      We would like to show that our formalism can model both wild-type and mutant C. elegans, and that the comparison results are good.

      L 302: "The result is significantly higher than our actual size 210𝜇𝑚." How was significance assessed? Your actual size is probably more than 210µm.

      Here, we have considered two situations, one is that the accumulated energy is totally applied to the elongation so that the length will be much larger than the experimental result of 210 µm, the length value that we have obtained by calculation. In the other case, we have considered the energy dissipation, which leads to 210 µm.

      L 433: "where 𝜆 is the axial extension due to the pre-strained" Maybe "where 𝜆 is the axial extension due to the pre-stress".

      In our manuscript, we define the pre-strain, not the pre-stress.

      L 438: "active filamentary tensor" Please, define.

      The active filamentary tensor is the tensor representing the activity of a cylindrical model composed of fibers with different orientations.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study presents careful biochemical experiments to understand the relationship between LRRK2 GTP hydrolysis parameters and LRRK2 kinase activity. The authors report that incubation of LRRK2 with ATP increases the KM for GTP and decreases the kcat. From this, they suppose an autophosphorylation process is responsible for enzyme inhibition. LRRK2 T1343A showed no change, consistent with it needing to be phosphorylated to explain the changes in G-domain properties. The authors propose that phosphorylation of T1343 inhibits kinase activity and influences monomer-dimer transitions.
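
      To make the reported direction of change concrete, the following minimal sketch evaluates the standard Michaelis-Menten rate law for GTP hydrolysis. All numerical values are hypothetical and chosen only for illustration; they are not the measured LRRK2 parameters.

```python
def gtp_hydrolysis_rate(gtp, enzyme, kcat, km):
    """Michaelis-Menten rate v = kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme * gtp / (km + gtp)

def catalytic_efficiency(kcat, km):
    """Catalytic efficiency kcat / Km, the effective rate constant at low substrate."""
    return kcat / km

# Hypothetical parameters (arbitrary units), NOT values from the study
before = gtp_hydrolysis_rate(gtp=100.0, enzyme=1.0, kcat=1.0, km=100.0)
# After ATP incubation the study reports a higher Km and a lower kcat
after = gtp_hydrolysis_rate(gtp=100.0, enzyme=1.0, kcat=0.5, km=400.0)
```

      Raising the KM and lowering the kcat both reduce the hydrolysis rate at a fixed GTP concentration, and together they lower the catalytic efficiency kcat/KM.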

      Strengths:

      The strengths of the work are the very careful biochemical analyses and the interesting result for wild-type LRRK2.

      Weaknesses:

      A major unexplained weakness is why the mutant T1343A starts out with so much lower activity--it should be the same as wild-type, non-phosphorylated protein. Also, if a monomer-dimer transition is involved, it should be either all or nothing. Other approaches would add confidence to the findings.

      We thank the reviewer for these suggestions. We are aware that T1343A generally has a lower activity than the wild type. However, we would like to emphasize that this mutant is the only one not showing an increase in Km values after ATP treatment. Other mutants that also have lower kcat values, such as T1503A, still show this characteristic change in Km. Our favored explanation for the lower kcat of T1343A is that this mutation lies within a critical region of the Roc domain, the so-called P-loop, and is very likely not structurally neutral. Concerning the dimer-monomer transition, we are convinced that more than one factor is involved in this equilibrium, most likely including, but not limited to, other LRRK2 domains (e.g. the WD40 domain), binding of co-factors (e.g. Rab29/Rab32 or 14-3-3) and membrane binding. Consistently, even with stapled peptides targeting the Roc or Cor domains, we were not able to shift the equilibrium completely to the monomer (Helton et al., ACS Chem Biol. 2021, 16:2326-2338; Pathak et al. ACS Chem Neurosci. 2023, 14(11):1971-1980). We have addressed these points in a revised version of the manuscript.
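
      As a rough illustration of why a monomer-dimer transition need not be all or nothing, the sketch below solves a simple mass-action equilibrium 2 M ⇌ D. It deliberately ignores the additional factors listed above (other domains, co-factors, membrane binding); the function name `dimer_fraction` and the concentrations are hypothetical.

```python
import math

def dimer_fraction(total, kd):
    """Fraction of protomers found in dimers for 2 M <-> D with Kd = [M]^2 / [D].

    total : total protomer concentration, [M] + 2 [D]
    kd    : dissociation constant, in the same units as total
    """
    # Positive root of 2*M^2/Kd + M - total = 0
    monomer = kd * (math.sqrt(1.0 + 8.0 * total / kd) - 1.0) / 4.0
    dimer = monomer ** 2 / kd
    return 2.0 * dimer / total

# Hypothetical numbers: weakening the dimer (larger Kd) shifts the balance only partially
strong = dimer_fraction(total=1.0, kd=1.0)
weak = dimer_fraction(total=1.0, kd=2.0)
```

      In this toy model, doubling the dissociation constant lowers the dimer fraction but leaves a substantial dimer population, so a graded shift is the expected behavior of an equilibrium.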

      Reviewer #2 (Public Review):

      This study addresses the catalytic activity of a Ras-like ROC GTPase domain of LRRK2 kinase, a Ser/Thr kinase linked to Parkinson's disease (PD). The enzyme is associated with gain-of-function variants that hyper-phosphorylate substrate Rab GTPases. However, the link between the regulatory ROC domain and activation of the kinase domain is not well understood. It is within this context that the authors detail the kinetics of the ROC GTPase domain of pathogenic variants of LRRK2, in comparison to the WT enzyme. Their data suggest that LRRK2 kinase activity negatively regulates the ROC GTPase activity and that PD variants of LRRK2 have differential effects on the Km and catalytic efficiency of GTP hydrolysis. Based on mutagenesis, kinetics, and biophysical experiments, the authors suggest a model in which autophosphorylation shifts the equilibrium toward monomeric LRRK2 (locked GTP state of ROC). The authors further conclude that T1343 is a crucial regulatory site, located in the P-loop of the ROC domain, which is necessary for the negative feedback mechanism. Unfortunately, the data do not support this hypothesis, and further experiments are required to confirm this model for the regulation of LRRK2 activity.

      Specific comments are below:

      • Although a couple of papers are cited, the rationale for focusing on the T1343 site is not evident to readers. It should be clarified that this locus, and perhaps other similar loci in the wider ROCO family, are likely important for direct interactions with the GTP molecule.

      To clarify this point: we have not focused only on this specific locus but have systematically mutated all known auto-phosphorylation sites within the RocCOR domain (see supplemental information). Furthermore, it has been shown that this site, at least in the RCKW (Roc to WD40) construct, is quantitatively phosphorylated (Deniston et al., Nature 2020, 588:344-349). We are aware that the T1343 residue is located within the P-loop and that this can impact nucleotide binding capacities (see response to reviewer 1).

      We have clarified and addressed these points in a revised version of the manuscript.

      • Similar to the above, readers are kept in the dark about auto-phosphorylation and its effects on the monomer/dimer equilibrium. This is a critical aspect of this manuscript and a major conceptual finding that the authors are making from their data. However, the idea that auto-phosphorylation is (likely) to shift the monomer/dimer equilibrium toward monomer, thereby inactivating the enzyme, is not presented until page 6, AFTER describing much of their kinetics data. This is very confusing to readers, as it is difficult to understand the meaning of the data without a conceptual framework. If the model for the LRRK2 function is that dimerization is necessary for the phosphorylation of substrates, then this idea should be presented early in the introduction, and perhaps also in the abstract. If there are caveats, then they should be discussed before data are presented. A clear literature trail and the current accepted (or consensus) mechanism for LRRK2 activity is necessary to better understand the context for these data.

      We agree with the reviewer. We have revised the introduction accordingly and added a paragraph on page 3 starting from line 27.

      • Following on the above concepts, I find it interesting that the authors mention monomeric cytosolic states, and kinase-active oligomers (dimers??), with citations. Again here, it would be useful to be more precise. Are dimers (oligomers?) only formed at the membrane? That would suggest mechanisms involving lipid or membrane-attached protein interactions. Also, what do the authors mean by oligomers? Are there more than dimers found localized to the membrane?

      Multiple studies have shown that LRRK2 is mainly monomeric in the cytosol, while it forms mainly dimeric or higher oligomeric states at the membrane (James et al., Biophys. J. 2012, 102, L41–L43; Berger et al., Biochemistry, 2010, 49, 5511–5523). However, we agree with the reviewer that it remains to be determined whether the dimeric form or a higher oligomeric state is the most active state at the membrane, especially since a recent study shows that LRRK2 can form active tetramers only when bound to Rab29 (Zhu et al., bioRxiv, 2022, DOI: 10.1101/2022.04.26.489605). We have clarified these points in the introduction of the revised version of the manuscript (page 3, line 27ff).

      • Fig 5 is a key part of their findings, regarding the auto-phosphorylation induced monomer formation of LRRK2. From these two bar graphs, the authors state unequivocally that the 'monomer/dimer equilibrium is abolished', and therefore, that the underlying mechanism might be increased monomerization (through maintenance of a GTP-locked state). My view is that the authors should temper these conclusions with caveats. One is that there are still plenty of dimers in the auto-phosphorylated WT, and also in the T1343A mutant. Why is that the case? Can the authors explain why only perhaps a 10% shift is sufficient? Secondly, the T1343A mutant appears to have fewer overall dimers to begin with, so it appears to readers that 'abolition' is mainly due to different levels prior to ATP treatment at 30 deg. I feel these various issues need to be clarified in a revised manuscript, with additional supporting data. Finally, on a minor note, I presume that there are no statistically significant differences between the two sets of bar graphs on the right panel. It would be wise to place 'n.s.' above the graphs for readers, and in the figure legend, so readers are not confused.

      Starting with the monomer-dimer equilibrium, we are convinced that more than the phosphorylation of T1343 is involved (see response to reviewer 1). Therefore, a 10% shift in our assay most likely underestimates the effect seen in cells. Consistently, the T1343A mutant shows a similar increase in the Rab10 phosphorylation assay as the G2019S mutant. This shows that the identified feedback mechanism plays an important role in a cellular context. We have addressed this point in the revised manuscript on page 6, line 8ff. As far as the significance indicators in the bar charts are concerned, we agree with the reviewer. In order not to overload the figure, we finally decided to include all pairwise comparisons (post-hoc tests) in the supplement.

      • Figure 6B, Westerns of phosphorylation, the lanes are not identified and it is unclear what these data mean.

      We apologize for this mistake and have added the correct labeling in the revised version of the manuscript.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The author studies a family of models for heritable epigenetic information, with a focus on enumerating and classifying different possible architectures. The key aspects of the paper are:

      • Enumerate all 'heritable' architectures for up-to 4 constituents.

      • A study of whether permanent ("genetic") or transient ("epigenetic") perturbations lead to heritable changes

      • Enumerate the connectivity of the "sequence space" formed by these heritable architectures

      • Incorporating stochasticity, the authors explore stability to noise (transient perturbations)

      • A connection is made with experimental results on C elegans.

      The study is timely, as there is a renewed interest in the last decade in non-genetic, heritable heterogeneity (e.g., from single-cell transcriptomics). Consequently, there is a need for a theoretical understanding of the constraints on such systems. There are some excellent aspects of this study: for instance, the attention paid to how one architecture "mutates" into another. Unfortunately, the manuscript as a whole does not succeed in formalising nor addressing any particular open questions in the field. Aside from issues in presentation and modelling choices (detailed below), it would benefit greatly from a more systematic approach rather than the vignettes presented.

      Despite being foundational, this work was systematic in that (1) for the simple architectures modeled using ordinary differential equations (ODEs) with continuity assumptions, parameters that support steady states were systematically determined for each architecture and then every architecture was explored using genetic changes exhaustively, although epigenetic perturbations were not examined exhaustively because of their innumerable variety; and (2) for the more realistic modeling of architectures as Entity-Sensor-Property systems, the behavior of systems with respect to architecture as well as parameter space that lead to particular behaviors (persistence, heritable epigenetic change, etc.) was systematically explored. A more extensive exploration of parameter space that also includes the many ways that the interaction between any two entities/nodes could be specified using an equation is a potentially ever-expanding challenge that is beyond the scope of any single paper.

      Specific aspects that remain to be addressed include the application of multiple notions of heritability to real networks of arbitrary size, considering different types of equations for change of each entity/node, and classifying different behavioral regimes for different sets of parameters.

      The key contribution of the paper is an articulation of the crucial questions to ask of any regulatory architecture in living systems rather than the addressing of any question that a field has recognized as ‘open’. Specifically, through the exhaustive listing of small regulatory architectures that can be heritable and the systematic analysis of arbitrary Entity-Sensor-Property systems that more realistically capture regulatory architectures in living systems, this work points the way to constrain inferences after experiments on real living systems. Currently, most experimental biologists engaged in reductionist approaches and some systems biologists examining the function or prevalence of network motifs do not explicitly constrain their models for heritability or persistence. It is hoped that this paper will raise awareness in both communities and lead to more constrained models that minimize biases introduced by incomplete knowledge of the network, which is always the case when analyzing living systems.

      Terminology

      The author introduces a terminology for networks of interacting species in terms of "entities" and "sensors" -- the former being nodes of a graph, and the latter being those nodes that receive inputs from other nodes. In the language of directed graphs, "entities" would seem to correspond to vertices, and "sensors" those vertices with positive indegree and outdegree. Unfortunately, the added benefit of redefining accepted terminology from the study of graphs and networks is not clear.

      The Entities-Sensors-Property (ESP) framework is based on underlying biology and not graph theory, making an ESP system not entirely equivalent to a network or graph, which is much less constrained. The terms ‘entity’, ‘sensor’, and ‘property’ were defined and justified in a previous paper (Jose, J R. Soc. Interface, 2020). While nodes of a network can be parsed arbitrarily and the relationship between them can also be arbitrary, entities and sensors are molecules or collections of molecules that are constrained such that the sensors respond to changes in particular properties of other entities and/or sensors. When considered as digraphs, sensors can be seen as vertices with positive indegree and outdegree. The ESP framework can be applied across any scale of organization in living systems and this specific way of parsing interactions also discretizes all changes in the values of any property of any entity. In short, ESP systems are networks, but not all networks are ESP systems. Therefore, the results of network theory that remain applicable for ESP systems need further investigation.

      The key utility of the ESP framework is that it is aligned with the development of mechanistic models for the functions of living systems while being consistent with heredity. In contrast, widely analyzed networks like protein-interaction networks, signaling networks, gene regulatory networks, etc., are not always constrained using these principles.

      Model

      The model seems to suddenly change from Figure 4 onwards. While the results presented here have at least some attempt at classification or statistical rigour (i.e. Fig 4 D), there are suddenly three values associated with each entity ("property step, active fraction, and number"). Furthermore, the system suddenly appears to be stochastic. The reader is left unsure what has happened, especially after having made the effort to deduce the model as it was in Figs 1 through 3. No respite is to be found in the SI, either, where this new stochastic model should have been described in sufficient detail to allow one to reproduce the simulation.

      The Supplementary Information section titled ‘Simulation of simple ESP systems’ provides the requested detailed information and revisions to the writing provide the biologically grounded justification for parsing interacting regulators as ESP systems.

      Perturbations

      Inspired especially by experimental manipulations such as RNAi or mutagenesis, the author studies whether such perturbations can lead to a heritable change in network output. While this is naturally the case for permanent changes (such as mutagenesis), the author gives convincing examples of cases in which transient perturbations lead to heritable changes. Presumably, this is due to the underlying multistability of many networks, in which a perturbation can pop the system from one attractor to another.

      Unfortunately, there appears to be no attempt at a systematic study of outcomes, nor a classification of when a particular behaviour is to be expected. Instead, there is a long and difficult-to-read description of numerical results that appear to have been sampled at random (in terms of both the architecture and parameter regime chosen). The main result here appears to be that "genetic" (permanent) and "epigenetic" (transient) perturbations can differ from each other -- and that architectures that share a response to genetic perturbation need not behave the same under an epigenetic one. This is neither surprising (in which case even illustrative evidence would have sufficed) nor is it explored with statistical or combinatorial rigour (e.g. how easy is it to mistake one architecture for another? What fraction share a response to a particular perturbation?)

      As an additional comment, many of the results here are presented as depending on the topology of the network. However, each network is specified by many kinetic constants, and there is no attempt to consider the robustness of results to changes in parameters.

      The systematic study of all arbitrary regulatory architectures is beyond the scope of this paper and, indeed, beyond the scope of any one paper. Nevertheless, 225,000 arbitrary Entity-Sensor-Property systems were systematically explored, and collections of parameters that lead to different behaviors were provided (e.g., 78,285 are heritable). These ESP systems more closely mimic regulation in living systems than the coupled ODE-based specification of change in a regulatory architecture.

      The example questions raised here are not only difficult to answer, but subjective and present a moving target for future studies. One, ‘how easy is it to mistake one architecture for another?’. Mistaking one architecture for another clearly depends on the number of different types of experiments one can perform on an architecture and the resolution with which changes in entities can be measured to find distinguishing features. Two, ‘What fraction share a response to a particular perturbation?’. ‘Sharing a response’ also depends on the resolution of the measurement after perturbation.

      DNA analogy

      At two points, the author makes a comparison between genetic information (i.e. DNA) and epigenetic information as determined by these heritable regulatory architectures. The two claims the author makes are that (i) heritable architectures are capable of transmitting "more heritable information" than genetic sequences, and (ii) that, unlike DNA, the connectivity (in the sense of mutations) between heritable architectures is sparse and uneven (i.e. some architectures are better connected than others).

      In both cases, the claim is somewhat tenuous -- in essence, it seems an unfair comparison to consider the basic epigenetic unit to be an "entity" (e.g., an entire transcription factor gene product, or an organelle), while the basic genetic unit is taken to be a single base-pair. The situation is somewhat different if the relevant comparison was the typical size of a gene (e.g., 1 kb).

      Considering every base being the unit of stored information in the DNA sequence results in the maximal possible storage capacity of a genome of given length. Any other equivalence between entity and units within the genome (e.g., 1 kb gene) will only reduce the information stored in the genome.

      Nevertheless, the claim was modified to say that the information content of an ESP system can [italics added] be more extensive than the information content of the genome. This accounts for the possibility of an organism that has an inordinately large genome such that maximal information that can be stored in a particular genome sequence exceeds that stored in a particular configuration of all the contents in a cell.

      I thank the reviewer for providing further explanation of this misunderstanding in the second round of review, which helps draw future readers to the sections in the paper that discuss this important point (also see response to Recommendations for the authors).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I thank the author for their efforts in replying to the comments. I have updated my review accordingly; in particular, I have:

      (1) Removed my complaint that Heritability is nowhere defined

      (2) Removed issues with the presentation of the ODE model in the supplementary information.

      I thank the reviewer for raising these issues and acknowledging the improvements made.

      However, given that the manuscript is broadly unchanged from the initial one, many of my prior comments remain justified. Some key points:

      (1) The manuscript continues to be difficult to read, for the same reasons as I mentioned when reviewing the paper previously.

      (2) The utility of the "ESP" formalism is still unclear.

      • As the author notes, continuous ODEs are of course an idealisation of a system with discrete copy number.

      • However, discussing this is standard fare in any textbook dealing with chemical dynamics and stochastic processes -- see, for instance, the standard textbook by van Kampen.

      • This seems little reason to reject ODEs and implement a poorly defined formalism/simulation scheme.

      (3) The author claims that many questions raised are "beyond the scope of this study". Indeed, answering all of these questions is beyond the scope of any one study. However, as I initially wrote, the paper would be much stronger if it focused on a particular problem rather than the many vignettes depicted.

      The broad scope of this foundational paper necessitates addressing many issues, which may make it a difficult read for some readers. I hope that future work where each paper focuses on one of the aspects raised here will enable the extensive treatment of limited scope as suggested by the reviewer.

      The utility of ODEs is much appreciated and was indeed a computationally efficient way of exploring the vast space of regulatory architectures. As stated in the response to the public reviews, the Entity-Sensors-Property framework provides a biologically grounded way of parsing interacting regulators. This approach is aligned with the development of mechanistic models for the functions of living systems while being consistent with heredity. In contrast, widely analyzed networks like protein-interaction networks, signaling networks, gene regulatory networks, etc., are not always constrained using these principles.

      On a final note, on the subject of the comparison with DNA:

      Perhaps I have misunderstood something. I simply meant that comparing the "maximal information" with 4 HRAs (12.45 bits) is certainly more than the "maximal information" with 4 basepairs (8 bits), but definitely less than the "maximal information" for four 1-kb genes (4^(4000) combinations, so 8000 bits...)

      Perhaps the author means that the growth in information of HRAs is faster than exponential. If so, that should be shown and then remarked on.

      For this reason, I maintain my comment that the comparison is tenuous.

      This issue was addressed once in the results section and again in the discussion section.

      The results section states that “The combinatorial growth in the numbers of HRAs with the number of interactors can thus provide vastly more capacity for storing information in larger HRAs compared to that afforded by the proportional growth in longer genomes.”

      The discussion section states that “Despite imposing heritability, regulated non-isomorphic directed graphs soon become much more numerous than unregulated non-isomorphic directed graphs as the number of interactors increase (125 vs. 5604 for 4 interactors, Table 1). With just 10 interactors, there are >3x10^20 unregulated non-isomorphic directed graphs [60] and HRAs are expected to be more numerous. This tremendous variety highlights the vast amount of information that a complex regulatory architecture can represent and the large number of changes that are possible despite sparsity of the change matrix (Fig. 3).”

      Thus, indeed as the reviewer surmises, the combinatorial explosion in information of HRAs with increases in interacting entities is faster than the proportional growth in information of genome sequence with increases in length.
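
The figures exchanged in this comparison can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that "maximal information" means log2 of the number of distinguishable configurations, and it takes the count 5604 from the Table 1 quotation above. That log2(5604) matches the reviewer's 12.45 bits is an inference made here, not something the source states. The function name `bits` is hypothetical.

```python
import math


def bits(n_states):
    """Maximal information (in bits) of a system with n_states distinguishable states."""
    return math.log2(n_states)


# 4-interactor architectures: count quoted from Table 1 (assumed to be the HRA count).
hra_4 = bits(5604)

# DNA: each base pair has 4 states, i.e. 2 bits per base pair.
bp_4 = 4 * bits(4)             # four base pairs
genes_4x1kb = 4000 * bits(4)   # four 1-kb genes = 4000 base pairs

# Lower bound quoted for 10 interactors: more than 3x10^20 unregulated digraphs.
digraphs_10_lower = bits(3e20)
bp_10 = 10 * bits(4)           # ten base pairs, for a unit-for-unit comparison

print(f"4 interactors : {hra_4:.2f} bits")
print(f"4 base pairs  : {bp_4:.0f} bits")
print(f"four 1-kb genes: {genes_4x1kb:.0f} bits")
print(f"10 interactors (lower bound): {digraphs_10_lower:.1f} bits vs {bp_10:.0f} bits for 10 base pairs")
```

The outcome depends entirely on which units are equated: per unit, architectures outpace base pairs, while four 1-kb genes carry far more bits than four interactors, which is the crux of the disagreement in this exchange.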

      In summary, I thank the reviewers and editors for their help in improving the paper and would like to make the current manuscript the Version of Record.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The author studies a family of models for heritable epigenetic information, with a focus on enumerating and classifying different possible architectures. The key aspects of the paper are:

      • Enumerate all 'heritable' architectures for up to 4 constituents.

      • A study of whether permanent ("genetic") or transient ("epigenetic") perturbations lead to heritable changes.

      • Enumerated the connectivity of the "sequence space" formed by these heritable architectures.

      • Incorporating stochasticity, the authors explore stability to noise (transient perturbations).

      • A connection is made with experimental results on C elegans.

      The study is timely, as there has been a renewed interest in the last decade in nongenetic, heritable heterogeneity (e.g., from single-cell transcriptomics). Consequently, there is a need for a theoretical understanding of the constraints on such systems. There are some excellent aspects of this study: for instance:

      • The attention paid to how one architecture "mutates" into another, establishing the analogue of a "sequence space" for network motifs (Fig 3).

      • The distinction is drawn between permanent ("genetic") and transient ("epigenetic") perturbations that can lead to heritable changes.

      • The interplay between development, generational timescales, and physiological time (as in Fig. 5).

      I thank the reviewer for highlighting these aspects of the work.

      The manuscript would be very interesting if it focused on explaining and expanding these results. Unfortunately, as a whole, it does not succeed in formalising nor addressing any particular open questions in the field. Aside from issues in presentation and modelling choices (detailed below), it would benefit greatly from a more systematic approach rather than the vignettes presented.

      This first paper is foundational and therefore cannot be expected to solve all aspects of the problem of heredity. The work was nevertheless systematic in that (1) for the simple architectures modeled using ordinary differential equations (ODEs) with continuity assumptions, parameters that support steady states were systematically determined for each architecture and then every architecture was explored using genetic changes exhaustively, although epigenetic perturbations were not examined exhaustively because of their wide variety; and (2) for the more realistic modeling of architectures as Entity-Sensor-Property systems, the behavior of systems with respect to architecture as well as parameter space that lead to particular behaviors (persistence, heritable epigenetic change, etc.) was systematically explored. A more extensive exploration of parameter space that also includes the many ways that the interaction between any two entities/nodes could be specified using an equation is a potentially ever-expanding challenge that is beyond the scope of any single paper (see response to additional comments below).

      Specific aspects that remain to be addressed include the application of multiple notions of heritability to real networks of arbitrary size, considering different types of equations for change of each entity/node, and classifying different behavioral regimes for different sets of parameters. As is evident from this list of combinatorial possibilities, the space to be explored is vast and beyond the scope of this foundational paper.

      The key contribution of the paper is an articulation of the crucial questions to ask of any regulatory architecture in living systems rather than the addressing of any question that a field has recognized as ‘open’. Specifically, through the exhaustive listing of small regulatory architectures that can be heritable and the systematic analysis of arbitrary Entity-Sensor-Property systems that more realistically capture regulatory architectures in living systems, this work points the way to constrain inferences after experiments on real living systems. Currently, most experimental biologists engaged in reductionist approaches and some systems biologists examining the function or prevalence of network motifs do not explicitly constrain their models for heritability or persistence. It is hoped that this paper will raise awareness in both communities and lead to more constrained models that minimize biases introduced by incomplete knowledge of the network, which is always the case when analyzing living systems.

      Terminology

      The author introduces a terminology for networks of interacting species in terms of "entities" and "sensors" -- the former being nodes of a graph, and the latter being those nodes that receive inputs from other nodes. In the language of directed graphs, "entities" would seem to correspond to vertices, and "sensors" those vertices with positive indegree and outdegree. Unfortunately, the added benefit of redefining accepted terminology from the study of graphs and networks is not clear.

      The Entities-Sensors-Property (ESP) framework is based on underlying biology and not graph theory, making an ESP system not entirely equivalent to a network or graph, which is much less constrained. The terms ‘entity’, ‘sensor’, and ‘property’ were defined and justified in a previous paper (Jose, J R. Soc. Interface, 2020). While nodes of a network can be parsed arbitrarily and the relationship between them can also be arbitrary, entities and sensors are molecules or collections of molecules that are constrained such that the sensors respond to changes in particular properties of other entities and/or sensors. When considered as digraphs, sensors can be seen as vertices with positive indegree and outdegree. The ESP framework can be applied across any scale of organization in living systems and this specific way of parsing interactions also discretizes all changes in the values of any property of any entity. In short, ESP systems are networks, but not all networks are ESP systems. Therefore, the results of network theory that remain applicable for ESP systems need further investigation. This justification is now repeated in the paper.

      The key utility of the ESP framework is that it is aligned with the development of mechanistic models for the functions of living systems while being consistent with heredity. In contrast, widely analyzed networks like protein-interaction networks, signaling networks, gene regulatory networks, etc., are not always constrained using these principles. In addition, the language of digraphs where sensors can be seen as vertices with positive indegree and outdegree has been also added to aid readers who are familiar with graph theory.
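
The digraph reading mentioned here can be made concrete with a small sketch. It covers only the graph-theoretic view (a sensor as a vertex with positive indegree and positive outdegree) and none of the biological constraints that the ESP framework adds. The function name `classify_vertices` and the example edge list are hypothetical.

```python
from collections import defaultdict


def classify_vertices(edges):
    """Split the vertices of a directed graph into 'sensors' and other entities.

    edges: iterable of (source, target) pairs.
    A vertex counts as a sensor here if it has indegree > 0 and outdegree > 0.
    """
    indegree = defaultdict(int)
    outdegree = defaultdict(int)
    vertices = set()
    for source, target in edges:
        outdegree[source] += 1
        indegree[target] += 1
        vertices.update((source, target))
    sensors = {v for v in vertices if indegree[v] > 0 and outdegree[v] > 0}
    return sensors, vertices - sensors


# Hypothetical three-node architecture: x and y regulate each other, y regulates z.
sensors, others = classify_vertices([("x", "y"), ("y", "x"), ("y", "z")])
print(sorted(sensors), sorted(others))
```

In this example x and y both receive and send regulation, whereas z only receives it, so z is not a sensor under the digraph definition.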

      Heritability

      The primary goal of the paper is to analyse the properties of those networks that constitute "heritable regulatory architectures". The definition of heritability is not clearly stated anywhere in the paper, but it appears to be that the steady-state of the network must have a non-zero expression of every entity. As this is the heart of the paper, it would be good to have the definition of heritable laid out clearly in either the main text or the SI.

      I have now defined the term as used in this paper early, which is indeed as surmised by the reviewer simply the preservation of the architecture and non-zero levels of all entities. I have also highlighted additional notions of heredity that are possible, which will be the focus of future work. These can range from precise reproduction of the concentration and the localization of every entity to a subset of the entities being reproduced with some error while the rest keep varying from generation to generation (as illustrated in Fig. 2 of Jose, BioEssays, 2018). Importantly, it is currently unclear which of these possibilities reflects heredity in real living systems.

      Model

      As described in the supplementary, but not in the main text, the author first chooses to endow these networks with simple linear dynamics; something like $\partial_t \vec{x} = A \vec{x} - T \vec{x}$, where the vector $\vec{x}$ is the expression level of each entity, $A$ has the structure of the adjacency matrix of the directed graph, and $T$ is a diagonal matrix with positive entries that determines the degradation or dilution rate of each entity. From a readability standpoint, it would greatly aid the reader if the long list of equations in the SI were replaced with the simple rule that takes one from a network diagram to a set of ODEs.

      I have abridged the description by eliminating the steady state expression for every HRA as suggested and simply pointed to the earlier version of the paper for those readers who might prefer the explicit derivations of these simple expressions. An overview is now provided for going from any network diagram to a set of ODEs.
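
As an illustration of what a rule for going from a network diagram to a set of ODEs might look like, here is a minimal sketch based on the linear form the reviewer describes, with rate of change equal to (A - T) applied to the vector of levels. It is not the paper's code. The edge convention (row = regulated entity, column = regulator), the function names `build_matrices` and `rates_of_change`, and all rate values are assumptions.

```python
def build_matrices(n, edges, degradation):
    """Build A (signed regulation) and T (diagonal loss) from a network diagram.

    edges: list of (regulator, regulated, weight); weight > 0 promotes, < 0 inhibits.
    degradation: list of n positive loss rates, one per entity.
    """
    A = [[0.0] * n for _ in range(n)]
    for regulator, regulated, weight in edges:
        A[regulated][regulator] += weight  # assumed convention: row is the regulated entity
    T = [[degradation[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return A, T


def rates_of_change(A, T, x):
    """Evaluate dx/dt = A x - T x for the current levels x."""
    n = len(x)
    return [sum((A[i][j] - T[i][j]) * x[j] for j in range(n)) for i in range(n)]


# Hypothetical two-entity loop: entity 0 promotes entity 1 and vice versa.
A, T = build_matrices(2, [(0, 1, 0.5), (1, 0, 0.5)], [0.5, 0.5])
print(rates_of_change(A, T, [1.0, 1.0]))  # production balances loss: [0.0, 0.0]
```

With these illustrative rates, production exactly balances loss at equal levels, so the two-entity loop sits at a steady state with non-zero levels of both entities.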

      The implementation of negative regulation is manifestly unphysical if the "entities" represent the expression level of, say, gene products. For instance, in regulatory network E, the value of the variable z can go negative (for instance, if the system starts with z= and y=0, and x > 0).

      Negative values for any entity were avoided in simulations by explicitly setting all such values to zero. This constraint has been added as a note in the section describing the equations for the change of each node/entity in each regulatory network. Specifically, the level of each entity/sensor was set to zero during any time step when the computed value for that entity/sensor was less than zero. This bounding of the function allows for any approach to zero while avoiding negative values. I apologize for the omission of this constraint from the supplemental material in the last submission. This constraint was used in all the simulations and therefore this change does not affect any of the results presented. In this way, it is ensured that the presence of negative regulation does not lead to negative values.
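
The bounding described in this response can be sketched as a clamped update step. This is an illustration under stated assumptions and not the simulation code used for the paper: a forward-Euler integrator is assumed, and the two-entity example (x constant, x inhibiting z), the rate constants, and the function names are hypothetical.

```python
def euler_step_clamped(levels, rates, dt):
    """Advance one time step, setting any level that would fall below zero to zero."""
    return [max(0.0, level + dt * rate) for level, rate in zip(levels, rates)]


def simulate(initial_levels, derivative, dt=0.01, steps=1000):
    """Integrate with the clamped step; derivative(levels) returns the rates of change."""
    levels = list(initial_levels)
    for _ in range(steps):
        levels = euler_step_clamped(levels, derivative(levels), dt)
    return levels


def inhibited_z(levels):
    """Hypothetical system: x stays constant and inhibits z; z is also lost at rate 0.5."""
    x, z = levels
    return [0.0, -1.0 * x - 0.5 * z]


start = [1.0, 0.0]
unclamped_first_step = start[1] + 0.01 * inhibited_z(start)[1]  # would be negative
final = simulate(start, inhibited_z)
print(unclamped_first_step, final)
```

Without the clamp, z would become negative on the very first step from this starting point; with it, z remains at zero while x is unchanged.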

      Formally, the promotion or inhibition of an entity or sensor can be modeled using any function that is either increasing (for promotion) or decreasing (for inhibition). This diversity of possibilities is one of the challenges that prevents exhaustive exploration of all functions. In fact, the use of ODEs after assuming a continuous function is an idealization that facilitates understanding of general principles but is not in keeping with the discreteness of entities or step changes in their values (amount, localization, etc.) observed in living systems. Other commonly used continuous functions include Hill functions for the rate of production of y given as $x^n/(k + x^n)$ for x activating y, which increases to ~1 as x increases, or given as $k/(k + x^n)$ for x inhibiting y, which decreases to ~0 as x increases. Increasing values of ‘n’ result in steeper sigmoidal curves. In reality, levels of all entities/sensors are expected to be discretized by measurement in living systems and the form of the function for any regulation needs empirical measurement in vivo (see response to comment below).
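
The two Hill forms mentioned above can be written out directly. The sketch below only evaluates the expressions as stated in the response; the function names and the sample values of x, k, and n are illustrative assumptions.

```python
def hill_activation(x, k, n):
    """Production rate of y when x activates y: x^n / (k + x^n), rising toward 1."""
    return x**n / (k + x**n)


def hill_inhibition(x, k, n):
    """Production rate of y when x inhibits y: k / (k + x^n), falling toward 0."""
    return k / (k + x**n)


k = 1.0
for n in (1, 2, 4):
    low, high = hill_activation(0.5, k, n), hill_activation(2.0, k, n)
    print(f"n={n}: activation at x=0.5 is {low:.3f}, at x=2.0 is {high:.3f}")
```

For a fixed k, raising n lowers the value below the transition and raises it above the transition, which is the steepening of the sigmoid noted in the text. The two forms sum to 1 at every x, so they describe the same switch read in opposite directions.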

      The model seems to suddenly change from Figure 4 onwards. While the results presented here have at least some attempt at classification or statistical rigour (i.e. Fig 4 D), there are suddenly three values associated with each entity ("property step, active fraction, and number"). Furthermore, the system suddenly appears to be stochastic. The reader is left unsure of what has happened, especially after having made the effort to deduce the model as it was in Figs 1 through 3. No respite is to be found in the SI, either, where this new stochastic model should have been described in sufficient detail to allow one to reproduce the simulation.

      While ODEs are easier to simulate and understand, they are less realistic as explained above. I have now added more explanation justifying the need for the subsequent simulation of Entity-Sensor-Property systems. I have also expanded the information provided for each aspect of the model (previously outlined in Fig. 4A and detailed within the code) in a Supplementary Information section titled ‘Simulation of simple ESP systems’.

      Perturbations

      Inspired especially by experimental manipulations such as RNAi or mutagenesis, the author studies whether such perturbations can lead to a heritable change in network output. While this is naturally the case for permanent changes (such as mutagenesis), the author gives convincing examples of cases in which transient perturbations lead to heritable changes. Presumably, this is due to the underlying multistability of many networks, in which a perturbation can pop the system from one attractor to another.

      Unfortunately, there appears to be no attempt at a systematic study of outcomes, nor a classification of when a particular behaviour is to be expected. Instead, there is a long and difficult-to-read description of numerical results that appear to have been sampled at random (in terms of both the architecture and parameter regime chosen). The main result here appears to be that "genetic" (permanent) and "epigenetic" (transient) perturbations can differ from each other -- and that architectures that share a response to genetic perturbation need not behave the same under an epigenetic one. This is neither surprising (in which case even illustrative evidence would have sufficed) nor is it explored with statistical or combinatorial rigour (e.g. how easy is it to mistake one architecture for another? What fraction share a response to a particular perturbation?)

      The systematic study of all arbitrary regulatory architectures is beyond the scope of this paper and, as stated earlier, beyond the scope of any one paper. Nevertheless, 225,000 arbitrary Entity-Sensor-Property systems were systematically explored, and collections of parameters that lead to particular behaviors were provided (e.g., 78,285 are heritable). These ESP systems more closely mimic regulation in living systems than the coupled ODE-based specification of change in a regulatory architecture.

      The example questions raised here are not only difficult to answer, but subjective and present a moving target for future studies. One, ‘how easy is it to mistake one architecture for another?’. Mistaking one architecture for another clearly depends on the number of different types of experiments one can perform on an architecture and the resolution with which changes in entities can be measured to find distinguishing features. Two, ‘What fraction share a response to a particular perturbation?’. ‘Sharing a response’ also depends on the resolution of the measurement of entities after perturbation.

      As an additional comment, many of the results here are presented as depending on the topology of the network. However, each network is specified by many kinetic constants, and there is no attempt to consider the robustness of results to changes in parameters.

      The interpretations presented are conservative determinations of heritability based on the topology of the architecture. In other words, the architectures reported are those that can be heritable for some set of parameters. Of course, parameter sets can be found that make any regulatory architecture not heritable. As stated earlier, exploring all parameters for even one architecture is beyond the scope of a single study because of the infinitely many ways that the interaction between any two entities can be specified.

      DNA analogy

      At two points, the author makes a comparison between genetic information (i.e. DNA) and epigenetic information as determined by these heritable regulatory architectures. The two claims the author makes are that (i) heritable architectures are capable of transmitting "more heritable information" than genetic sequences, and (ii) that, unlike DNA, the connectivity (in the sense of mutations) between heritable architectures is sparse and uneven (i.e. some architectures are better connected than others).

      In both cases, the claim is somewhat tenuous -- in essence, it seems an unfair comparison to consider the basic epigenetic unit to be an "entity" (e.g., an entire transcription factor gene product, or an organelle), while the basic genetic unit is taken to be a single base-pair. The situation is somewhat different if the relevant comparison was the typical size of a gene (e.g., 1 kb).

      Considering every base being the unit of stored information in the DNA sequence results in the maximal possible storage capacity of a genome of given length. Any other equivalence between entity and units within the genome (e.g., 1 kb gene) will only reduce the information stored in the genome.

      Nevertheless, the claim has been modified to say that the information content of an ESP system can [italics added] be more extensive than the information content of the genome. This accounts for the possibility of an organism that has an inordinately large genome such that maximal information that can be stored in a particular genome sequence exceeds that stored in a particular configuration of all the contents in a cell.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript uses an interesting abstraction of epigenetic inheritance systems as partially stable states in biological networks. This follows on previous review/commentary articles by the author. Most of the molecular epigenetic inheritance literature in multicellular organisms implies some kind of templating or copying mechanisms (DNA or histone methylation, small RNA amplification) and does not focus on stability from a systems biology perspective. By contrast, theoretical and experimental work on the stability of biological networks has focused on unicellular systems (bacteria), and neglects development. The larger part of the present manuscript (Figures 1-4) deals with such networks that could exist in bacteria. The author classifies and simulates networks of interacting entities, and (unsurprisingly) concludes that positive feedback is important for stability. This part is an interesting exercise but would need to be assessed by another reviewer for comprehensiveness and for originality in the systems biology literature. There is much literature on "epigenetic" memory in networks, with several stable states and I do not see here anything strikingly new.

      The key utility of the initial part of the paper is the exhaustive enumeration of all small heritable regulatory architectures. The implication for the abundance of ‘network motifs’, and more generally for any part of a network proposed to perform a particular function, is that all such parts need to be compatible with heredity. This principle is generally not followed in the literature, resulting in incomplete networks being interpreted as having motifs or modules with autonomous function. Therefore, while the need for positive feedback for stability is indeed obvious, it is not consistently applied by all. For example, the famous synthetic circuit ‘the repressilator’ (Elowitz and Leibler, “A synthetic oscillatory network of transcriptional regulators”, Nature, 2000), which is presented as an example of ‘rational network design’, has three transcription factors that each inhibit the production of the next in turn, forming a feedback loop of inhibitory interactions. Therefore, the contributions of the factors that promote the expression of each entity are unknown and yet essential for heritability. The comprehensive listing of the simple heritable regulatory architectures provides the basis for true synthetic biology where the contributing factors for observed behavior of the network are explicitly considered only after constraining for heredity. Using this principle, the minimal autonomous architecture that can implement the repressilator is the HRA ‘Z’ (Fig. 1).

      An interesting part is then to discuss such networks in the framework of a multicellular organism rather than dividing unicellular organisms, and Figure 5 includes development in the picture. Finally, Figure 6 makes a model of the feedback loops in small RNA inheritance in C. elegans to explain differences in the length of inheritance of silencing in different contexts and for different genes and their sensitivity to perturbations. The proposed model for the memory length is distinct from a previously published model by Karin et al. (ref 49).

      I thank the reviewer for appreciating this aspect of the paper.

      Strengths:

      A key strength of the manuscript is to reflect on conditions for epigenetic inheritance and its variable duration from the perspective of network stability.

      I thank the reviewer for appreciating the importance of the overall topic.

      Weaknesses:

      • I found confusing the distinction between the architecture of the network and the state in which it is. Many network components (proteins and RNAs) are coded in the genome, so a node may not disappear forever.

      I have added language to clarify the many states of a network versus its architecture (also illustrated in Fig. 4 for ESP systems). Even loss of expression below a threshold can lead to permanent loss if there is not sufficient noise to induce re-expression. For example, consider the simple case of a transcription factor that binds to its own promoter, requiring 10 molecules for the activation of the promoter and thus production of more of the same transcription factor. If an epigenetic change (e.g., RNA interference) reduces the levels to fewer than 10 molecules and if the noise in the system never results in the numbers of the transcription factor increasing beyond 10, the transcription factor has been effectively lost permanently. In this way, reduction of a regulator can lead to permanent change despite the presence of the DNA. Many papers in the field of RNA silencing in C. elegans have provided strong experimental evidence to support this assertion.
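
      The threshold argument above can be sketched as a toy simulation. This is only an illustration under stated assumptions: the function name `simulate_tf`, the discrete time steps, and all parameter values other than the threshold of 10 molecules are hypothetical and are not taken from the paper.

```python
import random


def simulate_tf(n0, threshold=10.0, production=6.0, loss_fraction=0.25,
                noise=0.0, steps=200, seed=0):
    """Toy discrete-time model of a self-activating transcription factor.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    n = float(n0)
    for _ in range(steps):
        # The promoter is active only when at least `threshold` molecules exist.
        made = production if n >= threshold else 0.0
        # A fixed fraction of molecules is lost per step (degradation/dilution).
        lost = loss_fraction * n
        # Bounded noise; with noise=0 the model is fully deterministic.
        jitter = rng.uniform(-noise, noise) if noise else 0.0
        n = max(0.0, n + made - lost + jitter)
    return n


high = simulate_tf(20)             # starts above the threshold: expression persists
low = simulate_tf(5)               # reduced below the threshold: expression is lost
noisy = simulate_tf(5, noise=1.0)  # weak noise never restores 10 molecules
print(high, low, noisy)
```

      In this sketch the gene is always "present", yet a transient reduction below the threshold is never reversed unless the noise is large enough to cross it.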

      • From the Supplementary methods, the relationship between two nodes seems to be all in the form of dx/dt = Kxy . Y, which is just one way to model biological reactions. The generality of the results on network architectures that are heritable and robust/sensitive to change is unclear. Other interactions can have sigmoidal effects, for example. Is there no systems biology study that has addressed (meta)stability of networks before in a more general manner?

      Indeed, the relationship between any two entities can in principle be modeled using any function. Extensive exploration of the behavior of any regulatory architecture – even the simplest ones – requires simplifications. For example, early work by Stuart Kauffman explored Boolean networks (see ref. 10 in the paper for history and extensive explanations). However, allowing all possible ways of specifying the interactions between components of a network makes analysis both a computational and conceptual challenge.
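
      To make the contrast raised by the reviewer concrete, the following sketch compares a linear interaction term with a sigmoidal (Hill-type) one. The function names and parameter values are assumptions made for illustration; they are not the equations used in the paper.

```python
def linear_rate(y, k=1.0):
    """Linear dependence of the production of x on y: dx/dt = k * y."""
    return k * y


def hill_rate(y, vmax=1.0, half_max=1.0, n=4):
    """Sigmoidal (Hill-type) dependence, one common alternative form."""
    return vmax * y**n / (half_max**n + y**n)


# The linear term grows without bound; the Hill term saturates at vmax.
for y in (0.5, 1.0, 2.0, 10.0):
    print(y, linear_rate(y), round(hill_rate(y), 3))
```

      Each additional functional form multiplies the number of cases to explore, which is the practical reason for fixing one simple form when enumerating architectures.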

      • Why is auto-regulation neglected? As this is a clear cause of metastable states that can be inherited, I was surprised not to find this among the networks.

      Auto-regulation in the sense of some molecule/entity ultimately leading to the production of more of itself is present in every heritable regulatory architecture. Specifically, all auto-regulatory loops rely on a sequence of interactions between two or more kinds of molecules. For example, a transcription factor (TF) binding to the promoter of its own gene sequence, resulting in the production of more TF protein is a positive feedback loop that relies on many interacting factors (transcription, translation, nuclear import, etc.) and can be considered as ‘auto-regulation’ as it is sometimes referred to in the literature. In this sense, every HRA (A through Z) includes ‘auto-regulation’ or more appropriately positive feedback loops. For example, in the HRA ‘A’, x ‘auto-regulates’ itself via y.

      • I did not understand the point of using the term "entity-sensor-property". Are they the same networks as above, now simulated in a computer environment step by step (thus allowing delays)?

      Please see the response to the other reviewer regarding the need for the Entity-Sensor-Property framework and how it is distinct from generic networks. Briefly, the ODE-based simple networks, while easy to analyze, are not realistic because of the assumptions of continuity. In contrast, ESP systems are more realistic, with measurement discretizing changes in property values, as is expected in real living systems.

      • The final part applies the network modeling framework from above to small RNA inheritance in C. elegans. Given the positive feedback, what requires explanation is how fast the system STOPs small RNA inheritance. A previous model (Karin et al., ref. 49) builds on the fact that factors involved in inheritance are in finite quantity hence the different small RNAs "compete" for amplification and those targeting a given gene may eventually become extinct.

      The present model relies on a simple positive feedback that in principle can be modulated, and this modulation remains outside the model. A possibility is to add negative regulation by factors such as HERI-1, that are known to limit the duration of the silencing.

      The duration of silencing differs between genes. To explain this, the author introduces again outside the model the possibility of piRNAs acting on the mRNA, which may provide a difference in the stability of the system for different transcripts. At the end, I do not understand the point of modeling the positive feedback.

      The previous model (Karin et al., Cell Systems, 2023) can describe populations of genes that are undergoing RNA silencing but cannot explain the dynamics of silencing particular genes. Furthermore, this model also cannot explain cases of effectively permanent silencing of genes that have been reported (e.g., Devanapally et al., Nature Communications, 2021 and Shukla et al., Current Biology, 2021). Finally, the observations of susceptibility to, recovery from, and even resistance to trans silencing (e.g., Fig. 5a in Devanapally et al., Nature Communications, 2021) require an explanation that includes modulation of the HRDE-1-dependent positive feedback loop that maintains silencing across generations.

      The specific qualitative predictions regarding the relationship between piRNA-mediated regulation genome-wide and HRDE-1-dependent silencing of a particular gene across generations could guide the discovery of potential regulators of heritable RNA silencing. The equations (4) and (5) in the paper for the extent of modulation needed for heritable epigenetic change provide specific quantitative predictions that can be tested experimentally in the future. I have also revised the title of the section to read ‘Tuning of positive feedback loops acting across generations can explain the dynamics of heritable RNA silencing in C. elegans’ to emphasize the above points.

      • From the initial analysis of abstract networks that do not rely on templating, I expected a discussion of possible examples from non-templated systems and was a little surprised by the end of the manuscript on small RNAs.

      The heritability of any entity relies on regulatory interactions regardless of whether a templated mechanism is also used or not. For example, DNA replication relies on the interactions between numerous regulators, with only the sequence being determined by the template DNA. The field of small RNA-mediated silencing facilitates analysis of epigenetic changes at single-gene resolution (Chey and Jose, Trends in Genetics, 2022). It is therefore likely to continue to provide insights into heritable epigenetic changes and how they can be modulated. Unfortunately, there are currently no known cases of epigenetic inheritance where the role of any templated mechanism has been conclusively excluded. Future research will improve our understanding of epigenetic states and their modulation in terms of changes in positive feedback loops as proposed in this study and potentially lead to the discovery of such mechanisms that act entirely independent of any template-dependent entity.

      Recommendations for the authors:

      I thank the reviewers for their specific suggestions to improve the paper.

      Reviewer #1 (Recommendations For The Authors):

      The paper has many long paragraphs that attempt to explain results, make illustrations, and give intuition. Unfortunately, these are difficult to read. It would aid the reader greatly if these were, say, converted into cartoons (even if only in the SI), or made more accessible in some other way.

      I agree with the importance of making the material accessible to readers in multiple ways. I have now added a figure with schematics in the SI titled ‘Illustrations of key concepts’ (new Fig. S2), which collects concepts that are relevant throughout the paper and might aid some readers.

      The bulk of the supplementary is currently a collection of elementary mathematics results: to wit, pages 26 to 33 of the combined manuscript carry no more information than a quick description of the general model and the diagrams in Fig 1. Similarly, pages 34 to 39 (non-zero dilution rate), and pages 39 through 58 (response to permanent changes) each express a trivial mathematical point that is more than sufficiently made with one illustrative example.

      I agree with the reviewer and have condensed these pages as suggested. I have added a pointer to the earlier version as containing further details for the readers who might prefer the explicit listing of these equations.

      Overall, the paper appears to be a collection of numerical results obtained from different models, united by uncertain terminology that is not fully defined in this paper. The most promising aspects of the paper lie either in (a) combinatorially complete enumeration of all regulatory architectures, or (b) relating experimental manipulations in C. elegans to possible underlying regulatory architectures. Focusing on one or the other might improve the readability of the paper.

      The two sections of the paper are complementary and when presented together help with the integration of concepts rather than the siloed pursuit of theory versus experimental analysis. When this work was presented at meetings before submission, it was clear that different researchers appreciated different aspects. This divergence is also apparent in the two reviews, with each reviewer appreciating different aspects. I have repeated the definitions and justifications from the earlier paper (Jose, J R Soc Interface, 2020) to provide a more fluid transition between the two complementary sections of the paper. Knowing both sides could aid in the development of models that are not only consistent with measurable quantities (e.g., anything that can be considered an entity) but are also logically constrained (e.g., entities matched with sensors while avoiding any entities that do not have a source of production – i.e., avoiding nodes with indegree = 0).

      However, having said that many results of these types are well-known in models of regulatory networks, and it is unclear what precisely warrants the new framework that the author is proposing. Indeed, it would be good to understand in what way the framework here is novel, and how it is distinguished from prior studies of regulatory networks.

      The key novelty of the work is the consideration of heritability for any regulation. With the explicit definition of the heritability for a regulatory architecture and the acknowledgement that there can be more than one notion of heredity, this paper now sets the foundation for examining many real networks in this light. I hope that the added justifications for the current framework in the revised paper strengthen these arguments. Future literature reviews on networks in general and how they address heritability or persistence will better define the prevalence of these considerations. Currently, most experimental biologists engaged in reductionist approaches and some systems biologists examining the function or prevalence of network motifs do not explicitly constrain their models for heritability or persistence. It is hoped that this work will raise awareness in both communities and lead to more constrained models that acknowledge incomplete knowledge of the network, which is always the case when analyzing living systems.

      Reviewer #2 (Recommendations For The Authors):

      Minor points/clarity

      • page 1 line 57: "transgenerational waveforms that preserve form and function" is unclear.

      This phrase was expanded upon in a previous paper (Jose, BioEssays, 2020). I have now added more explanation in this paper for completeness. The section now reads ‘For example, the localization and activity of many kinds of molecules are recreated in successive generations during comparable stages [1-3]. These recurring patterns can change throughout development such that following the levels and/or localizations of each kind of molecule over time traces waveforms that return in phase with the similarity of form and function across generations [2].’

      • page 7 line 3-6: the sentence has an ambiguous structure.

      I have now edited this long sentence to read as follows: ‘For systematic analysis, architectures that could persist for ~50 generations without even a transient loss of any entity/sensor were considered HRAs. Each HRA was perturbed (loss-of-function or gain-of-function) after five different time intervals since the start of the simulation (i.e., phases). The response of each HRA to such perturbations was compared with that of the unperturbed HRA.’

      • page 9 lines 25-27: the sentence is convoluted: are you defining epigenetic inheritance?

      I have simplified this sentence describing prior work by others (Karin et al., Cell Systems, 2023) and moved a clause to the subsequent sentence. This section now reads: ‘Recent considerations of competition for regulatory resources in populations of genes that are being silenced suggest explanations for some observations on RNA silencing in C. elegans [49]. Specifically, based on Little’s law of queueing, with a pool of M genes silenced for an average duration of T, new silenced genes arise at a rate λ that is given by M = λT’. I have also provided more context by preceding this section with: ‘Although the release of shared regulators upon loss of piRNA-mediated regulation in animals lacking PRG-1 could be adequate to explain enhanced HRDE-1-dependent transgenerational silencing initiated by dsRNA in prg-1(-) animals, such a competition model alone cannot explain the observed alternatives of susceptibility, recovery and resistance (Fig. 6A).’
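
      As a worked illustration of Little's law, which relates the pool size M, the mean silencing duration T, and the arrival rate λ through M = λT, the rate can be recovered from the other two quantities. The numbers below are hypothetical and serve only to show the arithmetic.

```python
def arrival_rate(pool_size, mean_duration):
    """Little's law, M = lambda * T, solved for the arrival rate lambda."""
    return pool_size / mean_duration


# Hypothetical example: 100 genes silenced at any time,
# each silenced for 4 generations on average.
rate = arrival_rate(pool_size=100, mean_duration=4)
print(rate)  # newly silenced genes per generation
```

      The relation describes the population of silenced genes as a whole; it does not by itself predict how long any particular gene stays silenced.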

      • page 13 lines 51-53. This last sentence of the discussion is ambiguous/unclear.

      I have now rephrased this sentence to read: ‘This pathway for increasing complexity through interactions since before the origin of life suggests that when making synthetic life, any form of high-density information storage that interacts with heritable regulatory architectures can act as the ‘genome’ analogous to DNA.’

      • Figure 2: the letters in the nodes are hard to read; the difference between full and dotted lines in the graphs also.

      I have enlarged the nodes and widened the gap in the dotted lines to make them clearer. I have also similarly edited Fig. 1 and Fig. S3 to Fig. S9.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study convincingly shows that the less common D-serine stereoisomer is transported in the kidney by the neutral amino acid transporter ASCT2 and that it is a noncanonical substrate for sodium-coupled monocarboxylate transporter SMCTs. With a multihierarchical approach, this important study further shows that Ischemia-Reperfusion Injury in the kidney causes a specific increment in renal reabsorption carried out, in part, by ASCT2.

      Public Reviews:

      Reviewer #1 (Public Review):

      Most amino acids are stereoisomers in the L-enantiomer, but natural D-serine has also been detected in mammals and its levels shown to be connected to a number of different pathologies. Here, the authors convincingly show that D-serine is transported in the kidney by the neutral amino acid transporter ASCT2 and as a non-canonical substrate for the sodium-coupled monocarboxylate transporter SMCTs. Although both transport D-serine, this important study further shows in a mouse model for acute kidney injury that ASCT2 has the dominant role.

      Strengths:

      The paper combines proteomics, animal models, ex vivo transport analyses, and in vitro transport assays using purified components. The exhaustive methods employed provide compelling evidence that both transporters can translocate D-serine in the kidney.

      Weakness:

      In the model for acute kidney injury, the SMCT proteins did not show a significant change in expression levels and were instead analysed based on other, circumstantial evidence. Although it is clear that SMCTs can transport D-serine, their physiological role is less obvious than that of ASCT2.

      We greatly value the reviewer's efforts and feedback in reviewing our manuscript. We acknowledge the reviewer's observation that the changes indicated by our proteomic results are not markedly pronounced. To reinforce our findings, we have incorporated an analysis of gene alterations at the single-cell level (snRNA-seq) from the publicly accessible IRI mouse model data (Figure supplement 7). The snRNA-seq data align with our proteomic data in terms of the general trend of gene/protein alterations, but reveal more substantial changes in both ASCT2 and SMCTs. These discrepancies might stem from the different quantification methods used, suggesting a possible underestimation in our label-free proteomic quantification. The differences we see between the functional changes in transporters and their quantification in proteomics can be explained by the unique challenges posed by membrane proteins. Post-translational modifications and the complex nature of multiple transmembrane domains often impact the accurate measurement of these proteins in proteomic studies. This complexity can lead to a mismatch between the actual functional changes occurring in the transporters and their perceived abundance or alterations as detected by proteomic methods (Figure 4A) (Schey KL et al. Biochemistry 2015, doi: 10.1021/bi301604j). However, this label-free quantitative proteomics approach is well-suited for our study, given its screening efficiency, compatibility with animal models, and the absence of a labeling requirement. We may consider incorporating alternative quantitative proteomic methods in future for a more thorough comparison. We have included these considerations in lines 351-356 of the revised manuscript.

      Manuscript lines 351-356

      “When evaluating the extent of gene/protein alterations between the control and IRI conditions, we observed that the gene alterations of both Asct2 and Smcts, as revealed by snRNA-sequencing, are more pronounced than the protein alteration ratios obtained from proteomics. This discrepancy may stem from difficulty in the quantification method, especially for membrane transport proteins in label-free quantitative proteomics.”

      Regarding the roles of ASCT2 and SMCTs in renal D-serine transport, snRNA-seq showed that ASCT2 expression in the controls is less than 10% of the cell population. We suggest that ASCT2 contributes to D-serine reabsorption because of its high affinity and SMCTs (SMCT1 and SMCT2) would play a role in D-serine reabsorption in the cells without ASCT2 expression. In addition, we included other factors (the turnover rate and the presence of local canonical substrates) that may determine the capability of D-serine reabsorption. We have included this suggestion in the Discussion lines 386-404.

      Manuscript lines 386-404

      “Kinetics analysis of D-serine transport revealed the high affinity by ASCT2 (Km 167 µM) (Foster et al., 2016) and low affinity by SMCT1 (Km 3.39 mM; Figure 5E). In addition to transport affinity, the expression levels and co-localization of multiple transporters within the same cells are critical for elucidating the physiological roles of transporters or transport systems (Sakaguchi et al., 2024). In our proteome data, the chromatogram intensities of Smct1 (2.9 × 10^9 AU) and Smct2 (1.6 × 10^8 AU) were significantly higher than that of Asct2 (1.5 × 10^7 AU) in control mice (Table 1: abundance in Sham). While direct intensity comparisons between different proteins in mass spectrometry analyses are not precise, they can provide a general indication of relative protein amounts. This finding aligns with the snRNA-seq data, where Asct2 expression was found to be minimal, present in less than 10% of cell populations under both control and IRI conditions, suggesting that many cells do not express Asct2. Conversely, Smct1 and Smct2 show high and ubiquitous expression in control conditions, but their levels are markedly reduced in IRI conditions (Figure supplement 7). Our ex vivo assays demonstrate that both ASCT2 and SMCTs mediate D-serine transport (Figure 7B). Consequently, Asct2 may contribute to D-serine reabsorption due to its high affinity, whereas Smcts, owing to their abundance, particularly in cells lacking Asct2, likely play a significant role in D-serine reabsorption. Moreover, factors such as transport turnover rate (Kcat) and the presence of local canonical substrates are also vital in defining the overall contribution of D-serine transport systems.”
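
      The difference between the two affinities can be illustrated with the Michaelis-Menten relation v/Vmax = S/(Km + S). This is a minimal sketch: the Km values are those quoted above, whereas the substrate concentration of 100 µM and the function name are assumptions chosen only for illustration. The relation gives the fraction of each transporter's own Vmax, so it says nothing about absolute flux, which also depends on abundance and turnover rate.

```python
def fractional_rate(substrate_um, km_um):
    """Michaelis-Menten rate as a fraction of Vmax: v / Vmax = S / (Km + S)."""
    return substrate_um / (km_um + substrate_um)


KM_ASCT2_UM = 167.0   # Km for D-serine by ASCT2, as quoted above
KM_SMCT1_UM = 3390.0  # Km for D-serine by SMCT1 (3.39 mM), as quoted above

substrate_um = 100.0  # hypothetical D-serine concentration, illustration only

asct2_fraction = fractional_rate(substrate_um, KM_ASCT2_UM)
smct1_fraction = fractional_rate(substrate_um, KM_SMCT1_UM)
print(round(asct2_fraction, 3), round(smct1_fraction, 3))
```

      At a concentration well below both Km values, the high-affinity transporter operates much closer to its maximal rate, which is why a low-abundance ASCT2 can still contribute.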

      Reviewer #2 (Public Review):

      Summary:

      The manuscript "A multi-hierarchical approach reveals D-serine as a hidden substrate of sodium-coupled monocarboxylate transporters" by Wiriyasermkul et al. is a resubmission of a manuscript, which focused first on the proteomic analysis of apical membrane isolated from mouse kidney with early Ischemia-Reperfusion Injury (IRI), a well-known acute kidney injury (AKI) model. In the second part, the transport of D-serine by Asct2, Smct1, and Smct2 has been characterized in detail in different model systems, such as transfected cells and proteoliposomes.

      Strengths:

      A major problem with the first submission was the explanation of the link between the two parts of the manuscript: it was not very clear why the focus on Asct2, Smct1, and Smct2 was a consequence of the proteomic analysis. In the present version of the manuscript, the authors have focused on the expression of membrane transporters in the proteome analysis, thus making the reason for studying Asct2, Smct1, and Smct2 transporters more clear. In addition, the authors used 2D-HPLC to measure plasma and urinary enantiomers of 20 amino acids in plasma and urine samples from sham and Ischemia-Reperfusion Injury (IRI) mice. The results of this analysis demonstrated the value of D-serine as a potential marker of renal injury. These changes have greatly improved the manuscript and made it more convincing.

      We deeply appreciate the reviewer’s comments on the manuscript. We have responded to the recommendations one by one in the later section.

      Reviewer #3 (Public Review):

      Summary:

      The main objective of this work has been to delve into the mechanisms underlying the increment of D-serine in serum, as a marker of renal injury.

      Strengths:

      With a multi-hierarchical approach, the work shows that Ischemia-Reperfusion Injury in the kidney causes a specific increment in renal reabsorption of D-serine that, at least in part, is due to the increased expression of the apical transporter ASCT2. In this way, the authors revealed that SMCT1 also transports D-serine.

      The experimental approach and the identification of D-serine as a new substrate for SMCT1 merit publication in eLife.

      The manuscript also supports that increased expression of ASCT2, even together with the parallel decreased expression of SMCT1, in renal proximal tubules underlies the increased reabsorption of D-serine responsible for the increment of this enantiomer in serum in a murine model of Ischemia-Reperfusion Injury.

      Weaknesses:

      It remains to be clarified whether ASCT2 has substantial stereospecificity in favor of D- versus L-serine to sustain a ~10-fold decrease in the ratio D-serine/L-serine in the urine of mice under Ischemia-Reperfusion Injury (IRI).

      It is not clear how the increment in the expression of ASCT2, in parallel with the decreased expression of SMCT1, results in increased renal reabsorption of D-serine in IRI.

      We sincerely appreciate the reviewer’s comment on the manuscript. The alteration of D-/L-serine ratios depends on several factors, including protein expression levels at both apical and basolateral sides, properties of the transporters (e.g. transport affinities, substrate stereoselectivities), and the expression of DAAO (D-amino acid oxidase), which selectively degrades D-amino acids. Moreover, the mechanism becomes more complicated when the transport systems of L- and D-enantiomers are different and have distinct stereoselectivities, as in the case of serine. Future studies are required to complete the mechanism. However, we would like to explore the mechanism based on the current knowledge.

      From this study, we identified ASCT2 and SMCTs (SMCT1 and SMCT2) as D-serine transport systems. We showed that SMCT1 prefers D-serine. Although we did not analyze ASCT2 stereoselectivity, previous studies indicate that ASCT2 recognizes both D- and L-serine with high affinities and slightly prefers the L-enantiomer: Km of 18.4 µM for L-serine (Utsunomiya-Tate et al. J Biol Chem 1996) and 167 µM for D-serine (Foster et al. PLOS ONE 2016), both in oocyte expression systems, and IC50 of 0.7 mM for L-serine and 4.9 mM for D-serine in HEK293 expression systems (Foster et al. PLOS ONE 2016). The proteomics showed an increase of ASCT2 (1.6-fold increase) and a decrease of SMCTs (1.7-fold decrease in SMCT1, and 1.3-fold decrease in SMCT2) in IRI conditions. The table below summarizes D-serine transport by ASCT2 and SMCTs.

      In the case of L-serine, ASCT2 and B0ATs (in particular B0AT3) have been revealed as L-serine transport systems in the kidneys (Bröer et al. Physiol Rev 2008; Singer et al. J Biol Chem 2009). Proteomics showed that B0ATs have higher expression levels than ASCT2 supporting the idea that B0ATs are the main L-serine transport system (Table S1: Abundance of B0AT1 = 1.34E+09, B0AT3 = 2.13E+08, ASCT2 = 1.46E+07). In IRI conditions, B0AT3 decreased 1.8 fold and B0AT1 decreased 1.1 fold. From these results, we included the contribution of B0ATs in L-serine transport in Author response table 1.

      Author response table 1.

      Taken together, we suggest that high ratios of D-/L-serine in IRI conditions are a combined result of 1) an increase in D-serine reabsorption through ASCT2 enhancement and SMCT reduction and 2) a decrease in L-serine reabsorption by B0ATs. We have included this suggestion in the Discussion lines 438-451.

      Manuscript lines 438-451

      “The enantiomeric profiles of serine revealed distinct plasma D/L-serine ratios, with low ratios in the normal control but elevated ratios in IRI, despite the weak stereoselectivity of ASCT2 (Figure 1B). This observation suggested differential renal handling of D-serine compared to L-serine. While we identified SMCTs as a D-serine transport system, it has been reported that L-serine reabsorption is mediated by B0AT3 (Singer et al., 2009). We propose that the alterations in plasma and urinary D/L-serine ratios are the combined outcomes of: 1) transport systems for L-serine, and 2) transport systems for D-serine. In normal kidneys, the low plasma D/L-serine ratios could result from the efficient reabsorption of L-serine by B0AT3, coupled with the DAAO activity that degrades intracellular D-serine reabsorbed by SMCTs. In IRI conditions, our enantiomeric amino acid profiling revealed low plasma L-serine and high urinary L-serine (Figure supplements 1B, 2B). Additionally, the proteomic analysis indicated a reduction in B0AT3 levels (4h IRI/sham = 0.56 fold; 8h IRI/sham = 0.65 fold; Table S1). These observations suggest that the low L-serine reabsorption in IRI is a result of B0AT3 reduction.”
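
      The B0AT3 values are reported in two forms in this response: as IRI/sham ratios (0.56 and 0.65) and as a fold decrease (1.8 fold). The short sketch below shows how the two forms relate, assuming that a fold decrease is simply the reciprocal of the ratio; the function name is hypothetical.

```python
def fold_decrease(ratio):
    """Convert an IRI/sham abundance ratio below 1 into a fold decrease."""
    return 1.0 / ratio


# IRI/sham ratios for B0AT3 as quoted from Table S1.
for label, ratio in (("4h", 0.56), ("8h", 0.65)):
    print(label, round(fold_decrease(ratio), 2))
```

      Under this assumption, the 4h ratio of 0.56 corresponds to the roughly 1.8-fold decrease quoted for B0AT3.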

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is a thorough study that was reviewed previously under the old system. I think the authors have strengthened their findings and have no further suggestions.

      We thank Reviewer 1 for their effort and comments, which greatly contributed to improving this manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The experiments seem to me to have been well performed and the data are readily available.

      Weaknesses:

      Rather than weaknesses, I would speak of discussion points: I have a few suggestions that may help to make the paper more accessible to a general audience.

      (1) In the Introduction, when the authors introduce the term "micromolecules", it would be beneficial to provide a precise definition or clarification of what they mean by this term. Adding a brief explanation may help the reader to better understand the context.

      Following the reviewer’s comment, we have included the explanation of the micromolecule and membrane transport proteins in lines 41-43.

      Manuscript lines 41-43

      “Membrane transport proteins function to transport micromolecules such as nutrients, ions, and metabolites across membranes, thereby playing a pivotal role in the regulation of micromolecular homeostasis.”

      (2) In line 91, I suggest specifying that this is a renal IRI model.

      Following the reviewer’s comment, we have added the information that it is a renal IRI model of AKI (lines 90-92).

      Manuscript lines 90-92

      “We applied 2D-HPLC to quantify the plasma and urinary enantiomers of 20 amino acids of renal ischemia-reperfusion injury (IRI) mice, a model of AKI and AKI-to-CKD transition (Sasabe et al., 2014; Fu et al., 2018).”

      (3) Lines 167-168 state that Asct2 is localised to the apical side of the renal proximal tubules. Is there any expression of Asct2 in other nephron segments?

      To our knowledge, there is no report of ASCT2 expression in other nephron segments. Our immunofluorescence data for ASCT2 staining of the whole kidney at low magnification and of another region of Figure 3 (below), as well as immunohistochemistry from the Human Protein Atlas (updated Jun 9th, 2023), did not show a strong ASCT2 signal in regions other than the proximal tubules. Thus, we conclude that ASCT2 is mainly expressed in proximal tubules and not in other nephron segments.

      Author response image 1.

      (4) Lines 225-226: Have the authors expressed the candidate genes in HEK293 cells with ASCT2 knockdown?

      This experiment was done by expressing the candidate genes in the presence of endogenous ASCT2. We have added the information in lines 225-227 to emphasize this process.

      Manuscript lines 225-227

      “Based on this finding, we utilized cell growth determination assay as the screening method even in the presence of endogenous ASCT2 expression. HEK293 cells were transfected with human candidate genes without ASCT2 knockdown.”

      (5) Lines 254-255: why was D-serine transport enhanced by ASCT2 knockdown in FlpInTRSMCT1 or 2 cells?

      We thank the reviewer for pointing out this data, and we apologize for the confusion in the text. The total D-serine uptake in the cells was not enhanced, but the net uptake (uptake after subtracting the background) was increased. This enhancement results from the lower background caused by ASCT2 knockdown. We have revised the text and explained this result in more detail (lines 256-258).

      Manuscript lines 256-258

      “In the cells with ASCT2 knockdown, the background level was lower, thereby enhancing the D-[3H]serine transport contributed by both SMCT1 and SMCT2 (the net uptake after subtracted with background) (Figure 5C).”

      (6) Line 265: The low affinity of SMCT1 for D-serine alone makes it an unlikely transporter for urinary D-serine.

      We acknowledge the reviewer’s concern about the low affinity of SMCT1. However, a Km in the mM range is widely accepted for several low-affinity amino acid transporters such as the proton-coupled amino acid transporter PAT1 (Km = 2 – 5 mM; Miyauchi et al. Biochem J 2010), the cationic amino acid transporter CAT2A (Km = 3 – 4 mM; Closs et al. Biochem 1997), and the large-neutral amino acid transporter LAT4 (Km = 17 mM; Bodoy et al. J Biol Chem 2005). In the kidneys, many compounds are well known to be reabsorbed by low-affinity but high-capacity (high-expression) transporters. Similarly, D-serine was reported to be reabsorbed by a low-affinity transporter (Kragh-Hansen and Sheikh, J Physiol 1984; Shimomura et al. BBA 1988; Silbernagl et al. Am J Physiol Renal Physiol 1999). Moreover, the amino acid profile showed urinary D-serine in the range of 100 – 200 µM (Figure supplement 2). This concentration range could drive SMCT1 function (Figure 5). Combined with the high and ubiquitous expression of SMCT1, we propose that SMCT1 is a low-affinity but high-capacity D-serine transporter in the kidneys.

      snRNA-seq is a method that can directly compare the expression levels between different genes within the same cells. Figure supplement 7 shows that SMCT1 expression is much more abundant than that of ASCT2. ASCT2 was present in less than 10% of the cell population. It is possible that the 90% of cells that do not express ASCT2 use SMCT1 to reabsorb D-serine.

      We have revised the Discussion regarding this comment (lines 386-404).

      Manuscript lines 386-404

      “Kinetics analysis of D-serine transport revealed the high affinity by ASCT2 (Km 167 µM) (Foster et al., 2016) and low affinity by SMCT1 (Km 3.39 mM; Figure 5E). In addition to transport affinity, the expression levels and co-localization of multiple transporters within the same cells are critical for elucidating the physiological roles of transporters or transport systems (Sakaguchi et al., 2024). In our proteome data, the chromatogram intensities of Smct1 (2.9 x 10^9 AU) and Smct2 (1.6 x 10^8 AU) were significantly higher than that of Asct2 (1.5 x 10^7 AU) in the control mice (Table 1: abundance in Sham). While direct intensity comparisons between different proteins in mass spectrometry analyses are not precise, they can provide a general indication of relative protein amounts. This finding aligns with the snRNA-seq data, where Asct2 expression was found to be minimal, present in less than 10% of cell populations under both control and IRI conditions, suggesting that many cells do not express Asct2. Conversely, Smct1 and Smct2 show high and ubiquitous expression in control conditions, but their levels are markedly reduced in IRI conditions (Figure supplement 7). Our ex vivo assays demonstrate that both ASCT2 and SMCTs mediate D-serine transport (Figure 7B). Consequently, Asct2 may contribute to D-serine reabsorption due to its high affinity, whereas Smcts, owing to their abundance, particularly in cells lacking Asct2, likely play a significant role in D-serine reabsorption. Moreover, factors such as transport turnover rate (Kcat) and the presence of local canonical substrates are also vital in defining the overall contribution of D-serine transport systems.”
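As a rough illustration of the affinity argument in this response, the sketch below computes Michaelis-Menten fractional saturation, v/Vmax = [S] / (Km + [S]), at the urinary D-serine concentrations mentioned above (100 to 200 µM), using the Km values quoted from the manuscript (167 µM for ASCT2 and 3.39 mM for SMCT1). Simple Michaelis-Menten kinetics with no competing substrates is an assumption made only for illustration, and the function and constant names are hypothetical; none of this is part of the authors' analysis.

```python
# Illustrative sketch only: assumes simple Michaelis-Menten kinetics
# and ignores competing substrates, turnover rate, and expression level.

KM_ASCT2_UM = 167.0    # Km for D-serine quoted in the response (Foster et al., 2016)
KM_SMCT1_UM = 3390.0   # Km 3.39 mM quoted in the response (Figure 5E), in µM


def fractional_saturation(substrate_um: float, km_um: float) -> float:
    """Return v/Vmax = [S] / (Km + [S]) for concentrations given in µM."""
    if substrate_um < 0 or km_um <= 0:
        raise ValueError("substrate must be >= 0 and Km must be > 0")
    return substrate_um / (km_um + substrate_um)


# Urinary D-serine range cited in the response (Figure supplement 2).
for substrate_um in (100.0, 200.0):
    asct2 = fractional_saturation(substrate_um, KM_ASCT2_UM)
    smct1 = fractional_saturation(substrate_um, KM_SMCT1_UM)
    print(f"[S] = {substrate_um:.0f} µM: ASCT2 {asct2:.1%}, SMCT1 {smct1:.1%} of Vmax")
```

Under these assumptions, SMCT1 would run at only a few percent of its maximal rate in this concentration range, whereas ASCT2 would be roughly one-third to one-half saturated. Any large SMCT1 contribution therefore has to come from abundance or turnover rather than affinity, which is the point the response makes about high-capacity transporters.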

      (7) Line 316: The authors state that there is a high tubular D-serine reabsorption in IRI and in line 424 there is an inactivation of DAAO during the pathology. This suggests that there is a reabsorption of D-serine mediated by a transport system in the basolateral membrane domain of proximal tubular cells. Do the authors have any information about this transporter?

      We agree with the reviewer that transporters at the basolateral membrane are important to complete D-serine reabsorption in the kidney, and we included this issue in the original manuscript. We stated that transport systems at the basolateral side need to be analyzed in order to complete the picture of D-serine transport systems in the kidney (lines 481-483 of the revised manuscript). However, we do not have any strong candidates for basolateral D-serine transport systems. Because we analyzed the proteome of BBMV, which is enriched in apical membrane proteins, the analysis did not detect several transporters at the basolateral side.

      (8) In lines 462-463, the authors state: "It is suggested that PAT1 is less active at the apical membrane where the luminal pH is neutral". However, the pH of urine in the proximal tubules is normally acidic due to the high activity of NHE3. I suggest rewording this sentence.

      Thank you for your comment. The proximal tubule (PT) is the first and main region that maintains acid-base homeostasis in the kidney. In PT cells, NHE3 secretes H+ to titrate luminal HCO3-, creating CO2, which is absorbed into PT cells and produces "new intracellular HCO3-", which is subsequently reabsorbed into the blood. Although ion fluxes in the PT serve to maintain pH homeostasis, pH regulation in both the lumen and the intracellular compartment of the PT is highly dynamic. We agree with the reviewer and have accordingly revised the text to emphasize the pH around PT segments rather than the final urine pH, leaving the discussion open to the possibility of PAT1 function in the PT of normal kidneys (lines 474-481).

      Manuscript lines 474-481

      “PAT1, a low-affinity proton-coupled amino acid transporter (Km in mM range), has been found at both sub-apical membranes of the S1 segment and inside of the epithelia (The Human Protein Atlas: https://www.proteinatlas.org; updated on Dec 7th, 2022) (Sagné et al., 2001; Vanslambrouck et al., 2010). PAT1 exhibits optimum function at pH 5 - 6 but very low activity at pH 7 (Miyauchi et al., 2005; Bröer, 2008b). Future research is required to address the significance of PAT1 on D-serine transport in the proximal tubule segments where pH regulation is known to be highly dynamic (Boron, 2006; Nakanishi et al., 2012; Bouchard and Mehta, 2022; Imenez Silva and Mohebbi, 2022).”

      Reviewer #3 (Recommendations For The Authors):

      The authors proposed that the increased expression of ASCT2, even together with the decreased expression of SMCT1/2, causes the increased renal reabsorption of D-serine that occurs in IRI. In the discussion, the main argument to sustain this hypothesis is the higher apparent affinity for D-serine of ASCT2 (Km <200 µM) versus SMCT1 (Km 3.4 mM). In the Discussion section (page 18, first complete paragraph), the authors indicate that the mass spectrometry intensities of SMCT1 and SMCT2 are two and one orders of magnitude higher, respectively, than that of ASCT2. This suggests that SMCT1 is clearly more expressed than ASCT2 in control conditions. IRI increases ASCT2 protein expression in kidney brush-border membrane vesicles 1.6-fold and decreases that of SMCT1 0.6-fold. How would these fold changes, even taking into account the lower Km of ASCT2 versus SMCT1, explain the dramatic changes in the D-/L-serine ratios in plasma and urine in IRI? The authors might discuss whether other transport characteristics, even unknown ones (e.g., a higher turnover rate of ASCT2 vs SMCT1), would also contribute to the higher D-serine reabsorption in IRI.

      SMCT1 shows some enantiomer selectivity for D- vs L-serine. At 50 µM concentration the transport is almost double for D- vs L-serine, but is ASCT2 stereoselective between the two enantiomers of serine? Some of the authors of this manuscript showed in a previous paper that the basolateral transporter Asc1 also participates in the accumulation of D-serine in serum caused by renal tubular damage (Serum D-serine accumulation after proximal renal tubular damage involves neutral amino acid transporter Asc-1. Suzuki M et al. Sci Rep. 2019 Nov 13;9(1):16705 (PMID: 31723194)). Asc1 shows no stereoselectivity between L- and D-serine. Can the authors discuss possible mechanisms resulting in greater renal reabsorption of D-serine than of L-serine in IRI with the participation of transporters with modest stereoselectivity for D- vs L-serine?

      We appreciate the reviewer’s comments on the degree of protein alteration in proteomics, the functional contributions of ASCT2 and SMCTs, and the alteration of D/L ratios. We have included the possibilities of the technical concerns and the discussion on the roles of ASCT2 and SMCTs as follows.

      • Regarding the expression levels, proteomics and snRNA-seq showed the same tendency: ASCT2 increases and SMCTs decrease in IRI conditions. However, the degrees of alteration are more pronounced in snRNA-seq. This may be due to the difference in quantification methods and probably points to an underestimation of membrane transport proteins in label-free proteomics. The accuracy of protein quantification in label-free proteomics is often affected by the presence of post-translational modifications and multiple transmembrane domains, as in the case of membrane transport proteins (Schey KL et al. Biochemistry 2015, doi: 10.1021/bi301604j). Alternative methods of quantitative proteomics may be added in the future for a more thorough comparison. We have added this issue in lines 351-356 of the revised version.

      Manuscript lines 351-356

      “When evaluating the extent of gene/protein alterations between the control and IRI conditions, we observed that the gene alterations of both Asct2 and Smcts, as revealed by snRNA-sequencing, are more pronounced than the protein alteration ratios obtained from proteomics. This discrepancy may stem from difficulty in the quantification method, especially for membrane transport proteins in label-free quantitative proteomics.”

      • For the functional contributions of ASCT2 and SMCTs in the kidney, we acknowledge the reviewer’s concern about the low affinity of SMCT1. Following the reviewer’s comment, we have included other factors besides transport affinities, e.g. expression levels and turnover rates of the transporters. From the results of both proteomics and snRNA-seq, ASCT2 expression is significantly lower than that of SMCTs under normal conditions. snRNA-seq showed that ASCT2 was present in less than 10% of the cell population (Figure supplement 7). We propose that most of the cells that do not express ASCT2 may use SMCT1 to reabsorb D-serine. This topic was included in the revised manuscript lines 386-404.

      Manuscript lines 386-404

      “Kinetics analysis of D-serine transport revealed the high affinity by ASCT2 (Km 167 µM) (Foster et al., 2016) and low affinity by SMCT1 (Km 3.39 mM; Figure 5E). In addition to transport affinity, the expression levels and co-localization of multiple transporters within the same cells are critical for elucidating the physiological roles of transporters or transport systems (Sakaguchi et al., 2024). In our proteome data, the chromatogram intensities of Smct1 (2.9 x 10^9 AU) and Smct2 (1.6 x 10^8 AU) were significantly higher than that of Asct2 (1.5 x 10^7 AU) in the control mice (Table 1: abundance in Sham). While direct intensity comparisons between different proteins in mass spectrometry analyses are not precise, they can provide a general indication of relative protein amounts. This finding aligns with the snRNA-seq data, where Asct2 expression was found to be minimal, present in less than 10% of cell populations under both control and IRI conditions, suggesting that many cells do not express Asct2. Conversely, Smct1 and Smct2 show high and ubiquitous expression in control conditions, but their levels are markedly reduced in IRI conditions (Figure supplement 7). Our ex vivo assays demonstrate that both ASCT2 and SMCTs mediate D-serine transport (Figure 7B). Consequently, Asct2 may contribute to D-serine reabsorption due to its high affinity, whereas Smcts, owing to their abundance, particularly in cells lacking Asct2, likely play a significant role in D-serine reabsorption. Moreover, factors such as transport turnover rate (Kcat) and the presence of local canonical substrates are also vital in defining the overall contribution of D-serine transport systems.”

      • As for the dramatic alterations of D/L-serine ratios juxtaposed with minimal changes in ASCT2 and SMCTs expression levels, we cautiously refrain from drawing a definitive conclusion regarding the entire mechanism. A definitive conclusion would require a comprehensive elucidation of both L-serine and D-serine transport systems at both the apical and basolateral membranes. Nevertheless, we would like to suggest a mechanism at the apical membrane based on current knowledge.

      For D-serine transport systems, we found contributions from ASCT2 and SMCTs in this study. Meanwhile, L-serine reabsorption was previously reported to be mediated mainly by neutral amino acid transporters (in particular B0AT3; Bröer et al. Physiol Rev 2008; Singer et al. J Biol Chem 2009). Hence, the mechanism behind the alterations of D/L-serine ratios should include B0AT3 function as well. In IRI conditions, B0AT3 decreased 1.8-fold. We suggest that the high D-/L-serine ratios in IRI conditions are a combined outcome of 1) an increase of D-serine reabsorption by ASCT2 enhancement and SMCTs reduction, and 2) a decrease of L-serine reabsorption by B0AT3. We have included this suggestion in the Discussion lines 438-451.

      Manuscript lines 438-451

      “The enantiomeric profiles of serine revealed distinct plasma D/L-serine ratios, with low ratios in the normal control but elevated ratios in IRI, despite the weak stereoselectivity of ASCT2 (Figure 1B). This observation suggested the differential renal handling of D-serine compared to L-serine. While we identified SMCTs as a D-serine transport system, it has been reported that L-serine reabsorption is mediated by B0AT3 (Singer et al., 2009). We propose that the alterations in plasma and urinary D/L-serine ratios are the combined outcomes of: 1) transport systems for L-serine, and 2) transport systems for D-serine. In normal kidneys, the low plasma D/L-serine ratios could result from the efficient reabsorption of L-serine by B0AT3, coupled with the DAAO activity that degrades intracellular D-serine reabsorbed by SMCTs. In IRI conditions, our enantiomeric amino acid profiling revealed low plasma L-serine and high urinary L-serine (Figure supplements 1B, 2B). Additionally, the proteomics analysis indicated a reduction in B0AT3 levels (4h IRI/sham = 0.56 fold; 8h IRI/sham = 0.65 fold; Table S1). These observations suggest that the low L-serine reabsorption in IRI is a result of B0AT3 reduction.”

      • In the case of Asc-1, it was reported to be a D-serine transporter in the brain (Rosenberg et al. J Neurosci 2013). Suzuki et al. 2019 showed an increase of Asc-1 in cisplatin-induced tubular injury. Notably, the mRNA of Asc-1 is predominantly found in Henle’s loop, distal tubules, and collecting ducts but not in proximal tubules, and its protein expression level is very low in the kidney (Human Protein Atlas: updated on Jun 19, 2023). Furthermore, in this study, Asc-1 expression was not detected in the brush border membrane proteome. Consequently, we have decided not to include Asc-1 in the Discussion of this study, which primarily focuses on the proximal tubules.
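The B0AT3 change is reported in this response both as a fold decrease (about 1.8-fold) and as IRI/sham ratios (0.56 at 4h, 0.65 at 8h). The short sketch below shows how the two conventions relate, assuming that "fold decrease" means the reciprocal of the IRI/sham ratio. That reading is an assumption, and the helper name is hypothetical.

```python
# Illustrative sketch: relate an IRI/sham abundance ratio to a "fold decrease".
# Assumption: fold decrease = 1 / (IRI/sham ratio) for ratios below 1.

B0AT3_IRI_OVER_SHAM = {"4h": 0.56, "8h": 0.65}  # ratios quoted from Table S1


def fold_decrease(ratio: float) -> float:
    """Convert a treated/control ratio into a fold decrease (its reciprocal)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


for timepoint, ratio in B0AT3_IRI_OVER_SHAM.items():
    print(f"{timepoint}: IRI/sham = {ratio:.2f}, fold decrease = {fold_decrease(ratio):.2f}")
```

On this reading, the 4h ratio of 0.56 corresponds to the roughly 1.8-fold decrease stated in the response, while the 8h ratio of 0.65 corresponds to a smaller decrease of about 1.5-fold.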
    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) More explanation/description of Fig 3C and 3D would be helpful for readers, including the color code of 3D and black lines shown in both panels.

      We have added more description to the legend of Figure 3, and we have used the same color code as in Figure 2, which we now specifically note in the figure legend as well.

      (2) Differences between cranial and trunk NCC could be experimentally shown or discussed. Fig 4C shows some differences between these two populations, but in situ results using Dlc1/Sp5/Pak3 probes in the trunk region may be informative, like Fig 5 supplement 2 for cranial NCCs.

      This is an important point. The focus of our study was on cranial neural crest cells, and the single cell sequencing data is therefore truly reflective of only cranial neural crest cells. We have not functionally tested for the roles of Dlc1/Sp5/Pak3 in trunk neural crest cells; however, based on the expression and loss-of-function phenotypes of Sp5 or Pak3 knockout mice, we predict they individually may not play a significant role. It remains plausible that Dlc1 could play an important role in the delamination of trunk neural crest cells, but we have not tested that definitively. Nonetheless, Sabbir et al 2010 showed in a gene trap mouse mutant that Dlc1 is expressed in trunk neural crest cells. Regarding the similarities and differences between cranial and trunk neural crest cells as noted by the reviewer with respect to Figure 4, it’s important to recognize the temporal differences illustrated in Figure 4: neural crest cell delamination proceeds in a progressive wave from anterior to posterior, and the analysis was designed to quantify cell cycle status before and during neural crest cell delamination. We have compared cranial and trunk neural crest cells in more detail in the discussion and also speculate what might happen in the trunk based on what we know from other species.

      (3) Discussion can be added about the potential functions of Dlc1 for NCC migration and/or differentiation based on available info from KO mice.

      We have added specific details regarding the published Dlc1 knockout mouse phenotype to the discussion, particularly with respect to the craniofacial anomalies, which included frontonasal prominence and pharyngeal arch hypoplasia, and defects in neural tube closure and heart development. Although the study didn’t investigate the mechanisms underpinning the Dlc1 knockout phenotype, the craniofacial morphological anomalies would be consistent with a deficit in neural crest cell delamination reducing the number of migrating neural crest cells, as we observed in our Dlc1 knockdown experiments.

      Reviewer #2 (Recommendations For The Authors):

      The authors used the (Tg(Wnt1-cre)11Rth Tg(Wnt1-GAL4)11Rth/J) line but work from the Bush lab (see Lewis et al., 2013) has demonstrated fully penetrant abnormal phenotypes that affect the midbrain neuroepithelium, increased CyclinD1 expression and overt cell proliferation as measured by BrdU incorporation. The authors should explain why they used this mouse line instead of the Wnt1-Cre2 mice (129S4-Tg(Wnt1-cre)1Sor/J) in the Jackson Laboratory (which lacks the phenotypic effects of the original Wnt1-Cre line), or a "Cre-only" control, or at a minimum explain the steps they took to ensure there were no confounding effects on their study, especially since cell proliferation was a major outcome measure.

      This is an important point, and we thank the reviewer for raising it. Yes, it has been reported that the original Wnt1Cre mice exhibit a midbrain phenotype (Lewis et al., 2013). However, it has also been noted that Wnt1Cre2 can exhibit recombination in the male germline leading to ubiquitous recombination (Dinsmore et al., 2022). Therefore, to avoid any potential for bias, we used an equal number of cells derived from the Wnt1 and F10N transgenic line embryos in our scRNA-seq, and this included multiple non-Cre embryos. Our scRNA-seq analysis was therefore not dependent upon Wnt1-Cre, particularly because we used whole heads rather than fluorescence-sorted cells. However, Wnt1-Cre lineage tracing was advantageous from a computational perspective to help define cells that were premigratory and migratory, in concert with Mef2c-lacZ, based on their expression of YFP, LacZ or both. We note these specifics more clearly in the methods.

      The Results section (line 122) states that scRNA-seq was performed on dissociated cranial tissues but the Methods section (lines 583-584) implies that whole E8.5 mouse embryos were dissociated. Which was dissociated, whole embryos or just cranial tissues? Obviously, the latter would be a better strategy to enrich for cranial neural crest, but the authors also examine the trunk neural crest. This should be clarified in the text.

      We apologize that some of the details regarding the tissue isolation were confusing and we have clarified this in the methods and the text. For the record, after isolating E8.5 embryos, we then dissected the head from those embryos, and performed scRNA-seq on dissociated cranial tissues. As the reviewer correctly noted, this approach strategically enriches for cranial neural crest cells.

      The authors do not justify why they chose a knockdown strategy, which has its limitations including its systemic injection into the amniotic cavity, its likely global and more variable effects, and its need to be conducted in culture. Why the authors did not instead use a Wnt1-Cre-mediated deletion of Dlc1, which would have been "cleaner" and more specific to the neural crest, is not clear (maybe so they could specifically target different Dlc1 isoforms?). Also, the authors use Sox10 as a marker to count neural crest cells, but Sox10 may only label a subset of neural crest cells and thus some unaffected lineages may not have been counted. The authors should mention what is known about the regulation of Dlc1 by Sox10 in the neural crest. Although the data are persuasive, a second marker for counting neural crest cells following knockdown would make the analysis more robust. Can the authors explain why they did not simply use the Mef2c-F10N-LacZ line and count LacZ-positive cells (if fluorescence signal was required for the quantification workflow, then could they have used an anti-beta Galactosidase antibody to label cells)?

      We thank the reviewer for raising these important considerations. It has previously been noted that although Wnt1-Cre is the gold standard for conditional deletion analyses in neural crest cell development, especially migration and differentiation, it is not a good tool for functional studies of the specification and delamination of neural crest cells due to the timing of Wnt1 expression and Cre activation and excision (see Barriga et al., 2015). Therefore, we chose a knockdown strategy instead, and also because it allows us to more rapidly evaluate gene function. We agree that there are limitations to the approach with respect to variability, however, this is outweighed by the ability to repeatedly perform the knockdown at multiple and more relevant temporal stages such as E7.5 (which is prior to the onset of Wnt1-Cre activity), as well as target different isoforms, and also treat large numbers of embryos for quantitative analyses. The advantage of using Sox10 as a marker for counting neural crest cells is that at the time of analysis, cranial neural crest cells are still migrating towards the frontonasal prominences and pharyngeal arches, and the overwhelming majority of these cells are Sox10 positive. Moreover, we can therefore assay every Dlc1 knockdown embryo for Sox10 expression and count the number of migrating neural crest cells. The limitation of using the Mef2c-F10N-LacZ line is that this transgenic line is maintained as a heterozygote, and thus only half the embryos in a litter could reasonably be expected to be lacZ+. But combining Sox10 and Mef2c-F10N-LacZ fluorescent immunostaining for similar analyses in the future is a great idea.

      Reviewer #3 (Recommendations For The Authors):

      The putative intermediate cells differentially express mRNAs for genes involved in cell adhesion, polarity, and protrusion relative to bona fide premigratory cells (Fig. 2E). This is persuasive evidence, but only differentially expressed genes are shown. Discussing those markers that have not yet changed, e.g. Cdh1 or Zo1 (?), would be instructive and help to clarify the order of events.

      We thank the reviewer for this suggestion and we have provided more detail about adherens junctions and tight junctions. Cdh1 is not expressed, and although Myh9 and Myh10 are expressed, we did not detect any significant changes. ZO1 is a tight junction protein encoded by the gene Tjp1, which, along with other tight junction protein-encoding genes, is downregulated in intermediate NCCs as shown in Figure 2E.

      It is unclear whether the two putative intermediate state clusters differ other than their stage of the cell cycle. Based on the trajectory analysis in Fig. 3C-D, the authors state that these two populations form simultaneously and independently but then merge into a single population. However, without further differential expression, it seems more plausible that they represent a single population that is temporarily bifurcated due to cell cycle asynchrony.

      We have addressed the cell cycle question in the discussion by noting that while it is possible the transition states represent a single population that is temporarily bifurcated due to cell cycle asynchrony, if this were true, then we should expect S phase inhibition to eliminate both transition state groups. Instead, our trajectory analyses suggest that the transition states are initially independent, and furthermore, S phase inhibition did not affect delamination of the other population of neural crest cells.

      The authors do not present an in-depth comparison of these neural crest intermediate states to previously reported cancer intermediate states. This analysis would reveal how similar the signatures are and thus how extrapolatable these and future findings in delaminating neural crest are to different types of cancer.

      We have also added more detail to the discussion to address the potential for similarities and differences in neural crest intermediate states compared to previously reported cancer intermediate states. The challenge, however, is that none of the cancer intermediate states have been characterized at a molecular level. Nonetheless, with the limited molecular markers available, we have not identified any similarities so far, but our datasets are now available for comparison with future cancer EMP datasets.

      The reduction in SOX10+ cells may be in part or wholly attributable to inhibition of proliferation AFTER delamination. Showing that there are premigratory NCCs in G2/M at ~E8.0 would bolster the argument that this population is present from the earliest stages.

      The presence of premigratory neural crest cells in G2/M is shown by the scRNA-seq data and cell cycle staining data in the neural plate border.

      Lines 248-249: The pseudo-time analysis in Fig 3C/D does indicate that the two most mature cell clusters (pharyngeal arch and frontonasal mesenchyme) may arise from common or similar migratory progenitors. However, given the decades of controversy about fate restriction of neural crest cells, the statement that "EMT intermediate NCC and their immediate lineages are not fate restricted to any specific cranial NCC derivative at this timepoint" should be toned down so as to not give the impression that they have identified common progenitors of ectomesenchyme and neuro/glial/pigment derivatives.

      We appreciate this comment, because as the reviewer noted, there has been considerable literature and debate about the fate restriction and plasticity of neural crest cells, and indeed we did not intend to imply we have identified common progenitors of ectomesenchyme and neuro/glial/pigment derivatives. That can only be truly functionally demonstrated by clonal lineage tracing analyses. Rather, we interpret our pseudo-time analyses to indicate that irrespective of cell cycle status at the time of delamination, these two populations come together with equivalent mesenchymal and migratory properties, but in the absence of fate determination in the collective of cells. This does not mean that individual cells are common progenitors of both ectomesenchyme and neuro/glial/pigment derivatives. The nuance is important, and we address this more carefully in the text.

      Lines 320-321: "...this overlap in expression was notably not observed in older embryos in areas where EMT had concluded". It is unclear whether the markers no longer overlap in older embryos (i.e. segregate to distinct populations) or are simply no longer expressed.

      The data in Figure 5 demonstrate the dynamic and overlapping expression of Dlc1, Sp5 and Pak3 in the different clusters of cells as they transition from being neuroepithelial to mesenchymal. In contrast to Sp5 and Pak3, Dlc1 is not expressed by premigratory neural crest cells but is expressed at high levels in all EMT intermediate stage neural crest cells. Later, as Dlc1 continues to be expressed in migrating neural crest cells, Pak3 and Sp5 are downregulated. The absence of overlapping expression in the dorsolateral neural plate at the conclusion of EMT therefore coincides with their downregulation in that territory.

      In the final results section on Dlc1, the previously published mutant mouse lines are referenced as having "craniofacial malformation phenotypes". The lack of detail given on what those malformations are (assuming descriptions are available) makes the argument that they may be related to insufficient delamination less persuasive. The degree of knockdown correlates so well with the percentage reduction in migratory neural crest (Fig. 6) that one would imagine a null mutant to have a very severe phenotype.

      The inference from the reviewer is correct and indeed Dlc1 null mutant mice do have a severe phenotype. We have added more specific details regarding the craniofacial and other phenotypes of the Dlc1 mutant mice to the discussion. Of note, the frontonasal prominences and the pharyngeal arches are hypoplastic in E10.5 Dlc1 mutant embryos, which would be consistent with a neural crest cell deficit. Although a deficit in neural crest cells can be caused by multiple distinct mechanisms, our Dlc1 knockdown analyses suggest that the phenotype is due to an effect on neural crest cell delamination which diminishes the number of migrating neural crest cells.

      Use the same y-axis for Fig. 4C/D

      This has been corrected.

      Fig. 6C: Please note in the panel which gene is being measured by qPCR

      This has been corrected to denote Dlc1.

      Lines 108-117: More concise language would be appropriate here.

      As requested, we were more succinct in our language and have shortened this section.

      The SABER-FISH images are very dim. I realize the importance of not saturating the pixels, but the colors are difficult to make out.

      We thank the reviewer for pointing this out and have endeavored to make the SABER-FISH images brighter and easier to see.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, the authors report a molecular mechanism for recruiting syntaxin 17 (Syn17) to closed autophagosomes through the charge interaction between enriched PI4P and the C-terminal region of Syn17. How to precisely control the location and conformation of proteins is critical for maintaining autophagic flux. In particular, the recruitment of Syn17 to autophagosomes remains unclear. In this paper, the authors describe a simple lipid-protein interaction model that goes beyond previous studies focusing on protein-protein interactions. This represents a conceptual advance.

      We would like to thank Reviewer #1 for the positive evaluation of our study.

      Reviewer #2 (Public Review):

      Summary:

      Syntaxin17 (STX17) is a SNARE protein that is recruited to mature (i.e., closed) autophagosomes, but not to immature (i.e., unclosed) ones, and mediates the autophagosome-lysosome fusion. How STX17 recognizes the mature autophagosome is an unresolved interesting question in the autophagy field. Shinoda and colleagues set out to answer this question by focusing on the C-terminal domain of STX17 and found that PI4P is a strong candidate that causes the STX17 recruitment to the autophagosome.

      Strengths:

      The main findings are: 1) Rich positive charges in the C-terminal domain of STX17 are sufficient for the recruitment to the mature autophagosome; 2) Fluorescence charge sensors of different strengths suggest that autophagic membranes have negative charges and the charge increases as they mature; 3) Among a battery of fluorescence biosensors, only PI4P-binding biosensors distribute to the mature autophagosome; 4) STX17 bound to isolated autophagosomes is released by treatment with Sac1 phosphatase; 5) By dynamic molecular simulation, STX17 TM is shown to be inserted to a membrane containing PI4P but not to a membrane without it. These results indicate that PI4P is a strong candidate that STX17 binds to in the autophagosome.

      We would like to thank Reviewer #2 for pointing out these strengths.

      Weaknesses:

      • It was not answered whether PI4P is crucial for the STX17 recruitment in cells because manipulation of the PI4P content in autophagic membranes was not successful for unknown reasons.

      As we explained in the initial submission, we tried to deplete PI4P in autophagosomes by multiple methods but did not succeed. In this revised manuscript, we added the result of an experiment using the PI 4-kinase inhibitor NC03 (Figure 4―figure supplement 1), which shows no significant effect on the autophagosomal PI4P level and STX17 recruitment.

      Author response image 1.

      The PI 4-kinase inhibitor NC03 failed to suppress autophagosomal PI4P accumulation and STX17 recruitment. HEK293T cells stably expressing mRuby3–STX17TM (A) or mRuby3–CERT(PHD) (B) and Halotag-LC3 were cultured in starvation medium for 1 h and then treated with and without 10 μM NC03 for 10 min. Representative confocal images are shown. STX17TM- or CERT(PHD)-positive rates of LC3 structures per cell (n > 30 cells) are shown in the graphs. Solid horizontal lines indicate medians, boxes indicate the interquartile ranges (25th to 75th percentiles), and whiskers indicate the 5th to 95th percentiles. Differences were statistically analyzed by Welch’s t-test. Scale bars, 10 μm (main), 1 μm (inset).
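
      The legend above names Welch’s t-test as the comparison used. As a minimal sketch of what that test computes, assuming two lists of per-cell positive rates (the function name `welch_t` and the example values are hypothetical and are not taken from the study), the statistic and the Welch–Satterthwaite degrees of freedom can be written as:

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    `a` and `b` are two independent samples; their variances are not
    assumed to be equal, which is what distinguishes this test from
    Student's t-test.
    """
    # Squared standard error contributed by each group (sample variance / n).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical values, for illustration only.
untreated = [1.0, 2.0, 3.0, 4.0, 5.0]
treated = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(untreated, treated)
```

      Converting the statistic to a p-value additionally requires the t-distribution with `dof` degrees of freedom, which this sketch leaves to a statistics package.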

      • The molecular simulation study did not show whether PI4P is necessary for the STX17 TM insertion or whether other negatively charged lipids can play a similar role.

      As the reviewer suggested, we performed the molecular dynamics simulation using membranes with phosphatidylinositol, a negatively charged lipid. STX17 TM approached the PI-containing membrane but was not inserted into the membrane within a time scale of 100 ns in simulations of all five structures. These data suggest that PI4P, which is more negatively charged than PI, is required for STX17 insertion. Thus, we have included these data in Figure 5E and F and added the following text to Lines 242–244. “Moreover, if the membrane contained phosphatidylinositol (PI) instead of PI4P, STX17 approached the PI-containing membrane but was not inserted into the membrane (Figure 5E, F, Video 3).”

      Author response image 2.

      (E) An example of a time series of simulated results of STX17TM insertion into a membrane consisting of 70% phosphatidylcholine (PC), 20% phosphatidylethanolamine (PE), and 10% phosphatidylinositol (PI). STX17TM is shown in blue. Phosphorus in PC, PE and PI are indicated by yellow, cyan, and orange, respectively. Short-tailed lipids are represented as green sticks. The time evolution series are shown in Video 3. (F) Time evolution of the z-coordinate of the center of mass (z_cm) of the transmembrane helices of STX17TM in the case of membranes with PI. Five independent simulation results are represented by solid lines of different colors. The gray dashed lines indicate the locations of the lipid heads. A scale bar indicates 5 nm.
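
      The quantity plotted in panel F, the z-coordinate of the center of mass (z_cm), is a mass-weighted average of atomic z-coordinates. The sketch below illustrates that definition only; the function names, the atom masses, and the frame layout are assumptions made for illustration and do not describe the authors' simulation software or analysis scripts.

```python
def center_of_mass_z(masses, z_coords):
    """Mass-weighted mean z-coordinate of one set of atoms in one frame."""
    total_mass = sum(masses)
    return sum(m * z for m, z in zip(masses, z_coords)) / total_mass


def z_cm_series(masses, frames):
    """z_cm for every frame; each frame lists z-coordinates in atom order."""
    return [center_of_mass_z(masses, frame) for frame in frames]


# Hypothetical two-atom "helix" followed over two frames (arbitrary units).
masses = [1.0, 3.0]
frames = [[0.0, 4.0], [2.0, 2.0]]
trace = z_cm_series(masses, frames)
```

      Plotting such a trace against time, together with the z-positions of the lipid heads, gives the kind of curve described in the legend.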

      • The question that the authors posed in the beginning, i.e., why is STX17 recruited to the mature (closed) autophagosome but not to immature autophagic membranes, was not answered. The authors speculate that the seemingly gradual increase of negative charges in autophagic membranes is caused by an increase in PI4P. However, this was not supported by the PI4P fluorescence biosensor experiment that showed their distribution to the mature autophagosome only. Here, there are at least two possibilities: 1) The increase of negative charges in immature autophagic membranes is derived from PI4P. However the fluorescence biosensors do not bind there for some reason; for example, they are not sensitive enough to recognize PI4P until it reaches a certain level, or simply, their binding does not occur in a quantitative manner. 2) The negative charge in immature membranes is not derived from PI4P, and PI4P is generated abundantly only after autophagosomes are closed. In either case, it is not easy to explain why STX17 is recruited to the mature autophagosome only. For the first scenario, it is not clear how the PI4P synthesis is regulated so that it reaches a sufficient level only after the membrane closure. In the second case, the mechanism that produces PI4P only after the autophagosome closure needs to be elucidated (so, in this case, the question of the temporal regulation issue remains the same).

      We thank the reviewers for pointing this out. While the probe for weakly negative charges (1K8Q) labeled both immature and mature autophagosomes, the probes for intermediate charges (5K4Q and 3K6Q) and PI4P labeled only mature autophagosomes (Figure 2F, Figure 2–figure supplement 1B). Thus, we think that the autophagosomal membrane rapidly and drastically becomes negatively charged, and at the same time, PI4P is enriched. Although immature membranes may have weak negative charges, we did not examine which lipids contribute to the negative charges. Thus, we have added the following sentences to the Discussion part.

      “Our data of the 1K8Q probe suggest that immature autophagosomal membranes may also have slight negative charges (Figure 2E). Although the source of the negative charge of immature autophagosomes is currently unknown, it may be derived from low levels of PI4P, which is undetectable by the PI4P probes and/or other negatively charged lipids such as PI and PS (Schmitt et al., EMBO Rep, 2022).” (Lines 279–283) “In any case, it would be important to elucidate how PI 4-kinase activity or PI4P synthesis is upregulated during autophagosome maturation.” (Lines 302–303)

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors set out to address the question of how the SNARE protein Syntaxin 17 senses autophagosome maturation by being recruited to autophagosomal membranes only once autophagosome formation and sealing is complete. The authors discover that the C-terminal region of Syntaxin 17 is essential for its sensing mechanism that involves two transmembrane domains and a positively charged region. The authors discover that the lipid PI4P is highly enriched in mature autophagosomes and that electrostatic interaction of Syntaxin 17's positively charged region with PI4P drives recruitment specifically to mature autophagosomes. The temporal basis for PI4P enrichment and Syntaxin 17 recruitment to ensure that unsealed autophagosomes do not fuse with lysosomes is a very interesting and important discovery. Overall, the data are clear and convincing, with the study providing important mechanistic insights that will be of broad interest to the autophagy field, and also to cell biologists interested in phosphoinositide lipid biology. The authors' discovery also provides an opportunity for future research in which Syntaxin 17's C-terminal region could be used to target factors of interest to mature autophagosomes.

      Strengths:

      The study combines clear and convincing cell biology data with in vitro approaches to show how Syntaxin 17 is recruited to mature autophagosomes. The authors take a methodical approach to narrow down the critical regions within Syntaxin 17 required for recruitment and use a variety of biosensors to show that PI4P is enriched on mature autophagosomes.

      We would like to thank Reviewer #3 for the positive comments.

      Weaknesses:

      There are no major weaknesses, overall the work is highly convincing. It would have been beneficial if the authors could have shown whether altering PI4P levels would affect Syntaxin 17 recruitment. However, this is understandably a challenging experiment to undertake and the authors outlined their various attempts to tackle this question.

      We thank Reviewer #3 for pointing this out. Please see our above response to Reviewer #2 (Public Review).

      In addition, clear statements within the figure legends on the number of independent experimental repeats that were conducted for experiments that were quantitated are not currently present in the manuscript.

      As pointed out by Reviewer #3, we have added the number of independent experimental repeats in the figure legends.

      Reviewer #1 (Recommendations For The Authors):

      This paper is well written and all experiments were conducted with a high standard. Several minor issues should be addressed before final publication.

      (1) To further confirm the charge interaction, a charge screening experiment should be performed for Fig. 2A.

      We asked Reviewer #1, through the editor, what this experiment entailed and understood that it was intended to test the effect of high salt concentrations. We monitored the association of GFP-STX17TM with liposomes in the presence or absence of 1 M NaCl and found that it was blocked in a high-ionic-strength buffer. These data support the electrostatic interaction of STX17 with membranes. We have included these data in Figure 2B and added the following sentences to Lines 124–126.

      “The association of STX17TM with PI4P-containing membranes was abolished in the presence of 1 M NaCl (Figure 2B). These data suggest that STX17 can be recruited to negatively charged membranes via electrostatic interaction independent of the specific lipid species.”

      Author response image 3.

      GFP–STX17TM translated in vitro was incubated with rhodamine-labeled liposomes containing 70% PC, 20% PE and 10% PI4P in the presence of 1 M NaCl or 1.2 M sucrose. GFP intensities of liposomes were quantified and shown as in Figure 1C (n > 30).

      (2) The authors claim that "Autophagosomes become negatively charged during maturation", based on experiments using membrane charge probes. Since it's mainly about the membrane, it's better to refine the claim to "The membrane of autophagosomes becomes...", which would be more precise and close to the topic of this paper.

      We would like to thank the reviewer for pointing this out. This point is valid. As recommended, we have corrected the phrase “Autophagosomes become negatively charged during maturation” to “The membrane of autophagosomes becomes negatively charged during maturation” (Lines 72, 118, 262, 969 (title of Figure 2), 1068 (title of Figure 2–figure supplement 1)).

      (3) The authors should add more discussion regarding the "specificity" for recruiting Syn17 through the charge interaction. Particularly, how Syn17 could be maintained before the closure of autophagosomes? For the MD simulations in Fig. 5, the current results don't add much to the manuscript. The cell biology experiments have demonstrated the conclusion. The authors could try to find more details about the insertion by analyzing the simulation movies. Do membrane packing defects play a role during the insertion process? A similar analysis was conducted for alpha-synuclein (https://pubmed.ncbi.nlm.nih.gov/33437978/).

      Regarding the mechanism of STX17 maintenance in the cytosol, we do not think that other molecules, such as chaperones, are essential because purified recombinant mGFP-STX17TM used in this study is soluble. However, this does not rule out such a mechanism, which would be a subject for future study.

      In the paper by Liu et al. (PMID: 33437978), small liposomes with diameters of 25–50 nm are used. Therefore, there are packing defects in the highly curved membranes, to which alpha-synuclein helices are inserted in a curvature-dependent manner. On the other hand, autophagosomes are much larger (~1 μm in diameter) and almost flat for STX17 molecules, so we think it is unlikely that STX17 recognizes the packing defect.

      Reviewer #2 (Recommendations For The Authors):

      • The two (and other) possibilities with regards to the interpretation of the negative charge/PI4P result in autophagic membranes are hoped to be discussed.

      As mentioned above, we have added the following sentences to the Discussion section. “Our data of the 1K8Q probe suggest that immature autophagosomal membranes may also have slight negative charges (Figure 2E). Although the source of the negative charge of immature autophagosomes is currently unknown, it may be derived from low levels of PI4P, which is undetectable by the PI4P probes and/or other negatively charged lipids such as PI and PS (Schmitt et al., EMBO Rep, 2022).” (Lines 279–283)

      “In any case, it would be important to elucidate how PI 4-kinase activity or PI4P synthesis is upregulated during autophagosome maturation.” (Lines 302–303)

      • Fluorescence biosensors are convenient to give an overview of the intracellular distribution of various lipids, but some of them show false-negative results. For example, evectin-2-PH for PS binds to endosomes but not to the plasma membrane, even though the latter contains abundant PS. With regards to PI4P, some biosensors illuminate both the Golgi and autophagosome, while others do not appear to bind the Golgi. Moreover, fluorescence biosensors for PI(3,5)P2 and PI(3,4)P2, which are also candidates for the STX17 insertion issue, are less reliable than others (e.g., those for PI3P and PI(4,5)P2). These problems need to be considered.

      We agree with Reviewer #2 that fluorescence biosensors are not perfect for detecting specific lipids. Based on the Reviewer’s suggestion, we have included a comment on this in the Discussion section as follows (Lines 265–268).

      “Given the possibility that fluorescence lipid probes may give false-negative results, a more comprehensive biochemical analysis, such as lipidomics analysis of mature autophagosomes, would be imperative to elucidate the potential involvement of other negatively charged lipids.”

      • A negative control for the PI4P biosensor, i.e., a mutant lacking the PI4P binding ability, is better to be tested to confirm the presence of PI4P in autophagosomes.

      We would like to thank the Reviewer for this comment. We conducted the suggested experiment and confirmed that the CERT(PHD)(W33A) mutant, which is deficient for PI4P binding (Sugiki et al., JBC. 2012), was diffusely present in the cytosol and did not localize to STX17-positive autophagosomes. This data supports our conclusion that PI4P is indeed present in autophagosomes. We have included this data in Figure 3–figure supplement 2A and explained it in the text (Lines 164–166).

      Author response image 4.

      Mouse embryonic fibroblasts (MEFs) stably expressing GFP–CERT(PHD)(W33A) and mRuby3–STX17TM were cultured in starvation medium for 1 h. Bars indicate 10 μm (main images) and 1 μm (insets).

      • As a control to the molecular dynamic simulation study, STX17 TM insertion into a membrane containing other negative charge lipids, especially PI, needs to be tested. PI is a negative charge lipid that is likely to exist in autophagic membranes (as suggested by the authors' past study).

      We thank the reviewers for this suggestion. As mentioned above (Reviewer #2, Public Review), we performed the molecular dynamics simulation using membranes containing PI and added the results in Figure 5E and F and Video 3.

      • If the putative role of PI4P could be shown in the cellular context, the authors' conclusion would be much strengthened. I wonder if overexpression of PI4P fluorescence biosensors, especially those that appear to bind to the autophagosome almost exclusively, may suppress the recruitment of STX17 there.

      We would like to thank the Reviewer for asking this question. In MEFs stably overexpressing PI4P probes driven by the CMV promoter, STX17 recruitment was not affected. Thus, simple overexpression of PI4P probes does not appear to be effective in masking PI4P in autophagosomes.

      Another idea is to use an appropriate molecule (e.g., WIPI2, ATG5) and to recruit Sac1 to autophagic membranes by using the FRB-FKBP system or the like. I hope these and other possibilities will be tested to confirm the importance of PI4P in the temporal regulation of STX17 recruitment.

      We tried the FRB-FKBP system using the phosphatase domain of yeast Sac1 fused to FKBP and LC3 fused to FRB, but unfortunately, this system failed to deplete PI4P from the autophagosomal membrane.

      Reviewer #3 (Recommendations For The Authors):

      A few areas for suggested improvement are:

      (1) It would be helpful if the authors could clarify for all figures how many independent experiments were conducted for all experiments, particularly those that have quantitation and statistical analyses.

      As pointed out by Reviewer #3, we have added the number of independent experimental repeats in the figure legends.

      The authors made several attempts to modulate PI4P levels on autophagosomes although understandably this proved to be challenging. A couple of suggestions are provided to address this area:

      (2) Given the reported role of GABARAPs in PI4K2a recruitment and PI4P production on autophagosomes, as well as autophagosome-lysosome fusion (Nguyen et al (2016) J Cell Biol) it would be worthwhile to assess whether GABARAP TKO cells have reduced PI4P and reduced Stx17 recruitment

      According to the Reviewer’s suggestion, we examined the localization of STX17 TM and the PI4P probe CERT(PHD) in ATG8 family (LC3/GABARAP) hexa KO HeLa cells that were established by the Lazarou lab (Nguyen et al., JCB 2016). As in WT cells, STX17 TM and CERT(PHD) were still colocalized with each other in hexa KO cells, suggesting that neither STX17 recruitment nor PI4P enrichment depends on ATG8 family proteins (note: the size of autophagosomes in HeLa cells is smaller than in MEFs, making it difficult to observe autophagosomes as ring-shaped structures). We have included this result in Figure 3–figure supplement 2(F) and explained it in the text (Lines 194–196, 198).

      Author response image 5.

      (F) WT and ATG8 hexa KO HeLa cells stably expressing GFP–STX17TM and transiently expressing mRuby3–CERT(PHD) were cultured in starvation medium. Bars indicate 10 μm (main images) and 1 μm (insets).

      (3) Can the authors try fusing Sac1 to one of the PI4P probes (CERT(PHD)) that were used, or alternatively to the c-terminus of Syntaxin 17? This approach would help to recruit Sac1 only to mature autophagosomes and could therefore prevent the autophagosome formation defect observed when fused to LC3B that targeted Sac1 to autophagosomes as they were forming. Understandably, this approach might seem a bit counterintuitive since the phosphatase is removing PI4P which is what is recruiting it but it could be a viable approach to keep PI4P levels low enough on mature autophagosomes so that Syntaxin 17 is no longer recruited. A Sac1 phosphatase mutant might be needed as a control.

      We would like to thank the Reviewer for these suggestions. We tried the phosphatase domain of yeast Sac1 or human SAC1 fused with STX17TM, but unfortunately, these fusion proteins did not deplete PI4P from autophagosomes.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      To resolve and further test the claim that TBI did not induce cell proliferation:

      How many brains did they analyse? Sample sizes must be provided in Figure S1.

      As per the reviewer’s suggestion, we removed one of the unsupported claims shown in Figure S1. The original Figure S1 is shown below with the sample number added.

      Author response image 1.

      The authors could either improve the TBI method or the detection of cells in S-phase, mitosis or cycling. They could use PCNA-GFP or BrdU, EdU or FUCCI instead and at least provide evidence that they can detect cells in S-phase in intact brains. Timing is critical (ie cell cycle is longer than in larvae) so multiple time points should be tested. Or they could use pH3 but test more time points and rather large sample sizes. If they are not able to provide any evidence, then their lack of evidence is no evidence. The authors should consider removing pH3 and PCNA-GFP related claims instead.

      We have removed pH3 and PCNA-GFP related results and claims.

      Other unsupported claims:

      Figure 2A-C is not very clear what they are showing, but it is not evidence of astrocyte hypertrophy. It does not have cellular resolution and does not show the cell size, membranes, nor number

      (1) We have avoided the term “hypertrophy” and changed the description throughout the text to “astrocyte swelling”.

      (2) Images in the resolution of Figure 2E and 2F were able to show the enlarged soma of astrocytes, suggesting swelling.

      What is the point of using RedStinger in Figure 2?

      We used RedStinger to label the astrocyte nuclei.

      Figure S5 is not convincing, as anti-Pvr does not look localised to specific cells. Instead, it looks like uniform background. If they really think the antibody is localised, they should do double stainings with cell type specific markers. If the antibody does not work, then remove the data and the claim. They could test with RNAi knock-down in specific cell types and qRT-PCR which cells express pvr instead.

      We have removed the claim that “Pvr is predominantly expressed in astrocytes” and changed the description to “Immunostainings using the anti-Pvr antibodies revealed that endogenous Pvr expression is low in the control brains, yet significantly enhanced upon TBI. Reducing Pvr expression, but not Pvr overexpression, in astrocytes blocked the TBI-induced increase of Pvr expression (Figure S5)”.

      Figure S6: it is unclear what they are trying to show, but these data do not demonstrate that astrocytes do not engulf debris after TBI, as there isn't sufficient cellular resolution to make such claim. Firstly, they analyse one single cell per treatment. Secondly, the cell projections are not visible in these images, and therefore engulfment cannot be seen. The authors could remove the claim or visualise whether astrocytes phagocytose debris or not either using clones or with TEM.

      We agree with the reviewer that our images do not have the resolution to make this claim. We have removed Figure S6 and corresponding text description.

      On statistics:

      The statistical analysis needs revising as it is wrong in multiple places, eg Fig.1F,G,H; Figure 2D. They only use Student t-tests. These can only be used when data are continuous, distributed uniformly and only two samples are compared; if more than 2 samples, distributed uniformly, then use One-Way ANOVA and multiple comparisons tests. If data are categorical, use Chi-Square.

      We have double-checked the analysis and compared each experimental group to the control separately using Student’s t-test throughout the study.

      Other points for improvement:

      Figure 2E,F: what are GFP puncta and how are they counted?

      I. Each GFP punctum looks like a little circle, likely representing a functional or dysfunctional structure. The biology of the GFP puncta is currently unknown.

      II. We used ImageJ to quantify the GFP puncta:

      (1) Image- type-8 bits

      (2) Process-subtract background (Rolling ball radius: 10)

      (3) Image-Adjust-Threshold-Apply

      (4) Analyze-Measure-set measurements-choose “area” “limit to threshold”-OK

      (5) Count the number of puncta in the chosen area.

      (6) Get the number of puncta per square micron.
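
      As a rough illustration of steps (3) to (6), the following minimal sketch thresholds a small grayscale image, counts connected groups of above-threshold pixels, and divides by the imaged area. It is not the ImageJ procedure itself: the function names, the use of 4-connectivity, the treatment of the whole image as the chosen area, and the pixel size are all assumptions made for illustration, and the rolling-ball background subtraction of step (2) is omitted.

```python
from collections import deque


def count_puncta(image, threshold):
    """Count 4-connected groups of pixels with intensity above `threshold`.

    `image` is a list of equal-length rows of pixel intensities.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # New punctum found: flood-fill it so it is counted only once.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def puncta_density(image, threshold, pixel_size_um):
    """Puncta per square micron, treating the whole image as the chosen area."""
    area_um2 = len(image) * len(image[0]) * pixel_size_um ** 2
    return count_puncta(image, threshold) / area_um2


# Hypothetical 4 x 4 image with three bright spots, 0.5 micron per pixel.
example = [
    [0, 9, 0, 0],
    [0, 9, 0, 8],
    [0, 0, 0, 0],
    [7, 0, 0, 0],
]
density = puncta_density(example, threshold=5, pixel_size_um=0.5)
```

      In practice the threshold and the region of interest would be chosen per image, as in steps (3) and (4) above.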

      All genotypes must be provided (including for MARCM clones), currently they are not.

      We have shown the full genotype in the corresponding legend.

      Figure 7O,P indicate on figure that these are RNAi

      We have revised the labels to RNAi in Figure 7O,P.

      Reviewer #2 (Recommendations For The Authors):

      Several typos are present in the text.

      We have read the manuscript carefully and corrected typos throughout.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings on the roles of the axon growth regulator Sema7a in the formation of peripheral sensory circuits in the lateral line system of zebrafish. The evidence supporting the claims of the authors is solid, although further work directly testing the roles of different sema7a isoforms would strengthen the analysis. The work will be of interest to developmental neuroscientists studying circuit formation.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work, Dasgupta et al. have dissected the role of Sema7a in fine-tuning of a sensory microcircuit in the posterior lateral line organ of zebrafish. They also attempt to outline the different roles of a secreted versus membrane-bound form of Sema7a in this process. Using genetic perturbations and axonal network analysis, the authors show that loss of both Sema7a isoforms causes abnormal axon terminal structure with more bare terminals and fewer loops in contact with presynaptic sensory hair cells. Further, they show that loss of Sema7a causes decreased number and size of both the pre- and post-synapse. Finally, they show that overexpression of the secreted form of Sema7a specifically can elicit axon terminal outgrowth to an ectopic Sema7a expressing cell. Together, the analysis of Sema7a loss of function and overexpression on axon arbor structure is fairly thorough and revealed a novel role for Sema7a in axon terminal structure. However, the connection between different isoforms of Sema7a and the axon arborization needs to be substantiated. Furthermore, an autocrine role for Sema7a on the presynaptic cell is not ruled out as a contributing factor to the synaptic and axon structure phenotypes.

      Finally, critical controls are absent from the overexpression paradigm.

      Comments: Thank you for your valuable comments. We have analyzed the hair cell scRNA transcriptome data of zebrafish neuromasts from published works and have not identified known expression of receptors of the Sema7A protein, particularly PlexinC1 and Integrin β1 molecules (reference 4 and 15) in hair cells. This result suggests that the Sema7A protein molecule, either secreted or membrane-bound, does not possess its cognate receptor to elicit an autocrine function on the hair cells. Moreover, the GPI-anchored Sema7A lacks a cytosolic domain. So it is unlikely that Sema7A signaling directly induces the formation of presynaptic ribbons. We propose that the decrease in average number and area of synaptic aggregates likely reflects decreased stability of the synaptic structures owing to lack of contact between the sensory axons and the hair cells, which has been identified in zebrafish neuromasts (reference 38).

      Thank you for pointing out the missing critical control experiments. Additional control experiments (lines 333-346) with a new figure (Figure 5) have been added.

      These issues weaken the claims made by the authors including the statement that they have identified differential roles for the GPI-anchored versus secreted forms of Sema7a on synapse formation and as a chemoattractant for axon arborization respectively.

      Comments: We have rephrased our statement and argue in lines 428-430 that our experiments “suggest a potential mechanism for hair cell innervation in which a local Sema7Asec diffusive cue likely consolidates the sensory arbors at the hair cell cluster and the membrane-anchored Sema7A-GPI molecule guides microcircuit topology and synapse assembly.”

      The manuscript itself would benefit from the inclusion of details in the text to help the reader interpret the figures, tools, data, and analysis.

      Comments: We have made significant revisions to the text and figures to improve clarity and consistency of the manuscript.

      Reviewer #2 (Public Review):

      In this work, Dasgupta et al. investigate the role of Sema7a in the formation of the peripheral sensory circuit in the lateral line system of zebrafish. They show that Sema7a protein is present during neuromast maturation and localized, in part, to the base of hair cells (HCs). This would be consistent with pre-synaptic Sema7a mediating formation and/or stabilization of the synapse. They use a sema7a loss-of-function strain to show that lateral line sensory terminals display abnormal arborization. They provide highly quantitative analysis of the lateral line terminal arborization to show that a number of specific topological parameters are affected in mutants. Next, they ectopically express a secreted form of Sema7a to show that lateral line terminals can be ectopically attracted to the source. Finally, they also demonstrate that the synaptic assembly is impaired in the sema7a mutant. Overall, the data are of high quality and properly controlled. The availability of a Sema7a antibody is a big plus, as it allows one to address the endogenous protein localization as well as to show the signal absence in the sema7a mutant. The quantification of the arbor topology should be useful to people in the field who are looking at the lateral line as well as other axonal terminals. I think some results are overinterpreted though. The authors state: "Our findings demonstrate that Sema7A functions both as a juxtracrine and as a secreted cue to pattern neural circuitry during sensory organ development." However, they have not actually demonstrated which isoform functions in HCs (also see comments below).

      Comments: Thank you for making this point. To investigate the presence of both sema7a transcripts in the hair cells of the lateral-line neuromasts, we used the Tg(myo6b:actb1EGFP) transgenic fish to capture the labeled hair cells by fluorescence-activated cell sorting (FACS) and isolated total RNA. Using transcript specific DNA oligonucleotide primers, we have identified the presence of both sema7a transcript variants in the hair cell of the neuromast. Even though we have not developed transcript specific knockout animals, we speculate that the presence of both transcript variants in the hair cell implies that they function in distinct fashion. We have changed our interpretation in lines 32-34 to “Our findings propose that Sema7A likely functions both as a juxtracrine and as a secreted cue to pattern neural circuitry during sensory organ development.”

      In future we will utilize the CRISPR/Cas9 technique to target the unique C-terminal domain of the GPI-anchored sema7a transcript variant. We believe that this will only perturb the formation of the full-length Sema7A protein and help us determine the role of the membrane-bound Sema7AGPI molecule as well as the Sema7Asec in sensory arborization and synaptic assembly.

      In addition, they have to be careful in interpreting their topology analysis, as they cannot separate individual axons. Thus, such analysis can generate artifacts. They can perform additional experiments to address these issues or adjust their interpretations.

      Comments: Thank you for this insightful comment. In a previous eLife publication from our laboratory, we utilized the serial blockface scanning electron micrograph (SBFSEM) technique to characterize the connectome of the neuromast microcircuit where patterns of innervation of all the individual axons can be delineated in five-day-old larvae (reference 8). However, the collective behavior of all the sensory axons that build the innervation network remained enigmatic, especially in a living animal during development. In this paper we addressed how the sensory-axon collective behaves around the clustered hair cells and builds the innervation network in living animals during diverse developmental stages. Our analyses have not only identified how the axons associate with the hair cell cluster as the organ matures, but also discovered distinct topological features in the arbor network that emerge during organ maturation, which may influence assembly of postsynaptic aggregates (lines 384-403, Figure 6G-I). We believe that our quantitative approach to capture collective axonal behaviors and their topological attributes during circuit formation has highlighted the importance of understanding network assembly during sensory organ development.

      Reviewer #3 (Public Review):

      Summary:

      This study demonstrates that the axon guidance molecule Sema7a patterns the innervation of hair cells in the neuromasts of the zebrafish lateral line, as revealed by quantifying gain- and loss-of function effects on the three-dimensional topology of sensory axon arbors over developmental time. Alternative splicing can produce either a diffusible or membrane-bound form of Sema7a, which is increasingly localized to the basolateral pole of hair cells as they develop (Figure 1). In sema7a mutant zebrafish, sensory axon arbors still grow to the neuromast, but they do not form the same arborization patterns as in controls, with many arbors overextending, curving less, and forming fewer loops even as they lengthen (Figure 2,3). These phenotypes only become significant later in development, indicating that Sema7a functions to pattern local microcircuitry, not the gross wiring pattern. Further, upon ectopic expression of the diffusible form of Sema7a, sensory axons grow towards the Sema7a source (Figure 4). The data also show changes in the synapses that form when mutant terminals contact hair cells, evidenced by significantly smaller pre- and post-synaptic punctae (Figure 5). Finally, by replotting single cell RNA-sequencing data (Figure 6), the authors show that several other potential cues are also produced by hair cells and might explain why the sema7a phenotype does not reflect a change in growth towards the neuromast. In summary, the data strongly indicate that Sema7a plays a role in shaping connectivity within the neuromast.

      Strengths:

      The main strength of this study is the sophisticated analysis that was used to demonstrate fine-level effects on connectivity. Rather than asking "did the axon reach its target?", the authors asked "how does the axon behave within the target?". This type of deep analysis is much more powerful than what is typical for the field and should be done more often. The breadth of analysis is also impressive, in that axon arborization patterns and synaptic connectivity were examined at 3 stages of development and in three-dimensions.

      Weaknesses:

      The main weakness is that the data do not cleanly distinguish between activities for the secreted and membrane-bound forms of Sema7a, which the authors speculate may influence axon growth and synapse formation respectively. The authors do not overstate the claims, but it would have been nice to see some additional experimentation along these lines, such as the effects of overexpressing the membrane-bound form,

      Comments: We have accepted this useful suggestion. In lines 333-346 and in Figure 5 we have demonstrated the impact of overexpressing the membrane-bound transcript variant on arborization pattern of the sensory axons.

      Some analysis of the distance over which the "diffusible" form of Sema7a might act (many secreted ligands are not in fact all that diffusible), or

      Comments: We have reported this in lines 311-317 and in Figure 4F,G.

      Some live-imaging of axons before they reach the target (predicted to be the same in control and mutants) and then within the target (predicted to be different).

      Comments: We have accepted this useful suggestion. We demonstrate the dynamics of the sensory arbors that are attracted to an ectopic Sema7Asec source in lines 325-332, Figure 4I,J; Figure 4—figure supplement 2A, and Videos 13-16.

      Clearly, although the gain-of-function studies show that Sema7a can act at a distance, other cues are sufficient. Although the lack of a phenotype could be due to compensation, it is also possible that Sema7a does not actually act in a diffusible manner within its natural context. Overall, the data support the authors' carefully worded conclusions. While certain ideas are put forward as possibilities, the authors recognize that more work is needed. The main shortcoming is that the study does not actually distinguish between the effects of the two forms of Sema7a, which are predicted but not actually shown to be either diffusible or membrane linked (the membrane linkage can be cleaved). Although the study starts by presenting the splice forms, there is no description of when and where each splice form is transcribed.

      Comments: We have utilized the HCR™ RNA-FISH Technology to generate transcript specific probes. To generate transcript-specific HCR probes to distinctly detect the sema7aGPI (NM_001328508) and the sema7asec (NM_001114885) transcripts, Molecular Instruments could design only 11 probes against the sema7aGPI transcript and only one probe against the sema7asec transcript (personal correspondence with Mike Liu, PhD, Head of Operations and Product Development Lead Molecular Instruments, Inc.). The HCR probe against the sema7aGPI transcript showed a very faint signal. Unfortunately, the HCR probe against the sema7asec transcript failed to detect the presence of any transcript. For robust detection of transcripts, the protocol demands a minimum of 20 probes. We believe that the very low number of probes against our transcripts is the primary reason for the absence of a signal.

      We therefore utilized fluorescence-activated cell sorting (FACS) to capture the labeled hair cells and isolated total RNA to perform RT-PCR using transcript specific DNA oligonucleotide primers. We identified the presence of both the secreted and the membrane-bound transcripts at four-days-old neuromasts (lines 80-84, Figure 1B-D).

      Additionally, since the mutants are predicted to disrupt both forms, it is a bit difficult to disentangle the synaptic phenotype from the earlier changes in circuit topology - perhaps the change at the level of the synapse is secondary to the change in topology.

      Comments: Thank you for the insightful suggestion. We have analyzed the relationship between the sensory arbor network topology and the distribution of postsynaptic structures (lines 384-403, Figure 6G-I). We identified that the distribution of the postsynaptic aggregates is closely associated with the topological attributes of the sensory circuit. We further clarify the potential origin of disrupted synaptic assemblies in sema7a-/- mutants in lines 380-382 and lines 417-420.

      Further, the authors do not provide any data supporting the idea that the membrane bound form of Sema7a acts only locally. Without these kinds of data, the authors are unable to attribute activities to either form.

      Comments: We have accepted this useful suggestion and have prepared the Figure 5 with the necessary details.

      The main impact on the field will be the nature of the analysis. The field of axon guidance benefits from this kind of robust quantification of growing axon trajectories, versus their ability to actually reach a target. This study highlights the value of more careful analysis and as a result, makes the point that circuit assembly is not just a matter of painting out paths using chemoattractants and repellants, but is also about how axons respond to local cues. The study also points to the likely importance of alternative splice forms and to the complex functions that can be achieved using different forms of the same ligand.

      Reviewer #4 (Public Review):

      Summary:

      The work by Dasgupta et al. identifies Sema7a as a novel guidance molecule in hair cell sensory systems. The authors use both the genetic and imaging power of the zebrafish lateral-line system for their research. Based on expression data and immunohistochemistry experiments, the authors demonstrate that Sema7a is present in lateral line hair cells. The authors then examine a sema7a mutant. In this mutant, Sema7a protein levels are nearly eliminated. Importantly, the authors show that when Sema7a is absent, afferent terminals show aberrant projections and fewer contacts with hair cells. Lastly the authors show that ectopic expression of the secreted form of Sema7a is sufficient to recruit aberrant terminals to non-hair cell targets. The sema7a innervation defects are well quantified. Overall, the paper is extremely well written and easy to follow.

      Strengths:

      (1) The axon guidance phenotypes in sema7a mutants are novel, striking and thoroughly quantified.

      (2) By combining both loss-of-function sema7a mutants and ectopic expression of the secreted form of Sema7a, the authors demonstrate that Sema7a is both necessary and sufficient to guide sensory axons.

      Weaknesses:

      (1) Control. There should be an uninjected heatshock control to ensure that heatshock itself does not cause sensory afferents to form aberrant arbors. This control would help support the hypothesis that exogenously expressed Sema7a (via a heatshock driven promoter) is sufficient to attract afferent arbors.

      Comments: Thank you for the suggestion. We have added the uninjected heatshock control experiment in Figure 5 and described experimental details in the text, lines 343-345.

      (2) Synapse labeling. The numbers obtained for postsynaptic labeling in controls do not match up with the published literature - they are quite low. Although there are clear differences in postsynaptic counts between sema7a mutants and controls, it is worrying that the numbers are so low in controls. In addition, the authors do not stain for complete synapses (pre- and post-synapses together). This staining is critical to understand how Sema7a impacts synapse formation.

      Comments: Thank you for raising this issue. We believe the low average numbers of the postsynaptic punctae in control neuromasts arise from the lack of formation of postsynaptic aggregates beneath the immature hair cells, which are abundant in early stages of neuromast maturation. We have performed exhaustive analysis on the formation of pre- and postsynaptic structures and have identified how their distribution changes along neuromast development in control larvae. We have further analyzed how such distribution is perturbed in the sema7a-/- mutants. We do not think analyzing the complete synapse structure will add much to our understanding of how Sema7A influences synapse formation and maintenance.

      (3) Hair cell counts. The authors need to provide quantification of hair cell counts per neuromast in mutant and control animals. If the counts are different, certain quantification may need to be normalized.

      Comments: We have added the raw data with the hair cell counts in both control and sema7a-/- mutants across developmental stages. The homozygous sema7a-/- mutants have slightly fewer hair cells and we have normalized all our topological analyses by the corresponding hair cell numbers for each neuromast in each experiment (lines 669-675).

      (4) Developmental delay. It is possible that loss of Sema7a simply delays development. The latest stage examined was 4 dpf, an age that is not quite mature in control animals. The authors could look at a later age, such as 6 dpf to see if the phenotypes persist or recover.

      Comments: The homozygous sema7a-/- mutants are unviable and die at 6 dpf. We therefore restricted our analysis to stages up to 4 dpf. The association of the sensory arbors with the clustered hair cells gradually decreases as the neuromasts mature from 2 dpf to 4 dpf in the sema7a-/- mutants (lines 174-176, Figure 2I). Moreover, in the sema7a-/- mutants the sensory axons throw long projections that keep getting farther away from the clustered hair cells as the neuromast matures from 2 dpf to 4 dpf (lines 166-168, Figure 2H; Figure 2—figure supplement 1K,L). If the phenotypes in the sema7a-/- mutants were due to developmental delays, then we should have seen a recovery of disrupted arborization patterns over time. Instead, we observe a further deterioration of the arborization patterns and other architectural assemblies. These findings confirm that the observed phenotypes in the sema7a-/- mutants are not due to delayed development of the larvae, but are a specific outcome of the loss of Sema7A protein.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      Issue 1: One of the most interesting conclusions in this manuscript is the function of the GPI-anchored vs. secreted form of Sema7a in axon structure and synapse formation. In lines 357-360 of the discussion (for example) the authors state that they have shown that the GPI-anchored form of Sema7a is responsible for contact-mediated synapse formation while the secreted form functions as a chemoattractant for axon arbor structure. "We have discovered dual modes of Sema7A function in vivo: the chemoattractive diffusible form is sufficient to guide the sensory arbors toward their target, whereas the membrane-attached form likely participates in sculpting accurate neural circuitry to facilitate contact-mediated formation and maintenance of synapses." However, the data do not support this conclusion. Specifically, no analysis is done showing unique expression of either isoform in hair cells and no functional analysis is done to conclusively determine which isoform is important for either phenotype.

      Comments: We have shown that both sema7a transcripts are expressed in the hair cells of four-day-old neuromasts (lines 78-84, Figure 1C,D). Ectopic expression of the sema7asec transcript variant robustly attracts the lateral-line sensory arbors toward itself, whereas ectopic expression of the sema7aGPI variant fails to impart sensory guidance from a distance, suggesting that the membrane-bound form likely participates in contact-mediated neural guidance. These experiments decisively show, for the first time in zebrafish, the dual modes of Sema7A function in vivo. However, we agree that the sema7aGPI transcript-specific knockout animal would be essential to conclusively prove that the membrane-attached form is primarily involved in forming accurate neural circuitry and contact-mediated formation and maintenance of synapses. Hence, we have very carefully stated in lines 427-428 that “the membrane-attached form likely participates in sculpting accurate neural circuitry to facilitate contact-mediated formation and maintenance of synapses”. We will follow up on this suggestion in our upcoming manuscript that will incorporate transcript-specific genetic ablations.

      Though the authors present RT-PCR analysis of sema7a isoforms, it is not interpretable. The second reverse primer will also recognize the full-length transcript (from what I can gather) so it does not simply show the presence of the secreted form. Is there a unique 3'UTR for the short transcript that can be used? Additionally, for the GPI-anchored version can you use a forward primer that is not present in the short isoform? This would shed some light on the respective levels of both transcripts.

      Comments: The C-termini of the two transcript variants are distinct and we have designed distinct primers that will selectively bind to each transcript (lines 503-511). Since we have not performed quantitative polymerase chain reaction (qPCR), relative levels of each transcript are hard to determine.

      Alternatively, and perhaps of more use, in situ hybridization using unique probes for each isoform would allow you to determine which are actually present in hair cells.

      Comments: We have tried this approach and explained the point earlier (refer to lines 203-212 of this response letter).

      To decisively state that these isoforms have unique functions in axon terminal structure and synapse formation, other experiments are also essential. For example, RNA-mediated rescue analyses using both isoforms would tell you which can rescue the axonal structure and synapse size/number phenotypes. Overexpression of the GPI-anchored form, like the secreted form in Figure 4, would allow you to determine if only the secreted form can cause abnormal axon extension phenotypes. Expression of both forms in hair cells (using a myo6b promotor for example) would allow assessment of their role in presynapse formation.

      Comments: We have ectopically expressed the sema7aGPI transcript variant near the sensory arbor network and observed that Sema7A-GPI fails to impart sensory axon guidance from a distance.

      Thank you for suggesting the rescue experiments. We are in the process of generating CRISPR/Cas9-mediated transcript-specific knockout animals. We are currently preparing another manuscript that incorporates the above-mentioned rescue experiments to dissect the role of each transcript in regulating arbor topology and synapse formation.

      For the overexpression experiments, expression of mKate alone (with and without heat shock) is also a critical control to include.

      Comments: We have incorporated two control experiments: (1) larvae injected with hsp70:sema7asec-mKate2 plasmid that were not heat shocked and (2) Uninjected larvae that were heatshocked. We think these two controls are sufficient to demonstrate that the abnormal arborization patterns are not artifacts generated due to plasmid injection and heatshocking.

      Issue 2: A second concern is the lack of data showing support cell and hair cell formation and function is unaffected. Analysis of support and hair cell number with loss of Sema7a as well as simple analyses of mechanotransduction (FM4-64) would help alleviate concerns that phenotypes are due to disrupted neuromast formation and basic hair cell function rather than a specific role for Sema7a in this process.

      Comments: We have measured the hair cell numbers in both control and sema7a-/- mutants across developmental stages. We have added this to our submitted raw data.

      We have utilized the styryl fluorophore FM4-64 to test the mechanotransduction function of the hair cells in sema7a-/- mutants. We have detailed our finding in lines 137-141 and in Figure 2—figure supplement 1C,D.

      Expression analysis of Sema7a receptors would also help strengthen the argument for a specific effect on lateral line afferent axons.

      Comments: Thank you for this suggestion. Currently, we do not possess an RNA transcriptome dataset for the lateral-line ganglion. This deficit limits a systematic screen for lateral-line sensory neuronal gene expression either through antibody stains or via HCR-mediated in situ techniques. In the future we plan to develop an RNA transcriptome for the lateral-line ganglion and identify potential binding partners for Sema7A.

      Issue 3: The manuscript could also be improved to include more detail in some areas and less in others. In general, each section has a fairly long lead up but lacks important experimental details that would help the reader interpret the data. For example:

      Figure 1: What is the label for the lateral line axons? Is it a specific transgenic? The legend states that 3 asterisks indicate p<0.0001. What about the other asterisk combinations?

      Comments: We have clarified these issues in lines 118-121 and in lines 906-907.

      Figure 2: For the network analysis, are the traces for all axons that branch to innervate the neuromast?

      Comments: Yes, we have traced the entire arbor containing all the axons that branched from the lateral line nerve and extended toward the clustered hair cells. The three-dimensional traces depict a skeletonized representation of the arbor network.

      Can the tracing method distinguish individual axons?

      Comments: No, our goal is to understand how the axon-collective behaves around the clustered hair cells during development.

      How do you know where an end is versus continued looping?

      Comments: We have categorically defined the topological attributes in lines 187-191 and in Figure 3A.

      Also, are all neuromasts similarly affected or is there a divergence based on which organ you are imaging? What neuromast was imaged in this and other figures?

      Comments: Yes, all the neuromasts in the trunk and tail regions were affected similarly by the sema7a mutation. We did not observe any region-specific phenotypic outcome. We consistently imaged the trunk neuromasts, particularly the second, third, and fourth neuromasts.

      Discussion: The short discussion failed to put these findings into context or to discuss how this unique topological arrangement of axon terminals impacts function.

      Comments: We have added a new segment, lines 432-448, in the discussion section which mentions the potential role of the topological features in arranging the distribution pattern of the postsynaptic densities and thereby potentially influencing the network’s ability to gather sensory inputs through properly placed postsynaptic aggregates.

      Can you speculate on how the looping structure may alter number of synaptic contacts per axon for instance? For this, it would be useful to know if normally the synapses form on loops versus bare terminals.

      Comments: Thank you for this insightful suggestion. We have performed detailed analysis, as mentioned in lines 384-397, to characterize the distribution of the postsynaptic densities between the two topological attributes.

      Does this looping facilitate single axons contacting more hair cells of the same polarity? Would that be beneficial?

      Comments: Looping behaviors indeed facilitate the contact between the axons and the hair cells. As we have observed, the primary topological attribute that the sensory arbor network underneath the clustered hair cells adopts is a loop. The bare terminals are predominantly projected transverse to the clustered hair cells and lack contact with them. Whether a single axon, being part of a loop, preferentially contacts hair cells of same polarity is yet to be determined. We can address this question by mosaic labeling a single axon in the arbor network and determine its association with the hair cells. We intend to do these experiments in our upcoming manuscript.

      Minor concerns:

      (1) For the stacked charts quantifying topological features, I found interpreting them challenging. Is it possible to put these into overlapping histograms or line graphs to better compare wild type to mutant directly?

      Comments: Thank you for your suggestion. We tried several ways to represent our data and found that the stacked charts optimally signify our analysis and depict the characteristic phenological differences between the control and the sema7a-/- mutants.

      (2) There are numerous strong statements throughout not directly supported by the data, e.g. lines 110-113; 206-208; 357-360 and others. These should be tempered.

      Comments: For lines 110-113, we have updated this section with new experiments and the new segment is represented in lines 115-126.

      For lines 206-208, we have updated the statement to “This result suggests that the stereotypical circuit topology observed in the mature organ may emerge through transition of individual arbors from forming bare terminals to forming closed loops encircling topological holes” in lines 225-227.

      Reviewer #2 (Recommendations For The Authors):

      The authors should be careful about making any assumptions which form of sema7a is active in NMs. Their RT-PCR demonstrates presence of both isoforms in a whole animal; however, whether they are similarly present in HCs is not investigated here.

      Comments: We have addressed this concern and have updated the manuscript with new experiments, detailed in lines 78-84.

      Also, there is an issue of translation and trafficking to the membrane with subsequent secretion. An important experiment that would address this question is expressing two sema7a isoforms in mutant HCs and asking whether this can suppress the mutant phenotype.

      Comments: Thank you for suggesting the rescue experiments. We are in the process of generating CRISPR/Cas9-mediated transcript-specific knockout animals. We are currently preparing another manuscript that incorporates the above-mentioned rescue experiments to dissect the role of each transcript in regulating arbor topology and synapse formation.

      Presumably, sema7a is trafficked to the membrane during HC maturation. This is consistent with the authors' observation that sema7a localization is changing as NM mature. However, actin-sema7a co-labeling does not actually show whether sema7a is on the membrane. Labeling HCs with a membrane marker (transgene) would be much more convincing. Alternatively, can the authors show sema7a localization actually correlates with the presence of sensory axon terminals? They already have immunos that label both. Thus, this should be pretty straightforward.

      Comments: Thank you for these suggestions. We have addressed these issues in lines 112-114 and in lines 119-126.

      Figure 2 should have a control panel, so the reduced sema7a staining can be compared to the control side-by-side.

      Comments: We have depicted Sema7A staining in control neuromasts in multiple images, including Figure 1E, Figure 1H, and in Figure 2—figure supplement 1B. We have kept the control panel in the supplementary figure due to space restrictions in Figure 2.

      Arborization topology: While I appreciate the very careful characterization of the topology for wild-type and mutant NMs, I think it would be much more informative to mark individual axons and then analyze their topology. The main reason is that the authors cannot really distinguish whether some aspects of the topology they describe are really due to the densely packed overlapping terminals of multiple axons or whether these are really characteristic, higher-order organization of individual axons. Because of this, they cannot be certain what is really happening with sema7a mutant terminals. Related to the point above: while it is clear that the overall topology is abnormal in the mutant, the authors should be careful in concluding that sema7a regulates specific aspects of it. The overall structure is probably highly interconnected; perturbing one parameter would likely affect all the others.

      Comments: Thank you for this comment. In a previous eLife publication from our laboratory, we utilized the serial blockface scanning electron micrograph (SBFSEM) technique to characterize the connectome of the neuromast microcircuit where patterns of innervation of all the individual axons can be delineated in five-day-old larvae (reference number 8). However, the collective behavior of all the sensory axons that build the innervation network remained enigmatic, especially in a living animal during development. In this paper we addressed how the sensory axon-collective behaves around the clustered hair cells and builds the innervation network in living animals during diverse developmental stages. Our analyses have not only identified how the axon-collective associates itself with the hair cell cluster as the organ matures, but also discovered distinct topological features in the arbor network that emerge during organ maturation, which may influence assembly of postsynaptic aggregates (lines 384-403, Figure 6G-I). We believe that our quantitative approach to capture collective axonal behaviors and their topological attributes during circuit formation has highlighted the importance of understanding network assembly during sensory organ development.

      Experiments with the secreted sema7a isoform would be much more informative if they were compared/contrasted to the GPI anchored isoform.

      Comments: We added a new section, lines 338-351, and a new Figure 5 to address this issue.

      The phenotype of ectopic projections in sema7a overexpression experiments is pretty dramatic, especially given the fact that these were performed in wild-type animals. Does this mean that the phenotype would be even more dramatic in sema7a mutants, as they have more bare axon terminals according to the authors' analysis? Have the authors attempted this type of experiment?

      Comments: That is an interesting suggestion. We have not tested that yet. Our guess is that in the sema7a-/- mutants, the abundant bare terminals will be far more sensitive to an ectopic source of Sema7A. But even in the sema7a-/- mutants, other chemotropic cues are still functional, which may impart certain restrictions on how many bare terminals are allowed to leave the neuromast region.

      Reviewer #3 (Recommendations For The Authors):

      (1) No raw data are shown, such that it is difficult to assess variability across animals or within animals, just the overall trends within the whole dataset. Raw data need to be shown for every measurement, at least in supplemental figures. It would also be useful to reliably show control next to mutant in the same plot, as it is a bit hard to compare across panels, which occurs in several figures.

      Comments: We have uploaded all the raw data related to each experiment.

      (2) Given the focus on the two possible forms of Sema7a, the authors should use HCR or another form of reliable in situ hybridization to show the spatiotemporal pattern of expression of each isoform.

      Comments: We utilized the HCR™ RNA-FISH technology to generate transcript-specific probes that distinctly detect the sema7aGPI (NM_001328508) and the sema7asec (NM_001114885) transcripts. Molecular Instruments could design only 11 probes against the sema7aGPI transcript and only one probe against the sema7asec transcript (personal correspondence with Mike Liu, PhD, Head of Operations and Product Development Lead, Molecular Instruments, Inc.). The HCR probe against the sema7aGPI transcript showed a very faint signal. Unfortunately, the HCR probe against the sema7asec transcript failed to detect the presence of any transcript. For robust detection of transcripts, the protocol demands a minimum of 20 probes. We believe that the very low number of probes against our transcripts is the primary reason for the lack of a signal.

      (3) The authors should explain the criteria used to select the 22 embryos used to analyze the effects of expressing diffusible Sema7a.

      Comments: We have explained this in lines 291-292. We identified 22 mosaic sema7asec-mKate2 integration events, in which a single mosaic ectopic integration had occurred near the network of sensory arbors, from a total of almost 100 integrations. We rejected events where the sema7asec-mKate2 integration occurred either farther away from the sensory arbor network or had happened in multiple neighboring cells.

      (4) Although arbors were imaged in live embryos, time is never presented as a variable, so I cannot tell whether axon topology was changing as the images were collected. This needs to be clarified.

      Comments: We imaged the trunk neuromasts of both control and sema7a-/- mutant live zebrafish larvae at 2, 3, and 4 dpf. We imaged the control and the sema7a-/- mutants of each developmental stage in parallel, within a span of two hours, and repeated these experiments multiple times to gather almost a hundred larvae from each genotype. Even though the sensory arbor network is dynamic, we believe that imaging both genotypes in parallel within a span of two hours, and averaging almost a hundred larvae from each genotype, minimize the temporal variability observed in the arbor architecture.

      (5) Ideally, the authors should use CRISPR/cas-9 to create a mutation in the C-terminus that would prevent production of the GPI-anchored form and not of the diffusible form. I understand if this is too much work to do in a short time, and would be satisfied with another experiment that could distinguish roles for at least one isoform more clearly. For instance, it would be interesting to see an analysis of how far an axon can be from a source to detect diffusible Sema7a (live imaging would be ideal for this) and then to show that the effect is different when the membrane bound form is expressed.

      Comments: Thank you for this comment. We are currently working on generating transcript-specific knockout animals.

      We have added live timelapse video microscopy data in lines 330-337, Figure 4H-J, Figure 4—figure supplement 2, and Videos 15 and 16.

      We have added a new segment analyzing the membrane-bound transcript variant in lines 338-351.

      Reviewer #4 (Recommendations For The Authors):

      Feedback to authors

      Overall, this is a very important and novel study. Currently the manuscript does need revision.

      Major concerns:

      (1) Controls. For the ectopic expression of Sema7a, injection of a construct expressing Sema7a under a heatshock promoter is used to drive ectopic expression. No heatshock (injected) animals are used as a control. In many systems heatshock can impact neuron morphology, and heatshock proteins are required for normal neurite and synapse formation. Please examine sensory axons in uninjected wildtype animals with heatshock.

      Comments: We have added this control experiment in a new segment, explained in detail in lines 348-350 and Figure 5.

      (2) Synapse staining - regarding Figure 5 and related supplement

      Understanding whether guidance defects ultimately impact synapse formation is an important aspect of this paper. Therefore, it is necessary to have accurate measurements of the number of complete synapses, and the overall numbers of pre- and postsynaptic components. Currently the data plotted in Figure 5 is extensive, but the way the data is laid out, the relevant comparisons are challenging to make. Perhaps include this quantification in the supplement, and move the data from the supplement to the main figure? The quantifications in the supplement are easier to follow and easier to compare between genotypes.

      Comments: We have performed exhaustive analysis on the formation of pre- and postsynaptic structures and have identified how their distribution changes along neuromast development in control larvae. We have further analyzed how such distribution is perturbed in the sema7a-/- mutants. We believe that showing only the average numbers will not reveal the changes in the distribution of the synaptic structures during development and across genotypes.

      Looking at the data itself, there seem to be some discrepancies with the synaptic counts compared to published work. While the CTBP numbers seem in order, the Maguk numbers do not. In both mutant and control there are many hair cells without any Maguk puncta/aggregates, leading to 0.75-1 postsynapses per hair cell (Figure 5 supplement H-I). Typically, the numbers should be more comparable to what was obtained for CTBP, 3-4 puncta per cell (Figure 5 supplement B-C), especially by 3-4 dpf. The figure of 3-4 CTBP or Maguk puncta per cell is based on previously published immunostaining and EM work.

      The Maguk immunostaining, especially at early stages (2-3 dpf) is challenging. To compound a challenging immunostain, around 2019 Neuromab began to outsource the purification of their Maguk antibody. After this outsourcing our lab was no longer able to get reliable label with the Maguk antibody from Neuromab.

      Millipore sells the same monoclonal antibody and it works well: https://www.emdmillipore.com/US/en/product/Anti-pan-MAGUK-Antibody-clone-K2886,MM_NF-MABN72

      I would recommend this source.

      Comments: Thank you for suggesting the new MAGUK antibody. We have utilized this new MAGUK antibody from Millipore and added a new segment in lines 389-408. In future publications we will utilize this antibody to capture the postsynaptic densities in the sensory arbors.

      The discrepancies in the postsynaptic puncta number in our control larvae may arise from the limited reliability of the Neuromab MAGUK antibody. We have utilized this same antibody to stain the sema7a-/- mutants and have observed a significant decrease in MAGUK puncta number and area. To maintain parity between the control and the sema7a-/- mutants, we have decided to keep our experimental results in the manuscript.

      In addition to a more accurate Maguk label, a combined pre- and post-synaptic label is essential to understand whether synapses pair properly in the sema7a mutants. This can be accomplished using subtype specific antibodies using goat anti-mouse IgG1/Maguk and goat anti-mouse IgG2a/CTBP secondaries.

      Comments: Thank you for suggesting this. We are preparing another manuscript in which we will utilize this technique along with other suggestions to tease apart the role of distinct transcript variants in regulating neural guidance and synapse formation.

      (3) Does sema7a lesion impact the number of hair cells per neuromast? If hair cell numbers are reduced several of the quantifications could be impacted.

      Comments: We have added the raw data with the hair cell counts in both control and sema7a-/- mutants across developmental stages. The homozygous sema7a-/- mutants have slightly fewer hair cells, and we have normalized all our topological analyses by the corresponding hair cell numbers for each neuromast in each experiment (lines 669-675).

      (4) Could innervation just be developmentally delayed in sema7a mutants? At 4 dpf the sensory system is just starting to come online and could still be in the process of refinement. Did you look at slightly older ages, after the sensory system is functional behaviorally, for example, 6 dpf? Do the core phenotypes (synapse defects and excess arbors) persist at 6 dpf in the sema7a mutants?

      Comments: The homozygous sema7a-/- mutants are unviable and start to die at 6 dpf. We therefore restricted our analysis to stages up to 4 dpf. The association of the sensory arbors with the clustered hair cells gradually decreases as the neuromasts mature from 2 dpf to 4 dpf in the sema7a-/- mutants (lines 174-176, Figure 2I). Moreover, in the sema7a-/- mutants the sensory axons throw long projections that keep getting farther away from the clustered hair cells as the neuromast matures from 2 dpf to 4 dpf (lines 166-168, Figure 2H; Figure 2—figure supplement 1K,L). If the phenotypes in the sema7a-/- mutants were due to developmental delays, then we should have seen a recovery of disrupted arborization patterns over time. Instead, we observe a further deterioration of the arborization patterns and other architectural assemblies. These findings confirm that the observed phenotypes in the sema7a-/- mutants are not due to delayed development of the larvae, but are a specific outcome of the loss of Sema7A protein.

      Minor comments to address:

      Results

      Page 4 lines 89-91. For the readers, explain why you examined levels of Sema7a in rostral and caudal hair cells. Also, this sentence is, in general, a little bit misleading, initially reading as though there is no difference in Sema7a at 1.5-4 dpf.

      Comments: In lines 44-48, we explain that the neuromast contains mechanoreceptive hair cells of opposing polarities that help it detect water currents from opposing directions. In lines 93-106, we tested whether the Sema7A level varies between the two polarities. We observed that the Sema7A level is similar between the two polarities of hair cells, but the average Sema7A intensity increases significantly over the developmental period of 2 dpf to 4 dpf in both rostrally and caudally polarized hair cells.

      Page 10-11 Lines 263-270. What was the frequency of these 2 outcomes- out of the 22 cases with ectopic expression?

      Comments: We have explained this in lines 291-292. We identified 22 mosaic sema7asec-mKate2 integration events, in which a single mosaic ectopic integration had occurred near the network of sensory arbors, from a total of almost 100 integrations. We rejected events where the sema7asec-mKate2 integration occurred either farther away from the sensory arbor network or had happened in multiple neighboring cells.

      Discussion

      Page 14 Lines 359-360. There is not enough evidence provided in this work to suggest that the membrane attached form of Sema7a is playing a role. Both the secreted and membrane form are gone in the sema7a mutants. If the membrane attached form was specifically lesioned, and resulted in a phenotype, then there would be sufficient evidence. Currently there is strong evidence for a distinct role for the secreted form. Although the authors qualify the outlined statement with the word 'likely', stating this possibility in the discussion take-home is misleading.

      Comments: In the future we will utilize the CRISPR/Cas9 technique to target the unique C-terminal domain of the GPI-anchored sema7a transcript variant. We believe that this will only perturb the formation of the full-length Sema7A protein and help us differentiate between the roles of the membrane-bound Sema7AGPI molecule and the secreted Sema7Asec in sensory arborization and synaptic assembly.

      It might be interesting in either the intro or discussion to reference the role Sema3F in axon guidance in the mouse auditory epithelium. https://elifesciences.org/articles/07830

      Comments: We have added this reference in lines 61-64.

      Figures

      Please indicate on one of your Figures where the mutation is (roughly) in the sema7a mutant (in addition to stating it in the results).

      Comments: We have added this information in Figure 2—figure supplement 1A.

      Either state or indicate in a Figure where the epitope used to make the Sema7a antibody is located, to show that the antibody is predicted to recognize both isoforms.

      Comments: We have stated the details of the epitope in lines 528-529.

      Figure 2-S1 what is the scale in panel A, is it different between mutant and wildtype?

      Comments: We have updated the images. New images are depicted in Figure 2—figure supplement 1A.

      Methods

      What were the methods used to quantify synapse number and area?

      Comments: We have added a new section in lines 702-708 to explain the measurement techniques.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Through an unbiased genome-wide KO screen, the authors identified loss of DBT to suppress MG132-mediated death of cultured RPE cells. Further analyses suggested that DBT loss reduces ubiquitinated proteins by promoting autophagy. Mechanistic studies indicated that DBT loss promotes autophagy via AMPK and its downstream ULK and mTOR signaling. Furthermore, loss of DBT suppresses polyglutamine- or TDP-43-mediated cytotoxicity and/or neurodegeneration in fly models. Finally, the authors showed that DBT proteins are increased in ALS patient tissues, compared to non-neurological controls.

      Strengths:

      The idea is novel, the evidence is mostly convincing, and the data are clean. The findings have implications for human diseases.

      Reply: We thank the reviewer for the supportive comments.

      Weaknesses:

      More experiments are needed to establish the connections between DBT and autophagy. The mechanistic studies are somewhat biased, and it's unclear whether the same mechanism (i.e., AMPK-->mTOR) can be applied to TDP-43-mediated neurodegeneration. Also, some data interpretation has to be more accurate.

      Reply: We thank the reviewer for raising these questions, and we have provided additional evidence in the revised manuscript to support the model that DBTKO can enhance autophagy and induce resistance to TDP-43-associated toxicity. This is described in greater detail below.

      (1) To provide further evidence for the connection between DBT and autophagy, we have introduced additional controls. We have included the AMPK shRNA and drug treatment controls (Fig.4D, Fig.S4B), and these results suggest that reducing the AMPK level renders DBTKO cells sensitive to MG132 toxicity. We also added the TSC1 shRNA and mTOR agonist treatment controls (Fig.5E, Fig.S4G), and the results show that increasing mTOR levels also makes the DBTKO cells sensitive to MG132.

      (2) To further confirm the roles of AMPK and mTOR in DBTKO cells, we introduced the AMPK agonist (EX229) and mTOR inhibitors (RAD001 and AZD8055) in co-treatment experiments with MG132 and then measured cell survival (Fig.S4D, S4G). The results indicate that promoting AMPK activation or inhibiting mTOR can enhance cell resistance to MG132-induced toxicity.

      (3) Additionally, we included the overexpression and rescue experiments for DBT and analyzed the AMPK-ULK1 signaling in WT RPE1 and DBTKO cells (Fig.S5D, S5E). The results indicate that the increase of DBT can significantly reduce the phosphorylation of AMPK/ULK1 and the levels of the autophagy marker LC3II. Together, these results suggest that DBT plays an important role in autophagy.

      (4) We had shown in the original version of the manuscript that DBTKO renders cells more resistant to TDP-43-associated toxicity, similar to the tolerance of MG132-induced toxicity. Here we further show that expression of TDP-43M337V enhances the phosphorylation of AMPK in the DBTKO cells (Fig. S7A), similar to the effect of the MG132 treatment. These results suggest that the resistance of DBTKO cells to MG132 and to TDP-43-associated toxicity share a similar mechanism of activating AMPK signaling.

      Reviewer #2 (Public Review):

      Summary:

      Hwang, Ran-Der et al utilized a CRISPR-Cas9 knockout in human retinal pigment epithelium (RPE1) cells to evaluate for suppressors of toxicity by the proteasome inhibitor MG132 and identified that knockout of dihydrolipoamide branched chain transacylase E2 (DBT) suppressed cell death. They show that DBT knockout in RPE1 cells does not alter proteasome or autophagy function at baseline. However, with MG132 treatment, they show a reduction in ubiquitinated proteins but with no change in proteasome function. Instead, they show that DBT knockout cells treated with MG132 have improved autophagy flux compared to wildtype cells treated with MG132. They show that MG132 treatment decreases ATP/ADP ratios to a greater extent in DBT knockout cells, and in accordance causes activation of AMPK. They then show downstream altered autophagy signaling in DBT knockout cells treated with MG132 compared to wild-type cells treated with MG132. Then they express the ALS mutant TDP43 M337V or expanded polyglutamine repeats to model Huntington's disease and show that knockdown of DBT improves cell survival in RPE1 cells with improved autophagic flux. They also utilize a Drosophila model and show that utilizing either an RNAi or CRISPR-Cas9 knockout of DBT improves eye pigment in TDP43M337V and polyglutamine repeat-expressing transgenic flies. Finally, they show evidence for increased DBT in postmortem spinal cord tissue from patients with ALS via both immunoblotting and immunofluorescence.

      Strengths:

      This is a mechanistic and well-designed paper that identifies DBT as a novel regulator of proteotoxicity via activating autophagy in the setting of proteasome inhibition. Major strengths include careful delineation of a mechanistic pathway to define how DBT is protective. These conclusions are largely justified, but additional experiments and information would be useful to clarify and extend these conclusions.

      Reply: We thank the reviewer for the supportive comments.

      Weaknesses:

      The large majority of the experiments are evaluating suppression of drug (MG132) toxicity in an in vitro epithelial cell line, so the generalizability to disease is unclear. Indeed, MG132 itself has been shown to modulate autophagy, and off-target effects of MG132 are not addressed. While this paper is strengthened by the inclusion of mouse-induced motor neurons, Drosophila models, and postmortem tissue, the putative mechanisms are minimally evaluated in these models.

      Also, this effect is only seen with MG132 treatment, at a dose that causes markedly impaired cell survival. In this setting, it is certainly plausible that changes in autophagy could be the result of differences in cell survival, as opposed to an underlying mechanism for cell survival. Additional controls would be useful to increase confidence that DBT knockdown is protective via modulation of autophagy.

      While the authors report increased DBT in postmortem ALS tissue as suggestive that DBT may modulate proteotoxicity in neurodegeneration, this point would be better supported with the evaluation of overexpression of DBT in their model.

      Reply: We appreciate the reviewer for raising these questions, and we have provided further evidence in the revised manuscript to support the proposed mechanism that DBTKO confers resistance to MG132-induced toxicity through activation of autophagy. This is discussed in greater detail below.

      (1) To provide further mechanistic analysis, we have included additional controls for the analysis of AMPK signaling in Fig. 4D and Fig. S4B. These results demonstrate that using drugs or shRNAs to reduce AMPK activity can decrease DBTKO survival. We have also shown that increasing AMPK activity with an activator enhances the survival of both WT and DBTKO cells under MG132 treatment (Fig. S4D), suggesting that DBTKO cells resist MG132-induced toxicity through the activation of AMPK signaling.

      (2) We have included additional controls for the analysis of mTOR signaling in Fig. 5E and Fig. S4F. The results in Fig. 5E show that reducing TSC1 using shRNAs can decrease DBTKO survival. We also added the experiments with mTOR agonist MHY1485 as a control in Fig. S4F. These results indicate that mTOR activation can promote DBTKO cells' sensitivity to MG132 toxicity. To further confirm the importance of mTOR in DBTKO-mediated resistance to MG132 toxicity, we included the mTOR inhibitors RAD001 and AZD8055 in the co-treatment experiments with MG132, and then measured cell survival (Fig. S4G). The results show that both mTOR inhibitors can enhance cell resistance to MG132-induced toxicity (Fig. S4G). These findings suggest that mTOR inhibition is required for DBTKO-mediated cell survival under MG132 treatment.

      (3) To further test the hypothesis that DBT knockdown is protective via modulation of autophagy, we have introduced the overexpression of DBT and the rescue of DBT in DBTKO cells to analyze the AMPK signaling that regulates autophagy (Fig. S5E). The results demonstrate that overexpression of DBT significantly reduced the phosphorylation of AMPK and ULK1 (Fig. S5E). In the rescue experiment, the results mirror those of the overexpression experiment, showing a significant reduction in the phosphorylation of AMPK and ULK1 (Fig. S5E). We also analyzed the autophagy marker LC3II in both the overexpression and rescue experiments, and the results indicate that increasing the DBT level specifically reduces the LC3II level (Fig. S5D). These results support the model that loss of DBT promotes the activation of autophagy.

      (4) To test the hypothesis that DBT may modulate proteotoxicity in neurodegeneration, we included the studies with TDP-43M337V and found that the expression of the mutant TDP-43 enhanced the phosphorylation of AMPK in the DBTKO cells (Fig. S7A), consistent with the observations made with MG132 treatment. Together with other findings in the manuscript, these results indicate that DBTKO can sensitize cells to activation of AMPK signaling and confer resistance to TDP-43-associated toxicity.

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      (1) MG132 is known to inhibit the proteasome, not autophagy/lysosomes. Indeed, several studies showed that it promotes autophagy in multiple types of cells, potentially via a compensatory mechanism. However, this study chose a condition in which MG132 causes cell death, accompanied by inhibition of autophagy (Figure 3). This is concerning because it suggests that autophagy inhibition in this case could be a rather late effect (potentially because cells are dying) or a side effect of MG132. In either case, it would not be a direct result of proteasomal inhibition. Thus, firstly, it is better to say MG132-mediated "cytotoxicity" rather than "proteotoxicity." Secondly, the authors should find (if possible) an early time point in which autophagy is enhanced and see if DBT KO further enhances it at this time point. In addition, the authors should use a well-established way to induce autophagy (e.g., rapamycin) and see if this suppresses cell death after 48 hours of MG132 treatment. These experiments will tell whether the observed MG132-induced autophagy defects are the cause or consequence of cells dying.

      Reply: We thank the reviewer for raising the insightful question, and we have added additional evidence in the revised manuscript to support the model that DBTKO can enhance autophagy and confer resistance to MG132-induced toxicity.

      Based on the reviewer's suggestion, we have included a time-dependent analysis of MG132 treatment, focusing on the autophagy marker LC3II. Our findings reveal that, after a 2-hour MG132 treatment, the LC3II levels were comparable between WT and DBTKO cells (Fig.S5B). However, upon extending the MG132 treatment to 4-8 hours, DBTKO cells exhibited a rapid increase in LC3II levels compared to the WT control (Fig.S5B). This suggests that DBTKO demonstrates an early and swift activation of autophagy to mitigate MG132-induced proteotoxicity compared to the WT control.

      Additionally, we introduced two autophagy-inducing drugs, RAD001 (everolimus, a rapamycin derivative) and AZD8055 (an mTOR inhibitor). These drugs were co-administered with MG132, and cell survival was assessed. Our results indicate that both RAD001 and AZD8055 treatments enhanced cell resistance to MG132-induced toxicity compared to the WT control (Fig.S4G). Furthermore, we compared the survival rates of DBTKO cells, RAD001 treatment, and AZD8055 treatment under MG132 conditions. The outcomes showed no significant difference in survival rates between DBTKO cells and WT cells treated with RAD001 or AZD8055 (Fig.S4G), indicating that pharmacological induction of autophagy induced cell resistance to MG132 toxicity similar to that induced by DBTKO. Together, these results demonstrate that DBTKO promotes autophagy, conferring cell resistance to MG132-induced toxicity.

      Finally, we have followed the reviewer’s suggestion and changed the phrase MG132-mediated “proteotoxicity” to cytotoxicity or simply toxicity throughout the text.

      (2) All autophagy-related data are Western blots. The authors should provide fluorescent microscopy or TEM images for readers to visualize autophagosomes, lysosomes, etc. in WT and KO cells, with or without MG132 treatments.

      Reply: We appreciate the reviewer's suggestion regarding the visualization of autophagy. In response, we utilized a specific antibody to stain the autophagy marker LC3 in both RPE1 and DBTKO cells. The results indicate no discernible difference in LC3 signal between RPE1 and DBTKO cells under normal conditions (Fig. S5C). However, upon treatment with MG132, the LC3 signal in DBTKO cells exhibited a significant increase compared to RPE1 cells (Fig. S5C). This finding suggests that DBTKO can enhance autophagy, thereby mitigating MG132-induced toxicity.

      (3) The mechanistic study is hypothesis-driven. Thus, it's unclear whether the AMPK/ULK/mTOR axis is sufficient to rescue MG132-mediated cell death. At most, the authors' findings can be "a" mechanism, but not "the" mechanism. Are AMPK/ULK/mTOR-related regulator proteins identified from the CRISPR-KO screen? Can AMPK/ULK activation or mTOR inhibition via ways other than DBT loss suppress MG132-mediated cell death? What are other genes identified from the screen? That being said, Figure 4D and 5E suggested that loss of AMPK or a component of the mTOR signaling pathway, TSC1, completely abolishes the effect of DBT loss on MG132-mediated cell death--these data somewhat indicate that AMPK and mTOR are necessary. But it raises two issues. First, if mTOR is necessary, then how about ULK? The authors stated that ULK and mTOR are two parallel mechanisms downstream of AMPK. Secondly, one thing is missing: the effect of AMPK or TSC1 loss alone on cell death, i.e., do AMPK/TSC1 RNAi cause cytotoxicity in the absence of MG132? The same issue is also with the drug experiments in SFig. 4.

      Reply: We appreciate the reviewer's questions and have addressed them by additional clarification and experiments as detailed below.

      (1) Our original CRISPR screen is a clonal screen; i.e., we selected a few single surviving cell colonies and identified the strongest suppressor colony as the DBTKO strain, which was then subjected to further analysis. Therefore, we do not have a list of other suppressor genes at this time, but this will be in our future plans.

      (2) To test whether modulation of mTOR signaling would yield an effect on MG132-induced cell death similar to that of DBT loss, we employed mTOR inhibitors (RAD001 and AZD8055) in co-treatment with MG132 and observed that WT cells exhibited resistance to MG132-induced toxicity (Fig.S4G). We found that the survival rate under the mTOR inhibitor treatment was comparable to that of the DBTKO cells, indicating that mTOR-specific inhibitors can be utilized to enhance cell survival and suppress MG132-mediated cell death.

      (3) Based on our results, we found that ULK1 undergoes phosphorylation at S317 in DBTKO cells treated with MG132 (Fig.5B). Additionally, overexpression of DBT or the rescue of DBT into DBTKO cells has been shown to reduce the phosphorylation of ULK1 (Fig.S5E). Previous studies have demonstrated that AMPK can directly interact with ULK1 and promote phosphorylation at the S317 site 1. Consistent with this, we observed that activation of AMPK promotes the phosphorylation of ULK1 at S317 in DBTKO cells under MG132 toxicity (Fig.5B and S5E). These findings suggest that the AMPK-ULK1 signaling pathway is activated in DBTKO cells under MG132 treatment. Previous studies have established that mTOR can inhibit the AMPK-ULK1 interaction and reduce the phosphorylation of ULK1 at S317 1. Similarly, we observed a specific reduction in the phosphorylation of ULK1 at the S317 site under MHY1485 treatment (mTOR activator) in DBTKO cells (Fig.S4E). Furthermore, we found that mTOR is inhibited in DBTKO cells under MG132 treatment and that using an mTOR inhibitor can promote cell resistance to MG132 toxicity (Fig.S4G). These results indicate that mTOR is inhibited in the DBTKO cells and that the AMPK-ULK1 axis plays a key role in these cells’ resistance to MG132 toxicity.

      (4) We have added the survival data for AMPK or TSC1 loss alone in the cell survival analysis (Fig.4D, Fig.5E, Fig.S4B, and S4F). The loss of AMPK or TSC1 resulted in a reduction in cell survival, suggesting that the absence of AMPK or TSC1 can induce cytotoxicity. The results also demonstrate that the co-treatment of MG132 with the loss of AMPK or TSC1 further reduced the survival rate compared to the loss of AMPK or TSC1 alone.

      (4) It's unclear whether the AMPK/mTOR axis underpins DBT-mediated protection against TDP43/polyQ toxicity. The authors should perform similar experiments as they did with MG132.

      Reply: To address this question, we introduced the expression of TDP-43M337V and analyzed the AMPK-ULK1 axis in both WT and DBTKO cells (Fig. S7A). The results indicate that the expression of TDP-43M337V can lead to the activation of the AMPK-ULK1 signaling in the DBTKO cells but not in the WT cells (Fig. S7A). Notably, a significant reduction in TDP-43M337V levels was observed in the DBTKO group (Fig. 6C and Fig. S7A), and DBTKO was found to facilitate TDP-43M337V clearance (Fig. 6D). Furthermore, we observed that DBTKO promoted cell survival under TDP-43M337V toxicity conditions (Fig. 6A-B). These results collectively suggest that DBTKO can activate the AMPK-ULK1 axis, enhancing autophagy and providing protection against TDP-43 toxicity.

      Minor concerns:

      (1) The glucose data in Sfig. 3 need to be correlated with a cellular ATP measurement. Also, can the authors add ATP instead of glucose to the cell culture media and see if it suppresses DBT's effect on MG132-mediated cell death? Can low glucose or starvation, which reduces ATP, activate AMPK and suppress MG132-mediated cell death?

      Reply: Given the relative difficulty of manipulating ATP levels in cell cultures, we have followed this suggestion and tested the conditions of starvation to reduce ATP. We observed that starvation enhanced the resistance of RPE1 cells to MG132-induced toxicity (Fig. S4C). This result is consistent with our other findings that reduction in ATP promotes cell resistance to MG132-induced toxicity.

      (2) Figure 4B and C: Comparing WT cells with vs. without MG132, ATP is down but pAMPK is not up. Why is that?

      Reply: We think this is because the reduction of ATP in WT cells treated with MG132 is too limited to cause an energy deficiency strong enough to upregulate pAMPK under these experimental conditions. In our observations, it took the synergistic action of both MG132 treatment and loss of DBT to induce a significant increase in pAMPK. These observations are consistent with the cell survival outcome.

      (3) Some "proteotoxicity" should be changed to "cytotoxicity" or "cell death," see Major Concern #1.

      Reply: We have followed the suggestion and made revisions to our manuscript accordingly. In most instances, we have replaced the term "proteotoxicity" with "cytotoxicity" or “toxicity”.

      (4) Table S1: what does "ALS Pathology--C9orf72 positive" mean? C9ALS usually have TDP-43 pathology.

      Reply: A subset of the patient tissues in this study have been analyzed genetically and tested positive for carrying the C9orf72 hexanucleotide repeat expansion (HRE) mutation, whereas a different subset of the tissues have been shown to have TDP-43 proteinopathy. We have changed the terminology in Table S1 to better describe these categories.

      Reviewer #2 (Recommendations For The Authors):

      (1) To evaluate whether rescue of cell survival by DBT knockdown is through activating autophagy, it would be reasonable to also use another proteasome inhibitor such as bortezomib for key experiments evaluating autophagy. Additionally, if DBT knockout is protective through activating autophagy, an important experiment to confirm this would be to activate autophagy in MG132-treated WT cells through other methods such as overexpressing AMPK, to see if this can rescue cell death.

      Reply: We appreciate the reviewer for bringing up these questions. In response, we conducted an assay testing the effect of bortezomib treatment on DBTKO cell survival rates. The results from the bortezomib assay were consistent with those of MG132, showing that DBTKO cells consistently demonstrated better survival rates than the WT control cells (Fig. S1C and D).

      To further investigate whether AMPK is a key signaling pathway for DBTKO-mediated protection against MG132-induced toxicity, we included a test using the AMPK agonist EX229, also known as compound 991 (refs. 2, 3). The results indicate that this agonist promotes resistance to MG132-induced toxicity in both WT and DBTKO cells (Fig. S4D).

      (2) There is marked variability in cell survival assays (some of which may be attributed to using different techniques - calcein-AM vs. crystal violet). For example, for untreated cells, Figure 1 shows 20% control cell survival and 100% survival with DBT shRNA at 48 hours, but Figure 3G shows under 5% control cell survival and less than 30% cell survival with DBT KO. This variability should be addressed so comparison can be made between figures.

      Reply: We have addressed this issue by using the same type of controls in the cell survival analysis in different figures. This revision has reduced the apparent variability in the survival data. For example, we employed the same type of control to normalize the results in Figure 1, and now the values of survival rates align with other results. There is still variability among different sets of experiments since the cell survival is quite sensitive to experimental conditions such as cell states and drug batch effects. Overall, the trends of the difference among the samples remain consistent throughout the studies.

      (3) The graphs have very large circles denoting data points, making it difficult to see what the actual values are. Decreasing the size of the circles will make the graphs easier to read.

      Reply: We thank the reviewer for pointing out this issue, and we have decreased the sizes of the graph data points.

      (4) Figure 1E and F seem to show some data points for cell survival as greater than 100%. This should not be possible and should be corrected.

      Reply: In these original figures, the appearance that some data points are greater than 100% is due to the setting of the average value as 100% in the control. To address this issue, we have normalized all the datapoints to the highest value of the untreated WT cells. Now all the survival rates are below 100%, while the relative differences remain the same.
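      The two normalisation choices described in this reply can be compared with a minimal sketch. The group names and viability readings below are hypothetical and serve only as illustration; they are not data from the study. Dividing by the control mean lets individual control points exceed 100%, whereas dividing by the highest untreated WT value caps the control at 100% while preserving every ratio between groups.

```python
def normalise_to_control_max(survival, control="WT_untreated"):
    """Rescale all readings so the largest control reading equals 100%.

    `survival` maps a group name to a list of raw viability readings.
    The group names used here are hypothetical placeholders.
    """
    reference = max(survival[control])
    if reference <= 0:
        raise ValueError("control readings must be positive")
    return {
        group: [100.0 * value / reference for value in values]
        for group, values in survival.items()
    }


# Hypothetical raw readings in arbitrary units (illustration only).
raw = {
    "WT_untreated": [0.90, 1.00, 1.10],
    "WT_MG132": [0.20, 0.25, 0.22],
    "DBTKO_MG132": [0.55, 0.60, 0.58],
}
scaled = normalise_to_control_max(raw)
```

      Because every value is divided by the same constant, relative differences between conditions are unchanged, which matches the statement in the reply.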

      (5) Figure 2 lacks information about how ubiquitinated proteins and protein aggregates were quantified in Figure 2A and B. How many cells were quantified? What does the y-axis denote in the graphs? Is this the average number of cells per experiment, or does each point represent an individual cell?

      Reply: Regarding the quantification of Figure 2B, we quantified all the cells in the image. Each group consisted of 28 cells, and each data point represents the average fluorescence intensity of 7 cells. The y-axis indicates fluorescence intensity per cell.

      (6) For bafilomycin A1 treatment, it is recommended to ensure that autophagy flux is saturated with this dose and treatment time. An experiment adding another autophagosome-lysosome fusion blocker (for example vinblastine) to show that the chosen bafilomycin A1 treatment is saturating would be reasonable.

      Reply: We appreciate the reviewer's suggestion. As suggested, we conducted the experiment using vinblastine, and the results were similar to the bafilomycin A1 treatment (Fig. S5A). Under both BafA1 and Vin drug treatments, there was an accumulation of LC3II.

      (7) It is also recommended to show that the defined pathogenic mechanism for DBT knockout to rescue toxicity is present in other models (for example motor neurons or Drosophila). Additional experiments to evaluate for activation of autophagy by knocking down DBT in these models would strengthen this paper.

      Reply: While we appreciate the reviewer's valuable suggestions, the modulation of DBT in motor neurons or Drosophila is laborious and time-consuming. Given the limited time for revision, we have opted to strengthen our data on autophagy in the same cell model system in which most mechanistic studies were carried out. Therefore, we conducted a DBT rescue experiment in the DBTKO cells used for most of the experiments in the study. The results demonstrate that the rescue of DBT leads to a reduction in the LC3II level (Fig. S5D). Additionally, we observed a significant decrease in the phosphorylation of AMPK and ULK1 compared to DBTKO cells (Fig. S5E). These findings suggest that DBTKO can promote autophagy through the AMPK-ULK1 pathway, whereas rescuing DBT in DBTKO cells diminishes this phenotype and renders them more sensitive to MG132 toxicity.

      (8) For fly eye degeneration experiments, it is recommended to also evaluate other markers of eye degeneration besides pigment, as different fly strains can have different shades of color, and necrotic patches can also alter color. Additional assessment of neurodegeneration (for example by evaluation of ommatidial structure, bristles, necrotic patches, and retinal collapse) is recommended.

      Reply: We thank the reviewer for these valuable suggestions. However, detailed morphological examination of Drosophila eyes would require electron microscopy experiments, which would take a significant effort and may be better conducted in future studies. In the present study, we have included light microscope images of Drosophila eye ommatidia as well as the data from pigment analysis, which provide solid support for the verification of the main conclusion using Drosophila as one of the ancillary experimental models.

      (9) In order to justify the supposition that increased DBT is pathogenic in ALS, DBT overexpression experiments would be useful to show that DBT does indeed misregulate proteostasis. Additionally, it would be useful to test other markers of autophagy with immunoblotting in the postmortem tissue cell lysates. For example, is there more evidence of impaired autophagy in the samples with high expression of DBT compared to low expression of DBT?

      Reply: We thank the reviewer for the suggestions. Due to the limited availability of human tissue samples, we have performed an overexpression experiment to evaluate the effects of the high expression of DBT in both RPE1 and DBTKO cells as suggested by the reviewer (Fig. S5E). The results indicate that overexpression of DBT reduces the phosphorylation of AMPK and ULK1, suggesting a reduction in the AMPK-ULK1 autophagy signaling pathway under DBT overexpression, in support of our model.

      (10) The quantification for immunofluorescent staining of DBT in postmortem spinal cords (Figure 7B) shows that each cell was treated as a biological replicate. Another graph to show averaged fluorescent intensity for each patient sample would be appropriate, and statistical analysis should use each patient sample as an "n."

      Reply: We have added the new graph to the revised Figure 7B, with the "n" in the statistical analysis representing the number of patient samples.
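      The change of statistical unit requested here (one value per patient rather than one per cell) can be sketched as follows. The patient identifiers and intensities are hypothetical placeholders, not measurements from Figure 7B; the point is only that cells are first averaged within each patient, so that "n" counts patients.

```python
from collections import defaultdict
from statistics import mean


def per_patient_means(cell_intensities):
    """Collapse per-cell intensities to one mean value per patient.

    `cell_intensities` is an iterable of (patient_id, intensity) pairs.
    """
    by_patient = defaultdict(list)
    for patient_id, intensity in cell_intensities:
        by_patient[patient_id].append(intensity)
    return {pid: mean(values) for pid, values in by_patient.items()}


# Hypothetical per-cell fluorescence intensities (illustration only).
cells = [
    ("ALS_1", 10.0), ("ALS_1", 14.0),
    ("ALS_2", 20.0),
    ("CTRL_1", 5.0), ("CTRL_1", 7.0), ("CTRL_1", 6.0),
]
patient_means = per_patient_means(cells)
n_patients = len(patient_means)  # this, not the number of cells, is "n"
```

      Any group comparison would then be run on the values of `patient_means`, which avoids treating cells from the same patient as independent replicates.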

      (11) For materials and methods, all drug concentrations and treatment times (including bafilomycin) should be listed.

      Reply: We have added all concentrations and treatment times in the section of Materials and Methods.

      References

      (1) Kim, J., Kundu, M., Viollet, B., and Guan, K.L. (2011). AMPK and mTOR regulate autophagy through direct phosphorylation of Ulk1. Nat Cell Biol 13, 132-141. 10.1038/ncb2152.

      (2) Lai, Y.C., Kviklyte, S., Vertommen, D., Lantier, L., Foretz, M., Viollet, B., Hallen, S., and Rider, M.H. (2014). A small-molecule benzimidazole derivative that potently activates AMPK to increase glucose transport in skeletal muscle: comparison with effects of contraction and other AMPK activators. The Biochemical journal 460, 363-375. 10.1042/BJ20131673.

      (3) Xiao, B., Sanders, M.J., Carmena, D., Bright, N.J., Haire, L.F., Underwood, E., Patel, B.R., Heath, R.B., Walker, P.A., Hallen, S., et al. (2013). Structural basis of AMPK regulation by small molecule activators. Nature communications 4, 3017. 10.1038/ncomms4017.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This article presents important results describing how the gathering, integration, and broadcasting of information in the brain changes when consciousness is lost either through anesthesia or injury. They provide convincing evidence to support their conclusions, although the paper relies on a single analysis tool (partial information decomposition) and could benefit from a clearer explication of its conceptual basis, methodology, and results. The work will be of interest to both neuroscientists and clinicians interested in fundamental and clinical aspects of consciousness.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Luppi et al. apply the recently developed integrated information decomposition to the question of how the architecture of information processing changes when consciousness is lost. They explore fMRI data from two different populations: healthy volunteers undergoing reversible anesthesia, as well as patients who have long-term disorders of consciousness. They show that, in both populations, synergistic integration of information is disrupted in common ways. These results are interpreted in the context of the SAPHIRE model (recently proposed by this same group), which describes information processing in the brain as being composed of several distinct steps: 1) gatekeeping (where gateway regions introduce sensory information to the global synergistic workspace), where 2) it is integrated or "processed" before 3) being broadcast back to the brain.

      I think that this paper is an excellent addition to the literature on information theory in neuroscience, and consciousness science specifically. The writing is clear, the figures are informative, and the authors do a good job of engaging with existing literature. While I do have some questions about the interpretations of the various information-theoretic measures, all in all, I think this is a significant piece of science that I am glad to see added to the literature.

      One specific question I have is that I am still a little unsure about what "synergy" really is in this context. From the methods, it is defined as that part of the joint mutual information that is greater than the maximum marginal mutual information. While this is a perfectly fine mathematical measure, it is not clear to me what that means for a squishy organ like the brain. What should these results mean to a neuro-biologist or clinician?

      Right now the discussion is very high level, equating synergy to "information processing" or "integrated information", but it might be helpful for readers not steeped in multivariate information theory to have some kind of toy model that gets worked out in detail. On page 15, the logical XOR is presented in the context of the single-target PID, but 1) the XOR is discrete, while the data analyzed here are continuous BOLD signals w/ Gaussian assumptions and 2) the XOR gate is a single-target system, while the power of the Phi-ID approach is the multi-target generality. Is there a Gaussian analog of the single-target XOR gate that could be presented? Or some multi-target, Gaussian toy model with enough synergy to be interesting? I think this would go a long way to making this work more accessible to the kind of interdisciplinary readership that this kind of article will inevitably attract.

      We appreciate this observation. We now clarify that:

      “redundancy between two units occurs when their future spontaneous evolution is predicted equally well by the past of either unit. Synergy instead occurs when considering the two units together increases the mutual information between the units’ past and their future – suggesting that the future of each is shaped by its interactions with the other. At the microscale (e.g., for spiking neurons) this phenomenon has been suggested as reflecting “information modification” 36,40,47. Synergy can also be viewed as reflecting the joint contribution of parts of the system to the whole, that is not driven by common input 48.”

      In the Methods, we have also added the following example to provide additional intuition about synergy in the case of continuous rather than discrete variables:

      “As another example for the case of Gaussian variables (as employed here), consider a 2-node coupled autoregressive process with two parameters: a noise correlation c and a coupling parameter a. As c increases, the system is flooded by “common noise”, making the system increasingly redundant because the common noise “swamps” the signal of each node. As a increases, each node has a stronger influence both on the other and on the system as a whole, and we expect synergy to increase. Therefore, synergy reflects the joint contribution of parts of the system to the whole that is not driven by common noise. This has been demonstrated through computational modelling (Mediano et al 2019 Entropy).”

      See below for the relevant parts of Figures 1 and 2 from Mediano et al (2019 Entropy), where Psi refers to the total synergy in the system.

      Author response image 1.
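      The qualitative behaviour of the Gaussian example quoted above can be illustrated with a minimal sketch, under several assumptions that are not taken from the manuscript. First, the coupling matrix is assumed to have every entry equal to a, and the noise has unit variance with correlation c. Second, the sketch uses a single-target decomposition (sources: the past of X and the past of Y; target: the future of X) with minimum-mutual-information redundancy, following the Reviewer's description of synergy as joint mutual information in excess of the largest marginal. It is therefore neither the full PhiID used in the paper nor the Psi measure of Mediano et al. (2019). The function names are hypothetical.

```python
import math


def stationary_cov(a, c, iters=500):
    """Stationary covariance of X_t = A X_{t-1} + e_t (2 nodes).

    Assumed model: A has all entries equal to a; the noise covariance
    is [[1, c], [c, 1]]. Solved by iterating S <- A S A^T + Omega.
    """
    A = [[a, a], [a, a]]
    omega = [[1.0, c], [c, 1.0]]
    sigma = [row[:] for row in omega]
    for _ in range(iters):
        a_s = [[sum(A[i][k] * sigma[k][j] for k in range(2))
                for j in range(2)] for i in range(2)]
        sigma = [[sum(a_s[i][k] * A[j][k] for k in range(2)) + omega[i][j]
                  for j in range(2)] for i in range(2)]
    return sigma


def mmi_atoms(a, c):
    """Single-target MMI decomposition (in nats) for the target X_t."""
    if abs(2 * a) >= 1 or abs(c) >= 1:
        raise ValueError("need |2a| < 1 (stationarity) and |c| < 1")
    s = stationary_cov(a, c)
    # Lagged covariances are entries of A @ Sigma.
    cov_x_pastx = a * s[0][0] + a * s[1][0]
    cov_x_pasty = a * s[0][1] + a * s[1][1]
    # Gaussian mutual information between two scalars: -0.5 * ln(1 - rho^2).
    mi_x = -0.5 * math.log(1 - cov_x_pastx ** 2 / (s[0][0] * s[0][0]))
    mi_y = -0.5 * math.log(1 - cov_x_pasty ** 2 / (s[0][0] * s[1][1]))
    # Given the full past, the residual variance of X_t is the noise variance (1).
    joint = 0.5 * math.log(s[0][0] / 1.0)
    return {
        "joint": joint,
        "redundancy": min(mi_x, mi_y),
        "unique": max(mi_x, mi_y) - min(mi_x, mi_y),
        "synergy": joint - max(mi_x, mi_y),
    }


weak = mmi_atoms(a=0.1, c=0.0)
strong = mmi_atoms(a=0.3, c=0.0)
noisy = mmi_atoms(a=0.3, c=0.8)
```

      In this simplified setting, raising the coupling a (compare `weak` with `strong`) increases the synergy term, while raising the noise correlation c (compare `strong` with `noisy`) increases redundancy, in line with the intuition stated in the quoted passage.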

      Strengths

      The authors have a very strong collection of datasets with which to explore their topic of interest. By comparing fMRI scans from patients with disorders of consciousness, healthy resting state, and various stages of propofol anesthesia, the authors have a very robust sample of the various ways consciousness can be perturbed, or lost. Consequently, it is difficult to imagine that the observed effects are merely a quirk of some biophysical effect of propofol specifically, or a particular consequence of long-term brain injury, but do in fact reflect some global property related to consciousness. The data and analyses themselves are well-described, have been previously validated, and are generally strong. I have no reason to doubt the technical validity of the presented results.

      The discussion and interpretation of these results is also very nice, bringing together ideas from the two leading neurocognitive theories of consciousness (Global Workspace and Integrated Information Theory) in a way that feels natural. The SAPHIRE model seems plausible and amenable to future research. The authors discuss this in the paper, but I think that future work on less radical interventions (e.g. movie watching, cognitive tasks, etc) could be very helpful in refining the SAPHIRE approach.

      Finally, the analogy between the PID terms and the information provided by each eye redundantly, uniquely, and synergistically is superb. I will definitely be referencing this intuition pump in future discussions of multivariate information sharing.

      We are very grateful for these positive comments, and for the feedback on our eye metaphor.

      Weaknesses

      I have some concerns about the way "information processing" is used in this study. The data analyzed, fMRI BOLD data, is extremely coarse, both in spatial and temporal terms. I am not sure I am convinced that this is the natural scale at which to talk about information "processing" or "integration" in the brain. In contrast to measures like sample entropy or Lempel-Ziv complexity (which just describe the statistics of BOLD activity), synergy and Phi are presented here as quasi-causal measures: as if they "cause" or "represent" phenomenological consciousness. While the theoretical arguments linking integration to consciousness are compelling, is this the right data set to explore them in? For example, in the work by Newman, Beggs, and Sherrill (nee Faber), synergy is associated with "computation" performed in individual neurons: the information about the future state of a target neuron that is only accessible when knowing both inputs (analogous to the synergy in computing the sum of two dice). Whether one thinks that this is a good approach to neural computation or not, it fits within the commonly accepted causal model of neural spiking activity: neurons receive inputs from multiple upstream neurons, integrate those inputs and change their firing behavior accordingly.

      In contrast, here, we are looking at BOLD data, which is a proxy measure for gross-scale regional neural activity, which itself is a coarse-graining of millions of individual neurons to a uni-dimensional spectrum that runs from "inactive to active." It feels as though a lot of inferences are being made from very coarse data.

      We appreciate the opportunity to clarify this point. It is not our intention to claim that Phi-R and synergy, as measured at the level of regional BOLD signals, represent a direct cause of consciousness, or are identical to it. Rather, our work is intended to use these measures similarly to the use of sample entropy and LZC for BOLD signals: as theoretically grounded macroscale indicators, whose empirical relationship to consciousness may reveal the relevant underlying phenomena. In other words, while our results do show that BOLD-derived Phi-R tracks the loss and recovery of consciousness, we do not claim that they are the cause of it: only that an empirical relationship exists, which is in line with what we might expect on theoretical grounds. We have now clarified this in the Limitations section of our revised manuscript, as well as revising our language accordingly in the rest of the manuscript.

      We also clarify that the meaning of “information processing” that we adopt pertains to “intrinsic” information that is present in the system’s spontaneous dynamics, rather than extrinsic information about a task:

      “Information decomposition can be applied to neural data from different scales, from electrophysiology to functional MRI, with or without reference to behaviour 34. When behavioural data are taken into account, information decomposition can shed light on the processing of “extrinsic” information, understood as the translation of sensory signals into behavioural choices across neurons or regions 41,43,45,47. However, information decomposition can also be applied to investigate the “intrinsic” information that is present in the brain’s spontaneous dynamics in the absence of any tasks, in the same vein as resting-state “functional connectivity” and methods from statistical causal inference such as Granger causality 49. In this context, information processing should be understood in terms of the dynamics of information: where and how information is stored, transferred, and modified 34.”

      References:

      (1) Newman, E. L., Varley, T. F., Parakkattu, V. K., Sherrill, S. P. & Beggs, J. M. Revealing the Dynamics of Neural Information Processing with Multivariate Information Decomposition. Entropy 24, 930 (2022).

      Reviewer #2 (Public Review):

      The authors analysed functional MRI recordings of brain activity at rest, using state-of-the-art methods that reveal the diverse ways in which information can be integrated in the brain. In this way, they found brain areas that act as (synergistic) gateways for the 'global workspace', where conscious access to information or cognition would occur, and brain areas that serve as (redundant) broadcasters from the global workspace to the rest of the brain. The results are compelling and consistent with the already assumed role of several networks and areas within the Global Neuronal Workspace framework. Thus, in a way, this work comes to stress the role of synergy and redundancy as complementary information processing modes, which fulfill different roles in the big context of information integration.

      In addition, to prove that the identified high-order interactions are relevant to the phenomenon of consciousness, the same analysis was performed in subjects under anesthesia or with disorders of consciousness (DOC), showing that indeed the loss of consciousness is associated with a deficient integration of information within the gateway regions.

      However, there is something confusing in the redundancy and synergy matrices shown in Figure 2. These are pair-wise matrices, where the PID was applied to identify high-order interactions between pairs of brain regions. I understand that synergy and redundancy are assessed in the way the brain areas integrate information in time, but it is still a little contradictory to speak about high-order in pairs of areas. When talking about a "synergistic core", one expects that all or most of the areas belonging to that core are simultaneously involved in some (synergistic) information processing, and I do not see this being assessed with the currently presented methodology. Similarly, if redundancy is assessed only in pairs of areas, it may be due to simple correlations between them, so it is not a high-order interaction. Perhaps it is a matter of language, or about the expectations that the word 'synergy' evokes, so a clarification about this issue is needed. Moreover, as the rest of the work is based on these 'pair-wise' redundancy and synergy matrices, it becomes a significant issue.

      We are grateful for the opportunity to clarify this point. We should highlight that PhiID is in fact assessing four variables: the past of region X, the past of region Y, the future of region X, and the future of region Y. Since X and Y each feature both in the past and in the future, we can re-conceptualise the PhiID outputs as reflecting the temporal evolution of how X and Y jointly convey information: the persistent redundancy that we consider corresponds to information that is always present in both X and Y; whereas the persistent synergy is information that X and Y always convey synergistically. In contrast, information transfer would correspond to the phenomenon whereby information was conveyed by one variable in the past, and by the other in the future (see Luppi et al., 2024 TICS; and Mediano et al., 2021 arXiv for more thorough discussions on this point). We have now added this clarification in our Introduction and Results, as well as adding the new Figure 2 to clarify the meaning of PhiID terms.

      We would also like to clarify that all the edges that we identify as significantly changing are indeed simultaneously involved in the difference between consciousness and unconsciousness. This is because the Network-Based Statistic differs from other ways of identifying edges that are significantly different between two groups or conditions, because it does not consider edges in isolation, but only as part of a single connected component.
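      The component-level logic mentioned here can be sketched in a few lines. This is only the cluster-forming step of a Network-Based Statistic style analysis, under the assumption of a symmetric matrix of edge-wise test statistics and a user-chosen primary threshold; the permutation step that assigns a family-wise-corrected p-value to each component is deliberately omitted, and the function name and toy values are hypothetical.

```python
from collections import deque


def suprathreshold_components(t_stats, threshold):
    """Group suprathreshold edges into connected components.

    `t_stats` is a symmetric matrix (list of lists) of edge-wise test
    statistics. Returns a list of (sorted node list, number of edges),
    one entry per connected component containing at least one edge.
    """
    n = len(t_stats)
    neighbours = {
        i: [j for j in range(n) if j != i and t_stats[i][j] > threshold]
        for i in range(n)
    }
    seen, components = set(), []
    for start in range(n):
        if start in seen or not neighbours[start]:
            continue
        seen.add(start)
        queue, nodes = deque([start]), [start]
        while queue:  # breadth-first search over suprathreshold edges
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in seen:
                    seen.add(v)
                    nodes.append(v)
                    queue.append(v)
        n_edges = sum(len(neighbours[u]) for u in nodes) // 2
        components.append((sorted(nodes), n_edges))
    return components


# Hypothetical 5-region example: edges 0-1 and 1-2 form one component, 3-4 another.
t = [[1.0] * 5 for _ in range(5)]
for i, j, value in [(0, 1, 4.0), (1, 2, 3.5), (3, 4, 5.0)]:
    t[i][j] = t[j][i] = value
components = suprathreshold_components(t, threshold=3.0)
```

      In a full analysis, the size of each component would be compared against a null distribution of maximal component sizes obtained by permuting group labels, which is why significance attaches to the component as a whole rather than to individual edges.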

      Reviewer #3 (Public Review):

      The work proposes a model of neural information processing based on a 'synergistic global workspace,' which processes information in three principal steps: a gatekeeping step (information gathering), an information integration step, and finally, a broadcasting step. The authors determined the synergistic global workspace based on previous work and extended the role of its elements using 100 fMRI recordings of the resting state of healthy participants of the HCP. The authors then applied network analysis and two different measures of information integration to examine changes in reduced states of consciousness (such as anesthesia and after-coma disorders of consciousness). They provided an interpretation of the results in terms of the proposed model of brain information processing, which could be helpful to be implemented in other states of consciousness and related to perturbative approaches. Overall, I found the manuscript to be well-organized, and the results are interesting and could be informative for a broad range of literature, suggesting interesting new ideas for the field to explore. However, there are some points that the authors could clarify to strengthen the paper. Key points include:

      (1) The work strongly relies on the identification of the regions belonging to the synergistic global workspace, which was primarily proposed and computed in a previous paper by the authors. It would be great if this computation could be included in a more explicit way in this manuscript to make it self-contained. Maybe include some table or figure being explicit in the Gradient of redundancy-to-synergy relative importance results and procedure.

      We have now added the new Supplementary Figure 1 to clarify how the synergistic workspace is identified, as per Luppi et al (2022 Nature Neuroscience).

      (2) It would be beneficial if the authors could provide further explanation regarding the differences in the procedure for selecting the workspace and its role within the proposed architecture. For instance, why does one case use the strength of the nodes while the other case uses the participation coefficient? It would be interesting to explore what would happen if the workspace was defined directly using the participation coefficient instead of the strength. Additionally, what impact would it have on the procedure if a different selection of modules was used? For example, instead of using the RSN, other criteria, such as modularity algorithms, PCA, Hidden Markov Models, Variational Autoencoders, etc., could be considered. The main point of my question is that the RSN are probably quite redundant networks, whereas other methods, such as PCA, generate independent networks. It would be helpful if the authors could offer some comments on their intuition regarding these points without necessarily requiring additional computations.

      We appreciate the opportunity to clarify this point. Our rationale for the procedure used to identify the workspace is to find regions where synergy is especially prominent. This is due to the close mathematical relationship between synergistic information and integration of information (see also Luppi et al., 2024 TICS), which we view as the core function of the global workspace. This identification is based on the strength ranking, as per Luppi et al (2022 Nature Neuroscience), which demonstrated that regions where synergy predominates (i.e., our proposed workspace) are also involved with high-level cognitive functions and anatomically coincide with transmodal association cortices at the confluence of multiple information streams. This is what we should expect of a global workspace, which is why we use the strength of synergistic interactions to identify it, rather than the participation coefficient. Subsequently, to discern broadcasters from gateways within the synergistic workspace, we seek to encapsulate the meaning of a “broadcaster” in information terms. We argue that this corresponds with making the same information available to multiple modules. Sameness of information corresponds to redundancy, and multiplicity of modules can be reflected in the network-theoretic notion of participation coefficient. Thus, a broadcaster is a region in the synergistic workspace (i.e., a region with strong synergistic interactions) that in addition has a high participation coefficient for its redundant interactions.

      Pertaining specifically to the use of resting-state networks as modules, indeed our own (Luppi et al., 2022 Nature Neuroscience) and others’ research has shown that each RSN entertains primarily redundant interactions among its constituent regions. This is not surprising, since RSNs are functionally defined: their constituent elements need to process the same information (e.g., pertaining to a visual task in case of the visual network). We used the RSNs as our definition of modules, because they are widely understood to reflect the intrinsic organisation of brain activity into functional units; for example, Smith et al., (2009 PNAS) and Cole et al (2014 Neuron) both showed that RSNs reflect task-related co-activation of regions, whether directly quantified from fMRI in individuals performing multiple tasks, or inferred from meta-analysis of the neuroimaging literature. This is the aspect of a “module” that matters from the global workspace perspective: modules are units with distinct function, and RSNs capture this well. This is therefore why we use the RSNs as modules when defining the participation coefficient: they provide an a-priori division into units with functionally distinct roles.

      Nonetheless, we also note that RSN organisation is robustly recovered using many different methods, including seed-based correlation from specific regions-of-interest, or Independent Components Analysis, or community detection on the network of inter-regional correlations - demonstrating that they are not merely a function of the specific method used to identify them. In fact, we show significant correlation between participation coefficient defined in terms of RSNs, and in terms of modules identified in a purely data-driven manner from Louvain consensus clustering (Figure S4).
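      For readers unfamiliar with the participation coefficient used above to define broadcasters, the following minimal sketch shows the standard formula, P_i = 1 - sum over modules m of (k_im / k_i)^2, where k_im is the summed edge weight from node i to module m and k_i is the node's total strength. The toy matrix and module labels are hypothetical; in the context of this response the weights would be redundancy values and the modules would be RSNs, but the sketch assumes only non-negative symmetric weights.

```python
def participation_coefficient(weights, modules):
    """Participation coefficient of each node of a weighted network.

    `weights` is a symmetric matrix of non-negative edge weights and
    `modules[j]` is the module label of node j. A node whose edges all
    fall in one module scores 0; a node whose edges are spread evenly
    across modules scores closer to 1.
    """
    n = len(weights)
    coefficients = []
    for i in range(n):
        per_module = {}
        for j in range(n):
            if j != i:
                label = modules[j]
                per_module[label] = per_module.get(label, 0.0) + weights[i][j]
        strength = sum(per_module.values())
        if strength == 0:
            coefficients.append(0.0)  # convention for isolated nodes
            continue
        coefficients.append(
            1.0 - sum((k / strength) ** 2 for k in per_module.values())
        )
    return coefficients


# Hypothetical 4-node network with two modules, "A" and "B".
toy_weights = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
toy_modules = ["A", "A", "B", "B"]
pc = participation_coefficient(toy_weights, toy_modules)
```

      Nodes 0 and 2 split their connections equally between the two modules and therefore obtain the highest values, which is the property the authors use to characterise a broadcaster: the same (redundant) information made available to multiple modules.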

      (3) The authors acknowledged the potential relevance of perturbative approaches in terms of PCI and quantification of consciousness. It would be valuable if the authors could also discuss perturbative approaches in relation to inducing transitions between brain states. In other words, since the authors investigate disorders of consciousness where interventions could provide insights into treatment, as suggested by computational and experimental works, it would be interesting to explore the relationship between the synergistic workspace and its modifications from this perspective as well.

      We thank the Reviewer for bringing this up: we now cite several studies that in recent years have applied perturbative approaches to induce transitions between states of consciousness.

      “The PCI is used as a means of assessing the brain’s current state, but stimulation protocols can also be adopted to directly induce transitions between states of consciousness. In rodents, carbachol administration to frontal cortex awakens rats from sevoflurane anaesthesia 120, and optogenetic stimulation was used to identify a role of central thalamus neurons in controlling transitions between states of responsiveness 121,122. Additionally, several studies in non-human primates have now shown that electrical stimulation of the central thalamus can reliably induce awakening from anaesthesia, accompanied by the reversal of electrophysiological and fMRI markers of anaesthesia 123–128. Finally, in human patients suffering from disorders of consciousness, stimulation of intra-laminar central thalamic nuclei was reported to induce behavioural improvement 129, and ultrasonic stimulation 130,131 and deep-brain stimulation are among potential therapies being considered for DOC patients 132,133. It will be of considerable interest to determine whether our corrected measure of integrated information and the topography of the synergistic workspace are also restored by these causal interventions.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I would appreciate it if the authors could revisit the figures and make sure that:

      (1) All fonts are large enough to be readable for people with visual impairments (for ex. the ranges on the colorbars in Fig. 2 are unreadably small).

      Thank you: we have increased font sizes.

      (2) The colormaps are scaled to show meaningful differences (Fig. 2A)

      We have changed the color scale in Figure 2A and 2B.

      Also, the authors may want to revisit the references section: some of the papers that were pre-prints at one point have now been published and should be updated.

      Thank you: we have updated our references.

      Minor comments:

      • In Eqs. 2 and 3, the unique information term uses the bar notation ( | ) that is typically indicative of "conditioned on." Perhaps the authors could use a slash notation (e.g. Unq(X ; Z / Y)) to avoid this ambiguity? My understanding of the Unique information is that it is not necessarily "conditioned on", so much as it is "in the context of".

      Indeed, the “|” sign of “conditioning” could be misleading; however, the “/” sign could also be misleading, if interpreted as division. Therefore, we have opted for the “\” sign of “set difference”, in Eq 2 and 3, which is conceptually more appropriate in this context.

      • The font on the figures is a little bit small - for readers with poor eyes, it might be helpful to increase the wording size.

      We have increased font sizes in the figures where relevant.

      • I don't quite understand what is happening in Fig. 2A - perhaps it is a colormap issue, but it seems as though it's just a big white square? It looks like redundancy is broadly correlated with FC (just based on the look of the adjacency matrices), but I have no real sense of what the synergistic matrix looks like, other than "flat."

      We have now changed the color scale in Figure 2.

      Reviewer #2 (Recommendations For The Authors):

      Besides the issues mentioned in the Public review, I have the following suggestions to improve the manuscript:

      • At the end of the introduction, a few lines could be added explaining why the study of DOC patients and subjects under anesthesia will be informative in the context of this work.

      By comparing functional brain scans from transient anaesthetic-induced unconsciousness and from the persistent unconsciousness of DOC patients, which arises from brain injury, we can search for common brain changes associated with loss of consciousness – thereby disambiguating what is specific to loss of consciousness.

      • On page and in general the first part of Results, it is not evident that you are working with functional connectivity. Many times the word 'connection' is used and sometimes I was wondering whether they were structural or functional. Please clarify. Also, the meaning of 'synergistic connection' or 'redundant connection' could be explained in lay terms.

      Thank you for bringing this up. We have now replaced the word “connection” with “interaction” to disambiguate this issue, further adding “functional” where appropriate. We have also provided, in the Introduction, an intuitive explanation of what synergy and redundancy mean in the context of spontaneous fMRI signals.

      • Figure 2 needs a lot of improvement. The matrix of synergistic interactions looks completely yellow-ish with some vague areas of white. So everything is above 2. What does it mean? Pretty uninformative. The matrix of redundant connections looks mostly black, with some red here and there. So everything is below 0.6. Also, what are the meaning and units of the colorbars?

      We agree: we have increased font sizes, added labels, and changed the color scale in Figure 2. We hope that the new version of Figure 2 will be clearer.

      • Caption of Figure 2 mentions "... brain regions identified as belonging to the synergistic global workspace". I didn't get it clear how do you define these areas. Are they just the sum of gateways and broadcasters, or is there another criterion?

      Regions belonging to the synergistic workspace are indeed the set comprising gateways and broadcasters; they are the regions that are synergy-dominated, as defined in Luppi et al., 2022 Nature Neuroscience. We have now clarified this in the figure caption.

      • In the first lines of page 7, it is said that data from DOC and anesthesia was parcellated in 400 + 54 regions. However, it was said in a manner that made me think it was a different parcellation than the other data. Please make it clear that the parcellation is the same (if it is).

      We have now clarified that the 400 cortical regions are from the Schaefer atlas, and 54 subcortical regions from the Tian atlas, as for the other analysis. The only other parcellation that we use is the Schaefer-232, for the robustness analysis. This is also reported in the Methods.

      • Figure 3: the labels in the colorbars cannot be read, please make them bigger. Also, the colorbars and colorscales should be centered in white, to make it clear that red is positive and blue is negative. Or at least maintain consistency across the panels (I can't tell because of the small numbers).

      Thank you: we have increased font sizes, added labels, indicated that white refers to zero (so that red is always an increase, and blue is always a decrease), and changed the color scale in Figure 2.
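
      For readers who want to reproduce a zero-centred colour scale of the kind described above, the sketch below shows one simple way to do it: choose limits that are symmetric about zero, then map values onto a blue-white-red ramp so that zero is always white. This is an illustrative assumption about how such a scale can be built, not the plotting code used for the figures; the function names are hypothetical.

```python
def symmetric_limits(values):
    """Return (-m, m), where m is the largest absolute value in `values`."""
    m = max(abs(v) for v in values)
    return -m, m


def diverging_rgb(value, limit):
    """Map a value in [-limit, limit] to an (r, g, b) tuple in [0, 1].

    Zero maps to white, +limit to pure red, -limit to pure blue.
    Values outside the range are clipped.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    t = max(-1.0, min(1.0, value / limit))
    if t >= 0:
        return (1.0, 1.0 - t, 1.0 - t)  # white -> red for increases
    return (1.0 + t, 1.0 + t, 1.0)      # white -> blue for decreases


# Hypothetical differences between conditions (arbitrary units).
deltas = [-0.2, 0.5, 0.1]
vmin, vmax = symmetric_limits(deltas)
colours = [diverging_rgb(d, vmax) for d in deltas]
```

      Using the same symmetric limits in every panel keeps red and blue comparable across panels.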

      • The legend of Figure 4 is written in a different style, interpreting the figure rather than describing it. Please describe the figure in the caption, in order to let the reader know what they are looking at.

      We have endeavoured to rewrite the legend of Figure 4 in a style that is more consistent with the other figures.

      • In several parts the 'whole-minus-sum' phi measure is mentioned and it is said that it did not decrease during loss of consciousness. However, I did not see any figure about that nor any conspicuous reference to that in Results text. Where is it?

      We apologise for the confusion: this is Figure S3A, in the Supplementary. We have now clarified this in the text.

      Reviewer #3 (Recommendations For The Authors):

      (1) In the same direction, regarding Fig. 2, in my opinion, it does not effectively aid in understanding the selection of regions as more synergistic or redundant. In panels A) and B), the color scales could be improved to better distinguish regions in the matrices (panel A) is saturated at the upper limit, while panel B) is saturated at the lower limit). Additionally, I suggest indicating in the panels what is being measured with the color scales.

      Thank you: we have increased font sizes, added labels, and changed the color scale in Figure 2.

      (2) When investigating the synergistic core of human consciousness and interpreting the results of changes in information integration measures in terms of the proposed framework, did the authors consider the synergistic workspace computed in HCP data? If the answer is positive, it would be helpful for the authors to be more explicit about it and elaborate on any differences that may be found, as well as the potential impact on interpretation.

      This is correct: the synergistic workspace, including gateways and broadcasters, is identified from the Human Connectome Project dataset. We now clarify this in the manuscript.

      Minors:

      (1) I would suggest improving the readability of figures 2 and 3, considering font size (letters and numbers) and color bars (numbers and an indication of what is measured with this scale). In Figure 1, the caption defines steps instead of the stages that are indicated in the figure.

      Thank you: we have increased font sizes, added labels, and replaced steps with “stages” in Figure 1.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Qin et al. set out to investigate the role of mechanosensory feedback during swallowing and identify neural circuits that generate ingestion rhythms. They use Drosophila melanogaster swallowing as a model system, focusing their study on the neural mechanisms that control cibarium filling and emptying in vivo. They find that pump frequency is decreased in mutants of three mechanotransduction genes (nompC, piezo, and Tmc), and conclude that mechanosensation mainly contributes to the emptying phase of swallowing. Furthermore, they find that double mutants of nompC and Tmc have more pronounced cibarium pumping defects than either single mutants or Tmc/piezo double mutants. They discover that the expression patterns of nompC and Tmc overlap in two classes of neurons, md-C and md-L neurons. The dendrites of md-C neurons wrap the cibarium and project their axons to the subesophageal zone of the brain. Silencing neurons that express both nompC and Tmc leads to severe ingestion defects, with decreased cibarium emptying. Optogenetic activation of the same population of neurons inhibited filling of the cibarium and accelerated cibarium emptying. In the brain, the axons of nompC∩Tmc cell types respond during ingestion of sugar but do not respond when the entire fly head is passively exposed to sucrose. Finally, the authors show that nompC∩Tmc cell types arborize close to the dendrites of motor neurons that are required for swallowing, and that swallowing motor neurons respond to the activation of the entire Tmc-GAL4 pattern.

      Strengths:

      • The authors rigorously quantify ingestion behavior to convincingly demonstrate the importance of mechanosensory genes in the control of swallowing rhythms and cibarium filling and emptying

      • The authors demonstrate that a small population of neurons that express both nompC and Tmc oppositely regulate cibarium emptying and filling when inhibited or activated, respectively

      • They provide evidence that the action of multiple mechanotransduction genes may converge in common cell types

      Thank you for your insightful and detailed assessment of our work. Your constructive feedback will help to improve our manuscript.

      Weaknesses:

      • A major weakness of the paper is that the authors use reagents that are expressed in both md-C and md-L but describe the results as though only md-C is manipulated. Severing the labellum will not prevent optogenetic activation of md-L from triggering neural responses downstream of md-L. Optogenetic activation is strong enough to trigger action potentials in the remaining axons. Therefore, Qin et al. do not present convincing evidence that the defects they see in pumping can be specifically attributed to md-C.

      Thank you for your comments. This is an important point that we did not adequately address in the original preprint. We have obtained imaging and behavioral results that strongly suggest md-C, rather than md-L, are essential for swallowing behavior.

      36 hours after the ablation of the labellum, the signals of md-L were hardly observable when GFP expression was driven by the intersection between Tmc-GAL4 & nompC-QF (see Figure 3—figure supplement 1A). This observation indicates that the axons of md-L had likely degenerated by 36 hours and were unlikely to influence swallowing. Moreover, the projection pattern of Tmc-GAL4 & nompC-QF>>GFP exhibited no significant changes in the brain after labellum ablation.

      Furthermore, even 36 hours after labellum ablation, flies exhibited responses to light stimulation (see Figure 3—figure supplement 1B-C, Video 5) when ReaChR was expressed in md-C. We thus reasoned that md-C, but not md-L, plays a crucial role in the swallowing process.

      • GRASP is known to be non-specific and prone to false positives when neurons are in close proximity but not synaptically connected. A positive GRASP signal supports but does not confirm direct synaptic connectivity between md-C/md-L axons and MN11/MN12.

      In this study, we employed the nSyb-GRASP, wherein the GRASP is expressed at the presynaptic terminals by fusion with the synaptic marker nSyb. This method demonstrates an enhanced specificity compared to the original GRASP approach.

      Additionally, we utilized +/ UAS-nSyb-spGFP1-10, lexAop-CD4-spGFP11 ; + / MN-LexA fruit flies as a negative control to mitigate potential false signals originating from the tool itself (Author response image 1, scale bar = 50 μm). Besides the genotype Tmc-Gal4, Tub(FRT. Gal80) / UAS-nSyb-spGFP1-10, lexAop-CD4-spGFP11 ; nompC-QF, QUAS-FLP / MN-LexA fruit flies discussed in this manuscript, we also incorporated the genotype Tmc-Gal4, Tub(FRT. Gal80) / lexAop-nSyb-spGFP1-10, UAS-CD4-spGFP11 ; nompC-QF, QUAS-FLP / MN-LexA fruit flies as a reverse control (Author response image 2). Unexpectedly, similar positive signals were observed, indicating that positive signals may emerge due to close proximity between neurons even with nSyb-GRASP.

      Author response image 1

      It should be noted that the existence of synaptic projections from motor neurons (MN) to md-C cannot be definitively confirmed at this juncture. At present, we can only posit the potential for synaptic connections between md-C and motor neurons. A firmer conclusion may be attainable with comprehensive whole-brain connectome data in future studies.

      Author response image 2

      • As seen in Figure 2—figure supplement 1, the expression pattern of Tmc-GAL4 is broader than md-C alone. Therefore, the functional connectivity the authors observe between Tmc expressing neurons and MN11 and 12 cannot be traced to md-C alone

      It is true that the expression pattern of Tmc-GAL4 is broader than that of md-C alone. Our experiments, including those flies expressing TNT in Tmc+ neurons, demonstrated difficulties in emptying (Figure 2A, 2D). Notably, we encountered challenges in finding fly stocks bearing UAS>FRT-STOP-P2X2. Consequently, we opted to utilize Tmc-GAL4 to drive UAS-P2X2 instead. We believe that the results further support our hypothesis on the role of md-C in the observed behavioral change in emptying.

      Overall, this work convincingly shows that swallowing and swallowing rhythms are dependent on several mechanosensory genes. Qin et al. also characterize a candidate neuron, md-C, that is likely to provide mechanosensory feedback to pumping motor neurons, but the results they present here are not sufficient to assign this function to md-C alone. This work will have a positive impact on the field by demonstrating the importance of mechanosensory feedback to swallowing rhythms and providing a potential entry point for future investigation of the identity and mechanisms of swallowing central pattern generators.

      Reviewer #2 (Public Review):

      In this manuscript, the authors describe the role of cibarial mechanosensory neurons in fly ingestion. They demonstrate that pumping of the cibarium is subtly disrupted in mutants for piezo, TMC, and nomp-C. Evidence is presented that these three genes are co-expressed in a set of cibarial mechanosensory neurons named md-C. Silencing of md-C neurons results in disrupted cibarial emptying, while activation promotes faster pumping and/or difficulty filling. GRASP and chemogenetic activation of the md-C neurons is used to argue that they may be directly connected to motor neurons that control cibarial emptying.

      The manuscript makes several convincing and useful contributions. First, identifying the md-C neurons and demonstrating their essential role for cibarium emptying provides reagents for further studying this circuit and also demonstrates the importance of mechanosensation in driving pumping rhythms in the pharynx. Second, the suggestion that these mechanosensory neurons are directly connected to motor neurons controlling pumping stands in contrast to other sensory circuits identified in fly feeding and is an interesting idea that can be more rigorously tested in the future.

      At the same time, there are several shortcomings that limit the scope of the paper and the confidence in some claims. These include:

      a) the MN-LexA lines used for GRASP experiments are not characterized in any other way to demonstrate specificity. These were generated for this study using Phack methods, and their expression should be shown to be specific for MN11 and MN12 in order to interpret the GRASP experiments.

      Thanks for the suggestion. We have checked the expression pattern of MN-LexA, which is similar to MN-GAL4 used in previous work (Manzo et al., PNAS., 2012, PMID:22474379). Here is the expression pattern:

      Author response image 3

      b) There is also insufficient detail for the P2X2 experiment to evaluate its results. Is this an in vivo or ex vivo prep? Is ATP added to the brain, or ingested? If it is ingested, how is ATP coming into contact with md-C neuron if it is not a chemosensory neuron and therefore not exposed to the contents of the cibarium?

      The P2X2 experimental preparation was done ex vivo. We immersed the fly in the imaging buffer, as described in the Methods section under Functional Imaging. Following dissection and identification of the subesophageal zone (SEZ) area under fluorescent microscopy, we introduced ATP slowly into the buffer, at a distance from the brain.

      c) In Figure 3C, the authors claim that ablating the labellum will remove the optogenetic stimulation of the md-L neuron (mechanosensory neuron of the labellum), but this manipulation would presumably leave an intact md-L axon that would still be capable of being optogenetically activated by Chrimson.

      Please refer to the corresponding answers for reviewer 1 and Figure 3—figure supplement 1.

      d) Average GCaMP traces are not shown for md-C during ingestion, and therefore it is impossible to gauge the dynamics of md-C neuron activation during swallowing. Seeing activation with a similar frequency to pumping would support the suggested role for these neurons, although GCaMP6s may be too slow for these purposes.

      Profiling the dynamics of md-C neuron activation during swallowing is crucial for unraveling the operational model of md-C and validating our proposed hypothesis. Unfortunately, our assay faces challenges in detecting probable 6Hz fluorescent changes with GCaMP6s.

      In general, we observed an increase of fluorescent signals during swallowing, but movement of live flies during swallowing disturbed the imaging recording, so we could not obtain a reliable calcium-imaging trace for md-C neurons. To enhance the robustness of our findings, patching the md-C neurons would be a more convincing approach. However, as illustrated in Figure 2, the somata of md-C neurons are situated in the cibarium rather than the brain, and patching md-C neuron somata in flies during ingestion is difficult.

      e) The negative result in Figure 4K that is meant to rule out taste stimulation of md-C is not useful without a positive control for pharyngeal taste neuron activation in this same preparation.

      We followed methods used in the previous work (Chen et al., Cell Rep., 2019, PMID:31644916), which we believe could confirm that md-C do not respond to sugars.

      In addition to the experimental limitations described above, the manuscript could be organized in a way that is easier to read (for example, not jumping back and forth in figure order).

      Thanks for your suggestion; the manuscript has been reorganized.

      Reviewer #3 (Public Review):

      Swallowing is an essential daily activity for survival, and pharyngo-laryngeal sensory function is critical for safe swallowing. In Drosophila, it has been reported that the mechanical property of food (e.g. Viscosity) can modulate swallowing. However, how mechanical expansion of the pharynx or fluid content sense and control swallowing was elusive. Qin et al. showed that a group of pharyngeal mechanosensory neurons, as well as mechanosensory channels (nompC, Tmc, and Piezo), respond to these mechanical forces for regulation of swallowing in Drosophila melanogaster.

      Strengths:

      There are many reports on the effect of chemical properties of foods on feeding in fruit flies, but only limited studies have reported how physical properties of food affect feeding, especially via pharyngeal mechanosensory neurons. First, they found that mechanosensory mutants, including nompC, Tmc, and Piezo, showed impaired swallowing, mainly in the emptying process. Next, they identified that cibarium multidendritic mechanosensory neurons (md-C) are responsible for controlling swallowing by regulating motor neurons (MN) 12 and 11, which control filling and emptying, respectively.

      Weaknesses:

      While the involvement of md-C and mechanosensory channels in controlling swallowing is convincing, it is not yet clear which stimuli activate md-C. Can it be an expansion of cibarium or food viscosity, or both? In addition, if rhythmic and coordinated contraction of muscles 11 and 12 is essential for swallowing, how can simultaneous activation of MN 11 and 12 by md-C achieve this? Finally, previous reports showed that food viscosity mainly affects the filling rather than the emptying process, which seems different from their finding.

      We have confirmed that swallowing sucrose water solution activated md-C neurons, while sucrose water solution alone could not (Figure 4J-K). We hypothesized that the viscosity of the food might influence this expansion process.

      While we were unable to delineate the activation dynamics of md-C neurons, our proposal posits that these neurons could be activated in a single pump cycle, sequentially stimulating MN12 and MN11. Another possibility is that the activation of md-C neurons acts as a switch, altering the oscillation pattern of the swallowing central pattern generator (CPG) from a resting state to a working state.

      In the experiments with w1118 flies fed with MC (methylcellulose) water, we observed that viscosity predominantly affects the filling process rather than the emptying process, consistent with previous findings. This raises an intriguing question. Our investigation into the mutation of mechanosensitive ion channels revealed a significant impact on the emptying process. We believe this is due to the loss of mechanosensation affecting the vibration of swallowing circuits, thereby influencing both the emptying and filling processes. In contrast, viscosity appears to make it more challenging for the fly to fill the cibarium with food, primarily attributable to the inherent properties of the food itself.

      Reviewer #4 (Public Review):

      A combination of optogenetic behavioral experiments and functional imaging is employed to identify the role of mechanosensory neurons in food swallowing in adult Drosophila. While some of the findings are intriguing and the overall goal of mapping a sensory to motor circuit for this rhythmic movement is admirable, the data presented could be improved.

      The circuit proposed (and supported by GRASP contact data) shows these multi-dendritic neurons connecting to pharyngeal motor neurons. This is pretty direct - there is no evidence that they affect the hypothetical central pattern generator - just the execution of its rhythm. The optogenetic activation and inhibition experiments are constitutive, not patterned light, and they seem to disrupt the timing of pumping, not impose a new one. A slight slowing of the rhythm is not consistent with the proposed function.

      Motor neurons implicated in patterned motions can be considered effectors of Central Pattern Generators (CPGs) (Marder et al., Curr Biol., 2001, PMID: 11728329; Hurkey et al., Nature., 2023, PMID:37225999). Given our observation of the connection between md-C neurons and motor neurons, it is reasonable to speculate that md-C neurons influence CPGs. Compared to the patterned light (0.1s light on and 0.1s light off) used in our optogenetic experiments, we noted no significant changes in their responses to continuous light stimulation. We think that optogenetic methods may lead to overstimulation of md-C neurons, failing to accurately mimic the expansion of the cibarium during feeding.

      Dysfunction in mechanosensitive ion channels or mechanosensory neurons not only disrupts the timing of pumping but also results in decreased intake efficiency (Figure 1E). The water-swallowing rhythm is generally stable in flies, and swallowing is a vital process that may involve redundant ion channels to ensure its stability.

      The mechanosensory channel mutants nompC, piezo, and TMC have a range of defects. The role of these channels in swallowing may not be sufficiently specific to support the interpretation presented. Their other defects are not described here and their overall locomotor function is not measured. If the flies have trouble consuming sufficient food throughout their development, how healthy are they at the time of assay? The level of starvation or water deprivation can affect different properties of feeding - meal size and frequency. There is no description of how starvation state was standardized or measured in these experiments.

      Defects in the mechanosensory channel mutants nompC, piezo, and TMC have been extensively investigated (Hehlert et al., Trends Neurosci., 2021, PMID:332570000). Mutations in these channels exhibit multifaceted effects, as illustrated in our RNAi experiments (see Figure 2E). Deprivation of water and food was performed in empty fly vials. It is important to note that the duration of starvation determines the fly's willingness to feed but not the pump frequency (Manzo et al., PNAS., 2012, PMID:22474379).

      In most cases, female flies were deprived of water and food in empty vials for 24 hours, because after that time most flies are willing to drink water. The deprivation time was 12 hours for flies with nompC and Tmc mutated or flies with Kir2.1 expressed in md-C neurons, as some of these flies cannot survive 24 hours of deprivation.

      The brain is likely to move considerably during swallow, so the GCaMP signal change may be a motion artifact. Sometimes this can be calculated by comparing GCaMP signal to that of a co-expressed fluorescent protein, but there is no mention that this is done here. Therefore, the GCaMP data cannot be interpreted.

      We did not co-express a fluorescent protein with GCaMP for md-C. The head of the fly was mounted onto a glass slide, and we did not observe significant signal changes before feeding.
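
      For context, the ratiometric correction the reviewer alludes to can be sketched as follows: the GCaMP trace is divided frame by frame by a calcium-insensitive reference fluorophore, so that intensity changes shared by both channels (as expected from motion) cancel, and the ratio is then expressed relative to a pre-stimulus baseline. This is a minimal illustration only. As stated above, no reference protein was co-expressed in these experiments, and the function name, channel values, and baseline window below are hypothetical.

```python
def ratiometric_dff(gcamp, reference, baseline_frames):
    """Return (R - R0) / R0 per frame, where R = gcamp / reference.

    gcamp, reference: per-frame mean fluorescence of the same ROI in the
                      calcium-sensitive and calcium-insensitive channels.
    baseline_frames:  number of initial frames used to estimate R0.
    """
    if len(gcamp) != len(reference):
        raise ValueError("channels must have the same number of frames")
    if not 0 < baseline_frames <= len(gcamp):
        raise ValueError("baseline_frames must be within the trace length")
    ratio = [g / r for g, r in zip(gcamp, reference)]
    r0 = sum(ratio[:baseline_frames]) / baseline_frames
    return [(r - r0) / r0 for r in ratio]


# Hypothetical traces: frame 3 halves in BOTH channels (motion-like change),
# frame 4 doubles in the GCaMP channel only (calcium-like change).
gcamp_trace = [100, 100, 50, 200]
reference_trace = [50, 50, 25, 50]
corrected = ratiometric_dff(gcamp_trace, reference_trace, baseline_frames=2)
```

      In this toy trace the shared drop in frame 3 is cancelled, whereas the GCaMP-only increase in frame 4 survives as a doubling of the ratio.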

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Abstract: I disagree that swallow is the first step of ingestion. The first paragraph also mentions the final checkpoint before food ingestion. Perhaps sufficient to say that swallow is a critical step of ingestion.

      Indeed, it is not rigorous enough to say “first step”. This has been replaced by “early step”.

      Introduction:

      Line 59: "Silence" should be "Silencing"

      This has been replaced.

      Results:

      Lines 91-92: I am not clear about what this means. 20% of nompC and 20% of wild-type flies exhibit incomplete filling? So nompC is not different from wild-type?

      Sorry for the mistake. Viscous foods led to incomplete emptying (not incomplete filling), as displayed in Video 4. The swallowing behavior differs between nompC mutants and wild-type flies, as illustrated in Figure 1C, Figure 1—figure supplement 1A-C and video 1&5.

      We found that when fed with 1% MC water solution (Figure 1—figure supplement 1E-H), Tmc or piezo mutants displayed incomplete emptying, which could account for a large proportion of the swallowing time, while only 20% of nompC flies and 20% of wild-type flies sporadically exhibited incomplete emptying, which is significantly different. Although the percentage of flies displaying incomplete pumping is similar between nompC mutants and wild-type flies, their swallowing behavior differs markedly, as can be seen in Videos 1 and 5.

      Line 94: Should read: “while for foods with certain viscosity, the pump of Tmc or piezo mutants might”

      What evidence is there for weakened muscle motion? The phenotypes of all three mutants are quite similar, so concluding that they have roles in initiation versus swallowing strength is not well supported. This would be better moved to the discussion since it is speculative.

      Muscles are responsible for pumping the bolus from the mouth to the crop. In the case of Tmc or piezo mutants, as evidenced by incomplete filling for viscous foods (see Video 4), we speculate that the loss of sensory stimuli leads to inadequate muscle contraction. The phenotypes observed in Tmc and piezo mutants are similar yet distinct from those of the wild-type or nompC mutant, as shown in Video 1 and 4. The phrase "due to weakened muscle motion" has been removed for clarity.

      Line 146: If md-L neurons are also labeled by this intersection, then you are not able to know whether the axons seen in the brain are from md-L or md-C neurons. Line 148: cutting the labellum is not sufficient to ablate md-L neurons. The projections will still enter the brain and can be activated with optogenetics, even after severing the processes that reside in the labellum.

      Please refer to the responses for reviewer #1 (Public Review): “A major weakness of the paper…” and Figure 4.

      Line 162: If the fly head alone is in saline, do you know that the sucrose enters the esophagus? The more relevant question here is whether the md-C neurons respond to mechanical force. If you could artificially inflate the cibarium with air and see the md-C neurons respond that would be a more convincing result. So far you only know that these are activated during ingestion, but have not shown that they are activated specifically by filling or emptying. In addition, you are not only imaging md-C (md-L is also labeled). This caveat should be mentioned.

      We followed the methods outlined in the previous work (Chen et al., Cell Rep., 2019, PMID:31644916), which suggested that md-C neurons do not respond to sugars. While we aimed to mechanically stimulate md-C neurons, detecting signal changes during different steps of swallowing is challenging. This aspect could be further investigated in subsequent research with the application of adequate patch recording or two-photon microscopy (TPM).

      Figure 3: It is not clear what the pie charts in Figure 3 A refer to. What are the three different rows, and what does blue versus red indicate?

      Figure 3A illustrates three distinct states driven by CsChrimson light stimulation of md-C neurons, with the proportions of flies exhibiting each state. During light activation, flies may display difficulty in filling, incomplete filling, or a normal range of pumping. The blue and red bars represent the proportions of flies showing the corresponding state, as indicated by the black line.

      Figure 4: Where are the example traces for J? The comparison in K should be average dF/F before ingestion compared with average dF/F during ingestion. Comparing the in vitro response to sucrose to the in vivo response during ingestion is not a useful comparison.

      Please refer to the answers for reviewer #2 question d).

      Reviewer #2 (Recommendations For The Authors):

      Suggested experiments that would address some of my concerns listed in the public review include:

      a) high resolution SEZ images of MN-LexA lines crossed to LexAop-GFP to demonstrate their specificity

      b) more detail on the P2X2 experiment. It is hard to make suggestions beyond that without first seeing the details.

      c) presenting average GCaMP traces for all calcium imaging results

      d) to rule out taste stimulation of md-C (Figure 4K) I would suggest performing more extensive calcium imaging experiments with different stimuli. For example, sugar, water, and increasing concentrations of a neutral osmolyte (e.g. PEG) to suppress the water response. I think that this is more feasible than trying to get an in vitro taste prep to be convincing.

      Please refer to the responses for public review of reviewer #2.

      Reviewer #3 (Recommendations For The Authors):

      Below I list my suggestions as well as criticisms.

      (1) It would be excellent if the authors could demonstrate whether varying levels of food viscosity affect md-C activation.

      That is a good point, and could be studied in future work.

      (2) It is not clear whether an intersectional approach using TMC-GAL4 and nompC-QF abolishes labelling of the labellar multidendritic neurons. If this is the case, please show labellar multidendritic neurons in TMC-GAL4 only flies and flies using the intersectional approach. Along with this question, I am concerned that labellum-removed flies could be used for feeding assay.

      Intersectional labelling using TMC-GAL4 and nompC-QF did not abolish labelling of the labellar multidendritic neurons (Author response image 4). Labellum-removed flies can be used for the feeding assay (Figure 3—figure supplement 1B-C, video 5), but swallowing behavior is affected if the LSO or cibarium is damaged, so the labellum must be removed very carefully.

      Author response image 4

      (3) Please provide the detailed methods for GRASP and include proper control.

      Please refer to the responses for public review of reviewer #1.

      (4) The authors hypothesized that md-C sequentially activates MN11 and 12. Is the time gap between applying ATP on md-C and activation of MN11 or MN12 different?

      Please refer to the responses for public review of reviewer #3. The time gap between applying ATP to md-C and activation of MN11 or MN12 did not differ significantly, and we think the reason is that the ex vivo conditions could not completely mimic the in vivo process.

      I found the manuscript includes many errors, which need to be corrected.

      (1) The reference formatting needs to be rechecked, for example, lines 37, 42, and 43.

      (2) Line 44-46: There is some misunderstanding. The role of pharyngeal mechanosensory neurons is not known compared with chemosensory neurons.

      (3) Line 49: Please specify which type of quality of food. Chemical or physical?

      (4) Line 80 and Figure 1B-D Authors need to put filling and emptying time data in the main figure rather than in the supplementary figure. Otherwise, please cite the relevant figures in the text(S1A-C).

      (5) Line 84-85; Is "the mutant animals" indicating only nompC? Please specify it.

      (6) Figure 1a: It is hard to determine the difference between the series of images. And also label filling and emptying under the time.

      (7) S1E-H: It is unclear what "Time proportion of incomplete pump" means. Please define it.

      (8) Please reorganize the figures to follow the order of the text, for example, figures 2 and 4

      (9) Figure 4A. There is mislabelling in Figure 4A. It is supposed to be phalloidin not nc82.

      (10) Figure 4K: It does not match the figure legend and main text.

      (11) Figure 4D and G: Please indicate ATP application time point.

      Thank you for the corrections; all the points mentioned have been revised.

      Reviewer #4 (Recommendations For The Authors):

      The figures need improvement. 1A has tiny circles showing pharynx and any differences are unclear.

      The expression pattern of some of these drivers (Supplement) seems quite broad. The tmc nompC intersection image in Figure 1F is nice but the cibarium images are hard to interpret: does this one show muscle expression? What are "brain" motor neurons? Where are the labellar multi-dendritic neurons?

      The Tmc nompC intersection image shows no expression in muscles. The somata of motor neurons 12 and 11 are situated in the SEZ area of the brain, while the somata of md-C neurons are in the cibarium. An image of md-L neurons is provided in the response to reviewer #3 (Recommendations For The Authors).

      Why do the assays alternate between swallowing food and swallowing water?

      Thank you for your suggestion; Figure 1A has been zoomed in. The Tmc nompC intersection image in Figure 2F displays the position of md-C neurons from a ventral perspective, and muscles were not labelled. We stained muscles in the cibarium with phalloidin and the image is shown in Figure 4A; we did not find overlap between md-C neurons and muscles. An image of md-L neurons is provided as Author response image 4.

      In the majority of our experiments, we used water to test swallowing behavior. We used an aqueous methylcellulose solution to test the swallowing behavior of mechanoreceptor mutants, and sucrose solution for flies with md-C neurons expressing GCaMP, since these flies hardly drank water when their head capsules were open.

      How starved or water-deprived were the flies?

      One day prior to the behavioral assays, flies were transferred to empty vials (without water or food) for 24 hours of water deprivation. Flies that could not survive 24 h of deprivation were deprived for 12 h.

      How exactly was the pumping frequency (shown in Fig 1B) measured? There is no description in the methods at all. If the pump frequency is scored by changes in blue food intensity (arbitrary units?), this seems very subjective and maybe image angle dependent. What was camera frame rate? Can it capture this pumping speed adequately? Given the wealth of more quantitative methods for measuring food intake (eg. CAFE, flyPAD), it seems that better data could be obtained.

      How was the total volume of the cibarium measured? What do the pie charts in Figure 3A represent?

      The pump frequency was computed as the number of pumps divided by the elapsed time, following the methodology outlined in Manzo et al., 2012. Swallowing curves were plotted using the inverse of the blue food intensity in the cibarium. In this representation, ascending lines signify filling, while descending lines indicate emptying (see Figure 2D, 3B). We consider this approach objective because the fly was fixed during the recording of swallowing behavior, and we analyzed data only when the Region of Interest (ROI) was in the cibarium. This ensures that the intensity values accurately reflect the filling and emptying processes. Furthermore, we conducted manual frame-by-frame checks of pump frequency, and the results agree with those generated by the Time Series Analyzer V3 plugin for ImageJ.

      To assess the total volume of ingestion, we followed the CAFE method, using a measurable glass capillary. We then calculated the ingestion rate (nL/s) by dividing the total volume of ingestion by the feeding time.
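
      The two calculations described above can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis pipeline: the function names, the threshold-crossing rule for counting pumps, the choice of (max - value) as the "inverse" of intensity, and the toy numbers are all assumptions made for the example.

```python
def swallowing_curve(intensity):
    """Invert a dye-intensity trace so that filling rises and emptying falls.

    Assumption: "inverse" is taken here as (max - value); the response does
    not specify the exact transform.
    """
    top = max(intensity)
    return [top - value for value in intensity]


def count_pumps(curve, threshold):
    """Count upward crossings of `threshold`; each crossing is one pump."""
    return sum(1 for prev, cur in zip(curve, curve[1:]) if prev < threshold <= cur)


def pump_frequency(curve, frame_rate_hz, threshold):
    """Pumps per second = number of pumps / recording duration."""
    duration_s = len(curve) / frame_rate_hz
    return count_pumps(curve, threshold) / duration_s


def ingestion_rate(total_volume_nl, feeding_time_s):
    """Ingestion rate in nL/s = total ingested volume / feeding time."""
    return total_volume_nl / feeding_time_s


# Made-up ROI intensities: three fill/empty cycles over 12 frames at 6 frames/s.
roi_intensity = [10, 6, 2, 6, 10, 6, 2, 6, 10, 6, 2, 6]
curve = swallowing_curve(roi_intensity)
print(count_pumps(curve, threshold=4))
print(pump_frequency(curve, frame_rate_hz=6, threshold=4))
print(ingestion_rate(total_volume_nl=120, feeding_time_s=60))
```

      A threshold-crossing count is sensitive to the chosen threshold and to noise in the trace, which is one reason a manual frame-by-frame check remains a useful cross-validation.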

      The changes seem small, in spite of the claim of statistical significance.

      The stability of pump frequency within a given genotype underscores the importance of even seemingly small changes, which are statistically significant. We speculate that the stability in swallowing frequency suggests the existence of a redundant mechanism to ensure the robustness of the process. Disruption of one channel might be partially compensated for by others, highlighting the vital nature of the swallowing mechanism.

      How is this change in pump frequency consistent with defects in one aspect of the cycle - either ingestion (activation) or expulsion (inhibition)?

      Please refer to Figures 2 and 3. Both the filling and emptying processes were affected, while inhibition mainly influenced emptying time (Figure 1—figure supplement 1).

      for the authors:

      Line 48: extensively

      Line 62 - undiscovered.

      Line 107, 463: multi

      Line 124: What is "dysphagia?" This is an unusual word and should be defined.

      Line 446: severe

      Line 466: in the cibarium or not?

      Thank you for the corrections; all the places mentioned have been revised.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reply to comments:

      (1) It was not clear why the phylogenetic analysis included non-validated GPCRs that clustered with the validated peptidergic receptors. Would restricting the phylogenetic analyses only to confirmed peptidergic GPCRs alter the topology of the tree and subsequent conclusions of independent expansion?

      Thank you for this comment. In general, phylogenetic analyses become more robust if a larger diversity and fuller complement of sequences are included. With very sparse sampling, sequences that are homologous but not orthologous may be misleadingly grouped together, because intermediate sequences have been left out. For tree building, we thus did not want to focus only on experimentally validated receptors but also on all receptors that are phylogenetically related to the validated receptors. Only this approach can ensure a comprehensive exploration of the relationship of peptidergic receptors. The broader phylogenetic approach was also essential to identify orthologs to the experimentally validated Nematostella receptors across other cnidarian species.

      (2) Clearly, other neuropeptide signaling systems in cnidarians remain to be discovered but this paper represents a huge step forward.

      We appreciate this assessment of the paper. We agree that many systems remain to be discovered. Our paper will also help with the identification of further receptors both in Nematostella and in other cnidarian species. Please note that we have made specific receptor-ligand predictions for several cnidarian species based on our phylogenetic analysis. Our phylogenies could also help prioritize the study of the remaining orphan Nematostella GPCRs.

      (3) There are limitations in what can be interpreted from single cell transcriptomic data but the data nevertheless provide the foundations for future studies involving i). detailed anatomical analysis of neuropeptide and neuropeptide receptor expression in N. vectensis using mRNA in situ hybridization and/or immunohistochemical methods and ii). functional analysis of the physiological/behavioral roles of neuropeptide signaling systems in N. vectensis

      We fully agree with this comment. The analysis of the available single-cell sequence resources clearly represents only the first step of anatomical and functional analyses. Our aim was to place the identified peptide-receptor interactions into a whole-organism context with cell type resolution, to highlight the potential complexity of peptidergic signaling in this organism and to facilitate the exploration and conceptualisation of our biochemical screen.

      Comments to authors

      (1) In future, when preparing manuscripts, please use page and line numbers; it makes the task below for reviewers much easier!

      We appreciate the suggestion and will do this for future manuscripts.

      (2) In the abstract the term "extensively wired" is used. In the context of neuropeptide mediated volume transmission this may not be an appropriate term to use because use of the word "wired" is likely to be associated with point-to-point type classical synaptic transmission; "extensively connected" would be better.

      Thank you for this comment. We have changed the text in the abstract to “extensively connected”.

      (3) Introduction: Please change "seven-transmembrane proteins and show a slower evolutionary rate than proneuropeptide..." to "seven-transmembrane proteins that show a slower evolutionary rate than proneuropeptide..."

      Changed.

      (4) Under the section "Creation of a Nematostella neuropeptide library, what is meant by "our regular expressions"? This needs to be rephrased to make it clearer what is meant.

      We have now rephrased the relevant sentence to make our approach clearer.

      “This predicted secretome was filtered with regular expressions to detect sequences with the repetitive dibasic cleavage sites (K and R in any combination) and amidation sites, using a custom script from a previous publication (Thiel et al., 2021).”

      and later:

      “Based on the MS data, we included the additional, non-dibasic N-terminal cleavage sites into our script that uses regular expressions to search for repetitive cleavage sites (Thiel et al., 2024) and re-screened the predicted secretome.”
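
      As a rough illustration of this kind of regular-expression screen (not the custom script cited above), the sketch below counts amidation signals followed by a dibasic cleavage site, i.e. a glycine followed by any combination of K and R. The pattern, the `min_repeats` threshold, the function names and the toy sequences are all assumptions made for the example.

```python
import re

# Glycine (amidation signal) immediately followed by a dibasic site (K/R in any combination).
AMIDATED_DIBASIC = re.compile(r"G[KR]{2}")


def count_cleavage_motifs(sequence):
    """Return the number of non-overlapping G[KR][KR] motifs in a protein sequence."""
    return len(AMIDATED_DIBASIC.findall(sequence))


def filter_candidates(secretome, min_repeats=3):
    """Keep the names of sequences that carry at least `min_repeats` motifs.

    `min_repeats` is a hypothetical cut-off chosen for illustration only.
    """
    return [
        name
        for name, sequence in secretome.items()
        if count_cleavage_motifs(sequence) >= min_repeats
    ]


# Invented toy sequences, not real Nematostella proteins.
toy_secretome = {
    "toy_precursor": "MKLAVPRGGKRAAPRGGRRSSPRGGKKTT",
    "toy_single_site": "MSTAVLLGKRQQQ",
    "toy_no_site": "MAAAA",
}
print(filter_candidates(toy_secretome))
```

      Extending such a screen to non-dibasic N-terminal cleavage sites, as described in the second quoted sentence, would mean adding further alternatives to the pattern; which alternatives to add depends on the MS evidence and is not reproduced here.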

      (5) Under the section "Creation of a Nematostella neuropeptide library" the phrase "differ in the length of their N-terminus" needs to be changed to "differ in the length of their N-terminal region". The N-terminus is, as its name implies, one end of the peptide/protein so it can't have a length as such.

      Changed.

      (6) Under the section "Analysis of metazoan class A GPCRs and selection of N. vectensis neuropeptide-receptor candidates",

      Change:

      "For a more detailed analysis, we then reduced our sampled species to the cnidarian, the bilaterian with experimentally confirmed GPCRs and Petromyzon marinus, and the two placozoan species (Figure 2B)."

      To

      "For a more detailed analysis, we then reduced our sampled species to cnidarians, bilaterians with experimentally confirmed GPCRs and Petromyzon marinus, and two placozoan species (Figure 2B)."

      Changed.

      (7) Under the section "Analysis of metazoan class A GPCRs and selection of N. vectensis neuropeptide-receptor candidates" - change "We re-run" to "We re-ran"

      Changed.

      (8) Throughout the paper reference is made to a variety of neuropeptides that have or are predicted to have an N-terminal pyroglutamate. However, these are referred to without indicating this post-translational modification e.g. QGRFamide.

      This should be corrected throughout the paper, in the text, and figures. Two abbreviations for pyroglutamate are used in the literature:

      pQ, which shows that the encoded amino acid is Q (Glutamine)

      pE, which shows that the post-translationally modified amino-acid is glutamate (E)

      In the neuropeptide field, pQ seems to be more widely used than pE, so our recommendation would be to use pQ.

      In the revised version we now write pyroQ whenever we refer to the actual peptide. We now only use the peptide name without indicating this modification when we refer to the precursor of these peptides.

      (9) The title for Figure 5 is rather short and vague. A title like "Tissue-specific expression of neuropeptide precursors and receptors in Nematostella" seems more appropriate

      We appreciate the reviewer's input, and we have made the change accordingly. The revised figure legend now reads: “Tissue-specific expression of neuropeptide precursors and receptors (GPCRs) in N. vectensis.”

      (10) All of the figures in the paper have been saved in bitmap format (e.g. tiff), which means that the resolution of the figures may end up being poor in the published article. All of the figures in this paper should be saved in vector format (e.g. eps) so that there is no loss of resolution when the size of the file/figure is reduced.

      We have now uploaded all figures in vector format (.eps or .pdf) to prevent any loss of resolution.

      (11) In Figure 3 - supplement 2 - the neuropeptides are referred to here as PRGamides and GPRGamides. Some consistency is needed here. And in Figure B, the G of one of the GPRGamides is not shown in black.

      Thank you for spotting this mistake. We now give the correct peptide sequence in parenthesis as "GPRGamide". We also highlighted the missing GPRGamide in the figure.

    1. Author Response

      We appreciate the thoughtful comments provided by the editor and reviewers. We were pleased to hear that they appreciated our work's contribution to the field of motor learning as well as our use of state-of-the-art analysis techniques.

      We are currently preparing a comprehensive revision of our manuscript to address several of the recommendations of the reviewers. It is our belief that this revision will not only strengthen our paper but also help clarify several areas that were highlighted by the reviewers.

      To address the concerns regarding potential confounds in our experimental design, we will be providing a more detailed justification and rationale for the experimental design and analysis choices made during our study. It appears that some reviewers’ comments may stem from misunderstandings concerning certain details of our task and we will carefully revise these sections to ensure that the design and purpose of the study are unambiguous. We will also be improving our characterizations of subjects’ learning behavior, which we believe will clarify some of the reviewers’ comments and enhance the overall rigor of our analyses. Lastly, we will be dealing with all concerns related to the statistical quantification of our results.

      We appreciate the opportunity to improve our manuscript for eLife and are eager to provide a revision that satisfies the majority of the reviewers’ recommendations.

    2. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations

      Recommendation #1: Address potential confounds in the experimental design:

      (1a) Confounding factors between baseline to early learning. While the visual display of the curved line remains constant, there are at least three changes between these two phases: 1) the presence of reward feedback (the focus of the paper); 2) a perturbation introduced to draw a hidden, mirror-symmetric curved line; 3) instructions provided to use reward feedback to trace the line on the screen (intentionally deceitful). As such, it remains unclear which of these factors are driving the changes in both behavior and BOLD signals between the two phases. The absence of a veridical feedback phase in which participants received reward feedback associated with the shown trajectory seems like a major limitation.

      (1b) Confounding Factors Between Early and Late Learning. While the authors have focused on interpreting changes from early to late due to the explore-exploit trade-off, there are three additional factors possibly at play: 1) increasing fatigue, 2) withdrawal of attention, specifically related to individuals who have either successfully learned the perturbation within the first few trials or those who have simply given up, or 3) increasing awareness of the perturbation (not clear if subjective reports about perturbation awareness were measured.). I understand that fMRI research is resource-intensive; however, it is not clear how to rule out these alternatives with their existing data without additional control groups. [Another reviewer added the following: Why did the authors not acquire data during a control condition? How can we be confident that the neural dynamics observed are not due to the simple passage of time? Or if these effects are due to the task, what drives them? The reward component, the movement execution, increased automaticity?]

      We have opted to address both of these points above within a single reply, as together they suggest potential confounding factors across the three phases of the task. We would agree that, if the results of our pairwise comparisons (e.g., Early > Baseline or Late > Early) were considered in isolation from one another, then these critiques of the study would be problematic. However, when considering the pattern of effects across the three task phases, we believe most of these critiques can be dismissed. Below, we first describe our results in this context, and then discuss how they address the reviewers’ various critiques.

      Recall that from Baseline to Early learning, we observe an expansion of several cortical areas (e.g., core regions in the DMN) along the manifold (red areas in Fig. 4A, see manifold shifts in Fig. 4C) that subsequently exhibit contraction during Early to Late learning (blue areas in Fig. 4B, see manifold shifts in Fig. 4D). We show this overlap in brain areas in Author response image 1 below, panel A. Notably, several of these brain areas appear to contract back to their original, Baseline locations along the manifold during Late learning (compare Fig. 4C and D). This is evidenced by the fact that many of these same regions (e.g., DMN regions, in Author response image 1 panel A below) fail to show a significant difference between the Baseline and Late learning epochs (see Author response image 1 panel B below, which is taken from supplementary Fig 6). That is, the regions that show significant expansion and subsequent contraction (in Author response image 1 panel A below) tend not to overlap with the regions that significantly changed over the time course of the task (in Author response image 1 panel B below).

      Author response image 1.

      Note that this basic observation above is not only true of our regional manifold eccentricity data, but also in the underlying functional connectivity data associated with individual brain regions. To make this second point clearer, we have modified and annotated our Fig. 5 and included it below. Note the reversal in seed-based functional connectivity from Baseline to Early learning (leftmost brain plots) compared to Early to Late learning (rightmost brain plots). That is, it is generally the case that for each seed-region (A-C) the areas that increase in seed-connectivity with the seed region (in red; leftmost plot) are also the areas that decrease in seed-connectivity with the seed region (in blue; rightmost plot), and vice versa. [Also note that these connectivity reversals are conveyed through the eccentricity data: the horizontal red line in the rightmost plots denotes the mean eccentricity of these brain regions during the Baseline phase, helping to highlight the fact that the eccentricity of the Late learning phase reverses back towards this Baseline level].

      Author response image 2.

      Critically, these reversals in brain connectivity noted above directly counter several of the critiques noted by the reviewers. For instance, this reversal pattern of effects argues against the idea that our results during Early Learning can be simply explained due to the (i) presence of reward feedback, (ii) presence of the perturbation or (iii) instructions to use reward feedback to trace the path on the screen. Indeed, all of these factors are also present during Late learning, and yet many of the patterns of brain activity during this time period revert back to the Baseline patterns of connectivity, where these factors are absent. Similarly, this reversal pattern strongly refutes the idea that the effects are simply due to the passage of time, increasing fatigue, or general awareness of the perturbation. Indeed, if any of these factors alone could explain the data, then we would have expected a gradual increase (or decrease) in eccentricity and connectivity from Baseline to Early to Late learning, which we do not observe. We believe these are all important points when interpreting the data, but which we failed to mention in our original manuscript when discussing our findings.

      We have now rectified this in the revised paper, where we now write in our Discussion:

      “Finally, it is important to note that the reversal pattern of effects noted above suggests that our findings during learning cannot be simply attributed to the introduction of reward feedback and/or the perturbation during Early learning, as both of these task-related features are also present during Late learning. In addition, these results cannot be simply explained due to the passage of time or increasing subject fatigue, as this would predict a consistent directional change in eccentricity across the Baseline, Early and Late learning epochs.”

      However, having said the above, we acknowledge that one potential factor that our findings cannot exclude is that they are (at least partially) attributable to changes in subjects’ state of attention throughout the task. Indeed, one can certainly argue that Baseline trials in our study don’t require a great deal of attention (after all, subjects are simply tracing a curved path presented on the screen). Likewise, for subjects that have learned the hidden shape, the Late learning trials are also likely to require limited attentional resources (indeed, many subjects at this point are simply producing the same shape trial after trial). Consequently, the large shift in brain connectivity that we observe from Baseline to Early Learning, and the subsequent reversion back to Baseline-levels of connectivity during Late learning, could actually reflect a heightened allocation of attention as subjects are attempting to learn the (hidden) rewarded shape. However, we do not believe that this would reflect a ‘confound’ of our study per se — indeed, any subject who has participated in a motor learning study would agree that the early learning phase of a task is far more cognitively demanding than Baseline trials and Late learning trials. As such, it is difficult to disentangle this ‘attention’ factor from the learning process itself (and in fact, it is likely central to it).

      Of course, one could have designed a ‘control’ task in which subjects must direct their attention to something other than the learning task itself (e.g., a divided attention paradigm; Taylor & Thoroughman, 2007, 2008) and/or perform a secondary task concurrently (Codol et al., 2018; Holland et al., 2018), but we know that this type of manipulation impairs the learning process itself. Thus, in such a case, it wouldn’t be obvious to the experimenter what they are actually measuring in brain activity during such a task. And, to extend this argument even further, it is true that any sort of brain-based modulation can be argued to reflect some ‘attentional’ process, rather than modulations related to the specific task-based process under consideration (in our case, motor learning). In this regard, we are sympathetic to the views of Richard Andersen and colleagues who have eloquently stated that “The study of how attention interacts with other neural processing systems is a most important endeavor. However, we think that over-generalizing attention to encompass a large variety of different neural processes weakens the concept and undercuts the ability to develop a robust understanding of other cognitive functions.” (Andersen & Cui, 2007, Neuron). In short, it appears that different fields/researchers have alternate views on the usefulness of attention as an explanatory construct (see also articles from Hommel et al., 2019, “No one knows what attention is”, and Wu, 2023, “We know what attention is!”), and we personally don’t have a dog in this fight. We only highlight these issues to draw attention (no pun intended) to the fact that it is not trivial to separate these different neural processes during a motor learning study.

      Nevertheless, we do believe these are important points worth flagging for the reader in our paper, as they might have similar questions. To this end, we have now included in our Discussion section the following text:

      “It is also possible that some of these task-related shifts in connectivity relate to shifts in task-general processes, such as changes in the allocation of attentional resources (Bédard and Song, 2013; Rosenberg et al., 2016) or overall cognitive engagement (Aben et al., 2020), which themselves play critical roles in shaping learning (Codol et al., 2018; Holland et al., 2018; Song, 2019; Taylor and Thoroughman, 2008, 2007; for a review of these topics, see Tsay et al., 2023). Such processes are particularly important during the earlier phases of learning when sensorimotor contingencies need to be established. While these remain questions for future work, our data nevertheless suggest that this shift in connectivity may be enabled through the PMC.”

      Finally, we should note that, at the end of testing, we did not assess participants' awareness of the manipulation (i.e., that they were, in fact, being rewarded based on a mirror image path). In hindsight, this would have been a good idea and provided some value to the current project. Nevertheless, based on several of the learning profiles observed (e.g., subjects who exhibited very rapid learning during the Early Learning phase, more on this below), it seems clear that many individuals became aware of a shape approximating the rewarded path. Note that we have included new figures (see our responses below) that give a better example of what fast versus slower learning looks like. In addition, we now note in our Methods that we did not probe participants about their subjective awareness of the perturbation:

      “Note that, at the end of testing, we did not assess participants’ awareness of the manipulation (i.e., that they were, in fact, being rewarded based on a mirror image path of the visible path).”

      Recommendation #2: Provide more behavioral quantification.

      (2a) The authors chose to only plot the average learning score in Figure 1D, without an indication of movement variability. I think this is quite important, to give the reader an impression of how variable the movements were at baseline, during early learning, and over the course of learning. There is evidence that baseline variability influences the 'detectability' of imposed rotations (in the case of adaptation learning), which could be relevant here. Shading the plots by movement variability would also be important to see if there was some refinement of the moment after participants performed at the ceiling (which seems to be the case ~ after trial 150). This is especially worrying given that in Fig 6A there is a clear indication that there is a large difference between subjects' solutions on the task. One subject exhibits almost a one-shot learning curve (reaching a score of 75 after one or two trials), whereas others don't seem to really learn until the near end. What does this between-subject variability mean for the authors' hypothesized neural processes?

      In line with these recommendations, we have now provided much better behavioral quantification of subject-level performance in both the main manuscript and supplementary material. For instance, in a new supplemental Figure 1 (shown below), we now include mean subject (+/- SE) reaction times (RTs), movement times (MTs) and movement path variability (how we computed these measures is now defined in our Methods section).

      As can be seen in the figure, all three of these variables tended to decrease over the course of the study, though we note there was a noticeable uptick in both RTs and MTs from the Baseline to Early learning phase, once subjects started receiving trial-by-trial reward feedback based on their movements. With respect to path variability, it is not obvious that there was a significant refinement of the paths created during late learning (panel D below), though there was certainly a general trend for path variability to decrease over learning.

      Author response image 3.

      Behavioral measures of learning across the task. (A-D) shows average participant reward scores (A), reaction times (B), movement times (C) and path variability (D) over the course of the task. In each plot, the black line denotes the mean across participants and the gray banding denotes +/- 1 SEM. The three equal-length task epochs for subsequent neural analyses are indicated by the gray shaded boxes.

      In addition to these above results, we have also created a new Figure 6 in the main manuscript, which now solely focuses on individual differences in subject learning (see below). Hopefully, this figure clarifies key features of the task and its reward structure, and also depicts (in movement trajectory space) what fast versus slow learning looks like in the task. Specifically, we believe that this figure now clearly delineates for the reader the mapping between movement trajectory and the reward score feedback presented to participants, which appeared to be a source of confusion based on the reviewers’ comments below. As can be clearly observed in this figure, trajectories that approximated the ‘visible path’ (black line) resulted in fairly mediocre scores (see score color legend at right), whereas trajectories that approximated the ‘reward path’ (dashed black line, see trials 191-200 of the fast learner) resulted in fairly high scores. This figure also more clearly delineates how fPCA loadings derived from our functional data analysis were used to derive subject-level learning scores (panel C).

      Author response image 4.

      Individual differences in subject learning performance. (A) Examples of a good learner (bordered in green) and poor learner (bordered in red). (B) Individual subject learning curves for the task. Solid black line denotes the mean across all subjects whereas light gray lines denote individual participants. The green and red traces denote the learning curves for the example good and poor learners denoted in A. (C) Derivation of subject learning scores. We performed functional principal component analysis (fPCA) on subjects’ learning curves in order to identify the dominant patterns of variability during learning. The top component, which encodes overall learning, explained the majority of the observed variance (~75%). The green and red bands denote the effect of positive and negative component scores, respectively, relative to mean performance. Thus, subjects who learned more quickly than average have a higher loading (in green) on this ‘Learning score’ component than subjects who learned more slowly (in red) than average. The plot at right denotes the loading for each participant (open circles) onto this Learning score component.
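
      To make the derivation of the Learning score in panel C concrete, the following is a minimal sketch of how a first functional principal component could be extracted from discretized learning curves. It treats fPCA as ordinary PCA on mean-centered curves, which omits the smoothing or basis expansion that a full functional data analysis would normally include. The function name `learning_scores`, the simulated exponential curves, and all parameter values are illustrative assumptions, not the analysis code or data used in the study.

```python
import numpy as np

def learning_scores(curves):
    """Return (scores, component, variance_explained) for the first PC.

    curves: array of shape (n_subjects, n_trials) holding reward scores.
    """
    curves = np.asarray(curves, dtype=float)
    centered = curves - curves.mean(axis=0)      # remove the mean learning curve
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)          # variance fraction per component
    component = vt[0]                            # dominant pattern of variability
    scores = centered @ component                # one score per subject
    # Fix the sign so that above-average curves receive positive scores.
    if component.sum() < 0:
        component, scores = -component, -scores
    return scores, component, explained[0]

# Simulated data (hypothetical): 36 subjects, 200 trials, varying learning speed.
rng = np.random.default_rng(0)
trials = np.arange(200)
taus = rng.uniform(10, 150, size=36)             # hypothetical time constants
curves = 40 + 40 * (1 - np.exp(-trials[None, :] / taus[:, None]))
curves += rng.normal(0, 3, size=curves.shape)    # trial-to-trial noise

scores, component, var_explained = learning_scores(curves)
```

      In this sketch, simulated subjects with short time constants (fast learners) receive higher scores than those with long time constants, mirroring the green versus red convention described in panel C.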

      The reviewers note that there are large individual differences in learning performance across the task. This was clearly our hope when designing the reward structure of this task, as it would allow us to further investigate the neural correlates of these individual differences (indeed, during pilot testing, we sought out a reward structure to the task that would allow for these intersubject differences). The subjects who learn early during the task end up having higher fPCA scores than the subjects who learn more gradually (or learn the task late). From our perspective, these differences are a feature, and not a bug, and they do not negate any of our original interpretations. That is, subjects who learn earlier on average tend to contract their DAN-A network during the early learning phase whereas subjects who learn more slowly on average (or learn late) instead tend to contract their DAN-A network during late learning (Fig. 7).

      (2b) In the methods, the authors stated that they scaled the score such that even a perfectly traced visible path would always result in an imperfect score of 40 points. What happens if a subject scores perfectly on the first try (which seemed to have happened for the green highlighted subject in Fig 6A), but is then permanently confronted with a score of 40 or below? Wouldn't this result in an error-clamp-like (error-based motor adaptation) design for this subject and all other high performers, which would vastly differ from the task demands for the other subjects? How did the authors factor in the wide between-subject variability?

      We think the reviewers may have misinterpreted the reward structure of the task, and we apologize for not being clearer in our descriptions. The reward score that subjects received after each trial was based on how well they traced the mirror-image of the visible path. However, all the participant can see on the screen is the visible path. We hope that our inclusion of the new Figure 6 (shown above) makes the reward structure of the task, and its relationship to movement trajectories, much clearer. We should also note that, even for the highest performing subject (denoted in Fig. 6), it still required approximately 20 trials for them to reach asymptotic performance.

      (2c) The study would benefit from a more detailed description of participants' behavioral performance during the task. Specifically, it is crucial to understand how participants' motor skills evolve over time. Information on changes in movement speed, accuracy, and other relevant behavioral metrics would enhance the understanding of the relationship between behavior and brain activity during the learning process. Additionally, please clarify whether the display on the screen was presented continuously throughout the entire trial or only during active movement periods. Differences in display duration could potentially impact the observed differences in brain activity during learning.

      We hope that our inclusion of the new Supplementary Figure 1 (shown above) addresses the reviewers’ recommendation. Generally, we find that RTs, MTs and path variability all decrease over the course of the task. We think this relates to the early learning phase being more attentionally demanding, and requiring more conscious effort, than the later learning phases.

      Also, yes, the visible path was displayed on the screen continuously throughout the trial, and only disappeared at the 4.5 second mark of each trial (when the screen was blanked and the data was saved off for 1.5 seconds prior to commencement of the next trial; 6 seconds total per trial). Thus, there were no differences in display duration across trials and phases of the task. We have now clarified this in the Methods section, where we now write the following:

      “When the cursor reached the target distance, the target changed color from red to green to indicate that the trial was completed. Importantly, other than this color change in the distance marker, the visible curved path remained constant and participants never received any feedback about the position of their cursor.”

      (2d) It is unclear from plots 6A, 6B, and 1D how the scale of the behavioral data matches with the scaling of the scores. Are these the 'real' scores, meaning 100 on the y-axis would be equivalent to 40 in the task? Why then do all subjects reach an asymptote at 75? Or is 75 equivalent to 40 and the axis labels are wrong?

      As indicated above, we clearly did a poor job of describing the reward structure of our task in our original paper, and we now hope that our inclusion of Figure 6 makes things clear. A ‘40’ score on the y-axis would indicate that a subject has perfectly traced the visible path whereas a perfect ‘100’ score would indicate that a subject has perfectly traced the (hidden) mirror image path.

      The fact that several of the subjects reach asymptote around 75 is likely a byproduct of two factors. Firstly, the subjects performed their movements in the absence of any visual error feedback (they could not see the position of a cursor that represented their hand position), which had the effect of increasing motor variability in their actions from trial to trial. Secondly, there appears to be an underestimation among subjects regarding the curvature of the concealed, mirror-image path (i.e., that the rewarded path actually had an equal but opposite curvature to that of the visible path). This is particularly evident in the case of the top-performing subject (illustrated in Figure 6A) who, even during late learning, failed to produce a completely arched movement.

      (2e) Labeling of Contrasts: There is a consistent issue with the labeling of contrasts in the presented figures, causing confusion. While the text refers to the difference as "baseline to early learning," the label used in figures, such as Figure 4, reads "baseline > early." It is essential to clarify whether the presented contrast is indeed "baseline > early" or "early > baseline" to avoid any misinterpretation.

      We thank the reviewers for catching this error. Indeed, the intended label was Early > Baseline, and this has now been corrected throughout.

      Recommendation #3. Clarify which motor learning mechanism(s) are at play.

      (3a) Participants were performing at a relatively low level, achieving around 50-60 points by the end of learning. This outcome may not be that surprising, given that reward-based learning might have a substantial explicit component and may also heavily depend on reasoning processes, beyond reinforcement learning or contextual recall (Holland et al., 2018; Tsay et al., 2023). Even within our own data, where explicit processes are isolated, average performance is low and many individuals fail to learn (Brudner et al., 2016; Tsay et al., 2022). Given this, many participants in the current study may have simply given up. A potential indicator of giving up could be a subset of participants moving straight ahead in a rote manner (a heuristic to gain moderate points). Consequently, alterations in brain networks may not reflect exploration and exploitation strategies but instead indicate levels of engagement and disengagement. Could the authors plot the average trajectory and the average curvature changes throughout learning? Are individuals indeed defaulting to moving straight ahead in learning, corresponding to an average of 50-60 points? If so, the interpretation of brain activity may need to be tempered.

      We can do one better, and actually give you a sense of the learning trajectories for every subject over time. In the figure below, which we now include as Supplementary Figure 2 in our revision, we have plotted, for each subject, a subset of their movement trajectories across learning trials (every 10 trials). As can be seen in the diversity of these trajectories, the average trajectory and average curvature would do a fairly poor job of describing the pattern of learning-related changes across subjects. Moreover, it is not obvious from looking at these plots the extent to which poor learning subjects (i.e., subjects who never converge on the reward path) actually ‘give up’ in the task — rather, many of these subjects still show some modulation (albeit minor) of their movement trajectories in the later trials (see the purple and pink traces). As an aside, we are also not entirely convinced that straight ahead movements, which we don’t find many of in our dataset, can be taken as direct evidence that the subject has given up.

      Author response image 5

      Variability in learning across subjects. Plots show representative trajectory data from each subject (n=36) over the course of the 200 learning trials. Coloured traces show individual trials over time (each trace is separated by ten trials, e.g., trial 1, 10, 20, 30, etc.) to give a sense of the trajectory changes throughout the task (20 trials in total are shown for each subject).

      We should also note that we are not entirely opposed to the idea of describing aspects of our findings in terms of subject engagement versus disengagement over time, as such processes are related at some level to exploration (i.e., cognitive engagement in finding the best solution) and exploitation (i.e., cognitively disengaging and automating one’s behavior). As noted in our reply to Recommendation #1 above, we now give some consideration of these explanations in our Discussion section, where we now write:

      “It is also possible that these task-related shifts in connectivity relate to shifts in task-general processes, such as changes in the allocation of attentional resources (Bédard and Song, 2013; Rosenberg et al., 2016) or overall cognitive engagement (Aben et al., 2020), which themselves play critical roles in shaping learning (Codol et al., 2018; Holland et al., 2018; Song, 2019; Taylor and Thoroughman, 2008, 2007; for a review of these topics, see Tsay et al., 2023). Such processes are particularly important during the earlier phases of learning when sensorimotor contingencies need to be established. While these remain questions for future work, our data nevertheless suggest that this shift in connectivity may be enabled through the PMC.”

      (3b) The authors are mixing two commonly used paradigms, reward-based learning, and motor adaptation, but provide no discussion of the different learning processes at play here. Which processes were they attempting to probe? Making this explicit would help the reader understand which brain regions should be implicated based on previous literature. As it stands, the task is hard to interpret. Relatedly, there is a wealth of literature on explicit vs implicit learning mechanisms in adaptation tasks now. Given that the authors are specifically looking at brain structures in the cerebral cortex that are commonly associated with explicit and strategic learning rather than implicit adaptation, how do the authors relate their findings to this literature? Are the learning processes probed in the task more explicit, more implicit, or is there a change in strategy usage over time? Did the authors acquire data on strategies used by the participants to solve the task? How does the baseline variability come into play here?

      As noted in our paper, our task was directly inspired by the reward-based motor learning tasks developed by Dam et al., 2013 (Plos One) and Wu et al., 2014 (Nature Neuroscience). What drew us to these tasks is that they allowed us to study the neural bases of reward-based learning mechanisms in the absence of subjects also being able to exploit error-based mechanisms to achieve learning. Indeed, when first describing the task in the Results section of our paper we wrote the following:

      “Importantly, because subjects received no visual feedback about their actual finger trajectory and could not see their own hand, they could only use the score feedback — and thus only reward-based learning mechanisms — to modify their movements from one trial to the next (Dam et al., 2013; Wu et al., 2014).”

      If the reviewers are referring to ‘motor adaptation’ in the context in which that terminology is commonly used — i.e., the use of sensory prediction errors to support error-based learning — then we would argue that motor adaptation is not a feature of the current study. It is true that in our study subjects learn to ‘adapt’ their movements across trials, but this shaping of the movement trajectories must be supported through reinforcement learning mechanisms (and, of course, supplemented by the use of cognitive strategies as discussed in the nice review by Tsay et al., 2023). We apologize for not being clearer in our paper about this key distinction and we have now included new text in the introduction to our Results to directly address this:

      “Importantly, because subjects received no visual feedback about their actual finger trajectory and could not see their own hand, they could only use the score feedback — and thus only reward-based learning mechanisms — to modify their movements from one trial to the next (Dam et al., 2013; Wu et al., 2014). That is, subjects could not use error-based learning mechanisms to achieve learning in our study, as this form of learning requires sensory errors that convey both the change in direction and magnitude needed to correct the movement.”

      With this issue aside, we are well aware of the established framework for thinking about sensorimotor adaptation as being composed of a combination of explicit and implicit components (indeed, this has been a central feature of several of our other recent neuroimaging studies that have explored visuomotor rotation learning, e.g., Gale et al., 2022 PNAS, Areshenkoff et al., 2022 eLife, Standage et al., 2023 Cerebral Cortex). However, there has been comparatively little work done on these parallel components within the domain of reinforcement learning tasks (though see Codol et al., 2018; Holland et al., 2018; van Mastrigt et al., 2023; see also the Tsay et al., 2023 review), and as far as we can tell, nothing has been done to date in the reward-based motor learning area using fMRI. By design, we avoided using descriptors of ‘explicit’ or ‘implicit’ in our study because our experimental paradigm did not allow a separate measurement of those two components of learning during the task. Nevertheless, it seems clear to us from examining the subjects’ learning curves (see Supplementary Figure 2 above) that individuals who learn very quickly are using strategic processes (such as action exploration to identify the best path) to enhance their learning. As we noted in an above response, we did not query subjects after the fact about their strategy use, which admittedly was a missed opportunity on our part.

      Author response image 6.

      With respect to the comment on baseline variability and its relationship to performance, this is an interesting idea and one that was explored in the Wu et al., 2014 Nature Neuroscience paper. Prompted by the reviewers, we have now explored this idea in the current data set by testing for a relationship between movement path variability during baseline trials (all 70 baseline trials, see Supplementary Figure 1D above for reference) and subjects’ fPCA score on our learning task. However, when we performed this analysis, we did not observe a significant positive relationship between baseline variability and subject performance. Rather, we actually found a trend towards a negative relationship (though this was non-significant; r=-0.2916, p=0.0844). Admittedly, we are not sure what conclusions can be drawn from this analysis, and in any case, we believe it to be tangential to our main results. We provide the results (at right) for the reviewers if they are interested. This may be an interesting avenue for exploration in future work.
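
      For readers who want to reproduce this kind of check, the sketch below computes a Pearson correlation, the corresponding t-statistic, and a permutation-based p-value. It is a minimal illustration, not the analysis used in the response: the sample size of 36 is an assumption taken from the subject count given elsewhere in this response, the permutation test is a swapped-in alternative to whatever significance test was actually used, and the data arrays are synthetic stand-ins.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def t_statistic(r, n):
    """t-statistic for a Pearson r from n paired observations (n - 2 d.o.f.)."""
    return r * np.sqrt((n - 2) / (1.0 - r ** 2))

def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_obs = pearson_r(x, y)
    # Null distribution: correlations after shuffling the pairing of x and y.
    null = np.array([pearson_r(x, rng.permutation(y)) for _ in range(n_perm)])
    return float((np.sum(np.abs(null) >= abs(r_obs)) + 1) / (n_perm + 1))

# t-statistic implied by the reported r, assuming n = 36 (roughly -1.78).
t_reported = t_statistic(-0.2916, 36)

# Synthetic stand-in data (hypothetical): baseline variability vs. learning score.
rng = np.random.default_rng(1)
baseline_var = rng.normal(size=36)
learning_score = -0.3 * baseline_var + rng.normal(size=36)
r = pearson_r(baseline_var, learning_score)
p = permutation_p(baseline_var, learning_score)
```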

      Recommendation #4: Provide stronger justification for brain imaging methods.

      (4a) Observing how brain activity varies across these different networks is remarkable, especially how sensorimotor regions separate and then contract with other, more cognitive areas. However, does the signal-to-noise ratio in each area/network influence manifold eccentricity and limit the possible changes in eccentricity during learning? Specifically, if a region has a low signal-to-noise ratio, it might exhibit minimal changes during learning (a phenomenon perhaps relevant to null manifold changes in the striatum due to low signal-to-noise); conversely, regions with higher signal-to-noise (e.g., motor cortex in this sensorimotor task) might exhibit changes more easily detected. As such, it is unclear how to interpret manifold changes without considering an area/network's signal-to-noise ratio.

      We appreciate where these concerns are coming from. First, we should note that the timeseries data used in our analysis were z-transformed (mean zero, 1 std) to allow normalization of the signal both over time and across regions (and thus mitigate the possibility that the changes observed could simply reflect mean overall signal changes across different regions). Nevertheless, differences in signal intensity across brain regions — particularly between cortex and striatum — are well-known, though it is not obvious how these differences may manifest in terms of a task-based modulation of MR signals.

      To examine this issue in the current data set, we extracted, for each subject and time epoch (Baseline, Early and Late learning) the raw scanner data (in MR arbitrary units, a.u.) for the cortical and striatal regions and computed the (1) mean signal intensity, (2) standard deviation of the signal (Std) and (3) temporal signal to noise ratio (tSNR; calculated as mean/Std). Note that in the fMRI connectivity literature tSNR is often the preferred SNR measure as it normalizes the mean signal based on the signal’s variability over time, thus providing a general measure of overall ‘signal quality’. The results of this analysis, averaged across subjects and regions, are shown below.

      Author response image 7.

      Note that, as expected, the overall signal intensity (left plot) of cortex is higher than in the striatum, reflecting the closer proximity of cortex to the receiver coils in the MR head coil. In fact, the signal intensity in cortex is approximately 38% higher than that in the striatum (i.e., (~625 - 450)/450). However, the signal variation in cortex is also greater than in striatum (middle plot), in this case by approximately 100% (i.e., (~5 - 2.5)/2.5). The result of this is that the tSNR (mean/std) for our data set and the ROI parcellations we used is actually greater in the striatum than in cortex (right plot). Thus, all else being equal, there seems to have been sufficient tSNR in the striatum for us to have detected motor-learning related effects. As such, we suspect the null effects for the striatum in our study actually stem from two sources.
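
      The tSNR comparison above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the input numbers are the approximate values quoted in the paragraph above (not the underlying data), and taking a ratio of group-averaged mean and Std is a simplification of averaging per-region tSNR values.

```python
import numpy as np

def tsnr(timeseries):
    """Temporal SNR per region: mean over time divided by Std over time.

    timeseries: array of shape (n_timepoints, n_regions), raw signal in a.u.
    """
    ts = np.asarray(timeseries, dtype=float)
    return ts.mean(axis=0) / ts.std(axis=0, ddof=1)

def percent_higher(a, b):
    """How much larger a is than b, as a percentage of b."""
    return 100.0 * (a - b) / b

# Approximate values quoted in the response (a.u.).
cortex_mean, striatum_mean = 625.0, 450.0
cortex_std, striatum_std = 5.0, 2.5

mean_diff = percent_higher(cortex_mean, striatum_mean)   # about 38.9
std_diff = percent_higher(cortex_std, striatum_std)      # 100.0

# Ratio of the averaged quantities, as a rough stand-in for tSNR.
cortex_tsnr = cortex_mean / cortex_std                   # 125.0
striatum_tsnr = striatum_mean / striatum_std             # 180.0
```

      With these rounded inputs the striatal ratio exceeds the cortical one, which is the direction of the effect described for the right-hand plot.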

      The first likely source is the relatively lower number of striatal regions (12) as compared to cortical regions (998) used in our analysis, coupled with our use of PCA on these data (which, by design, identifies the largest sources of variation in connectivity). In future studies, this imbalance could be rectified by using finer parcellations of the striatum (even down to the voxel level) while keeping the same parcellation of cortex (i.e., equate the number of ‘regions’ in each of striatum and cortex). The second likely source is our use of a striatal atlas (the Harvard-Oxford atlas) that divides brain regions based on their neuroanatomy rather than their function. In future work, we plan on addressing this latter concern by using finer, more functionally relevant parcellations of striatum (such as in Tian et al., 2020, Nature Neuroscience). Note that we sought to capture these interrelated possible explanations in our Discussion section, where we wrote the following:

      “While we identified several changes in the cortical manifold that are associated with reward-based motor learning, it is noteworthy that we did not observe any significant changes in manifold eccentricity within the striatum. While clearly the evidence indicates that this region plays a key role in reward-guided behavior (Averbeck and O’Doherty, 2022; O’Doherty et al., 2017), there are several possible reasons why our manifold approach did not identify this collection of brain areas. First, the relatively small size of the striatum may mean that our analysis approach was too coarse to identify changes in the connectivity of this region. Though we used a 3T scanner and employed a widely-used parcellation scheme that divided the striatum into its constituent anatomical regions (e.g., hippocampus, caudate, etc.), both of these approaches may have obscured important differences in connectivity that exist within each of these regions. For example, areas such as the hippocampus and caudate are not homogeneous areas but themselves exhibit gradients of connectivity (e.g., head versus tail) that can only be revealed at the voxel level (Tian et al., 2020; Vos de Wael et al., 2021). Second, while our dimension reduction approach, by design, aims to identify gradients of functional connectivity that account for the largest amounts of variance, the limited number of striatal regions (as compared to cortex) necessitates that their contribution to the total whole-brain variance is relatively small. Consistent with this perspective, we found that the low-dimensional manifold architecture in cortex did not strongly depend on whether or not striatal regions were included in the analysis (see Supplementary Fig. 6). As such, selective changes in the patterns of functional connectivity at the level of the striatum may be obscured using our cortex x striatum dimension reduction approach.
      Future work can help address some of these limitations by using both finer parcellations of striatal cortex (perhaps even down to the voxel level) (Tian et al., 2020) and by focusing specifically on changes in the interactions between the striatum and cortex during learning. The latter can be accomplished by selectively performing dimension reduction on the slice of the functional connectivity matrix that corresponds to functional coupling between striatum and cortex.”

      (4b) Could the authors clarify how activity in the dorsal attention network (DAN) changes throughout learning, and how these changes also relate to individual differences in learning performance? Specifically, on average, the DAN seems to expand early and contract late, relative to the baseline. This is interpreted to signify that the DAN exhibits lesser connectivity followed by greater connectivity with other brain regions. However, in terms of how these changes relate to behavior, participants who go against the average trend (DAN exhibits more contraction early in learning, and expansion from early to late) seem to exhibit better learning performance. This finding is quite puzzling. Does this mean that the average trend of expansion and contraction is not facilitative, but rather detrimental, to learning? [Another reviewer added: The authors do not state any explicit hypotheses, but only establish that DMN coordinates activity among several regions. What predictions can we derive from this? What are the authors looking for in the data? The work seems more descriptive than hypothesis-driven. This is fine but should be clarified in the introduction.]

      These are good questions, and we are glad the reviewers appreciated the subtlety here. The reviewers are indeed correct that the relationship of the DAN-A network to behavioral performance appears to go against the grain of the group-level results that we found for the entire DAN network (which we note is composed of both the DAN-A and DAN-B networks). That is, subjects who exhibited greater contraction from Baseline to Early learning and likewise, greater expansion from Early to Late learning, tended to perform better in the task (according to our fPCA scores). However, on this point it is worth noting that it was mainly the DAN-B network which exhibited group-level expansion from Baseline to Early Learning whereas the DAN-A network exhibited negligible expansion. This can be seen in Author response image 8 below, which shows the pattern of expansion and contraction (as in Fig. 4), but instead broken down into the 17-network parcellation. The red asterisk denotes the expansion from Baseline to Early learning for the DAN-B network, which is much greater than that observed for the DAN-A network (which is basically around the zero difference line).

      Author response image 8.

      Thus, it appears that the DAN-A and DAN-B networks are modulated to a different extent during the task, which likely contributes to the perceived discrepancy between the group-level effects (reported using the 7-network parcellation) and the individual differences effects (reported using the finer 17-network parcellation). Based on the reviewers’ comments, this seems like an important distinction to clarify in the manuscript, and we have now described this nuance in our Results section where we now write:

      “...Using this permutation testing approach, we found that it was only the change in eccentricity of the DAN-A network that correlated with Learning score (see Fig. 7C), such that the more the DAN-A network decreased in eccentricity from Baseline to Early learning (i.e., contracted along the manifold), the better subjects performed at the task (see Fig. 7C, scatterplot at right). Consistent with the notion that changes in the eccentricity of the DAN-A network are linked to learning performance, we also found the inverse pattern of effects during Late learning, whereby the more that this same network increased in eccentricity from Early to Late learning (i.e., expanded along the manifold), the better subjects performed at the task (Fig. 7D). We should note that this pattern of performance effects for the DAN-A — i.e., greater contraction during Early learning and greater expansion during Late learning being associated with better learning — appears at odds with the group-level effects described in Fig. 4A and B, where we generally find the opposite pattern for the entire DAN network (composed of the DAN-A and DAN-B subnetworks). However, this potential discrepancy can be explained when examining the changes in eccentricity using the 17-network parcellation (see Supplementary Figure 8). At this higher resolution level we find that these group-level effects for the entire DAN network are being largely driven by eccentricity changes in the DAN-B network (areas in anterior superior parietal cortex and premotor cortex), and not by mean changes in the DAN-A network. By contrast, our present results suggest that it is the contraction and expansion of areas of the DAN-A network (and not DAN-B network) that are selectively associated with differences in subject learning performance.”

      Finally, re: the reviewers’ comments that we do not state any explicit hypotheses etc., we acknowledge that, beyond our general hypothesis stated at the outset about the DMN being involved in reward-based motor learning, our study is quite descriptive and exploratory in nature. So little work has been done in this research area (i.e., using manifold learning approaches to study motor learning with fMRI) that it would be disingenuous to have any stronger hypotheses than those stated in our Introduction. Thus, to make the exploratory nature of our study clear to the reader, we have added the following text (in red) to our Introduction:

      “Here we applied this manifold approach to explore how brain activity across widely distributed cortical and striatal systems is coordinated during reward-based motor learning. We were particularly interested in characterizing how connectivity between regions within the DMN and the rest of the brain changes as participants shift from learning the relationship between motor commands and reward feedback, during early learning, to subsequently using this information, during late learning. We were also interested in exploring whether learning-dependent changes in manifold structure relate to variation in subject motor performance.”

      We hope these changes now make the intention of our study clear.

      (4c) The paper examines a type of motor adaptation task with a reward-based learning component. This, to me, strongly implicates the cerebellum, given that it has a long-established crucial role in adaptation and has recently been implicated in reward-based learning (see work by Wagner & Galea). Why is there no mention of the cerebellum and why it was left out of this study? Especially given that the authors state in the abstract they examine cortical and subcortical structures. It's evident from the methods that the authors did not acquire data from the cerebellum or had too small a FOV to fully cover it (34 slices at 4 mm thickness = 136 mm, which is likely a bit short to fully cover the cerebellum in many participants). What was the rationale behind this methodological choice? It would be good to clarify this for the reader. Related to this, the authors need to rephrase their statements on 'whole-brain' connectivity matrices or analyses - it is not whole-brain when it excludes the cerebellum.

      As we noted above, we do not believe this task to be a motor adaptation task, in the sense that subjects are not able to use sensory prediction errors (and thus error-based learning mechanisms) to improve their performance. Rather, by denying subjects this sensory error feedback they are only able to use reinforcement learning processes, along with cognitive strategies (nicely covered in Tsay et al., 2023), to improve performance. Nevertheless, we recognize that the cerebellum has been increasingly implicated in facets of reward-based learning, particularly within the rodent domain (e.g., Wagner et al., 2017; Heffley et al., 2018; Kostadinov et al., 2019, etc.). In our study, we did indeed collect data from the cerebellum but did not include it in our original analyses, as we wanted (1) the current paper to build on prior work in the human and macaque reward-learning domain (which focuses solely on striatum and cortex, and which rarely discusses cerebellum, see Averbeck & O’Doherty, 2022 & Klein-Flugge et al., 2022 for recent reviews), and (2) the cerebellum to be a more targeted focus of future work (specifically, we plan on focusing on striatal-cerebellar interactions during learning, which are hypothesized based on the neuroanatomical tract tracing work of Bostan and Strick, etc.). We hope the reviewers respect our decisions in this regard.

      Nevertheless, we acknowledge that our statements about ‘whole-brain’ connectivity, and our vagueness about what we mean by ‘subcortex’, may be confusing for the reader. We have now removed and/or corrected such references throughout the paper (however, note that in some cases it is difficult to avoid reference to “whole-brain”, e.g., “whole-brain correlation map” or “whole-brain false discovery rate correction”, which is standard terminology in the field).

      In addition, we are now explicit in our Methods section that the cerebellum was not included in our analyses.

      “Each volume comprised 34 contiguous (no gap) oblique slices acquired at a ~30° caudal tilt with respect to the plane of the anterior and posterior commissure (AC-PC), providing whole-brain coverage of the cerebrum and cerebellum. Note that for the current study, we did not examine changes in cerebellar activity during learning.”

      (4d) The authors centered the matrices before further analyses to remove variance associated with the subject. Why not run a PCA on the connectivity matrices and remove the PC that is associated with subject variance? What is the advantage of first centering the connectivity matrices? Is this standard practice in the field?

      Centering in some form has become reasonably common in the functional connectivity literature, as there is considerable evidence that task-related (or cognitive) changes in whole-brain connectivity are dwarfed by static, subject-level differences (e.g., Gratton et al., 2018, Neuron). If covariance matrices were ordinary scalar values, then isolating task-related changes could be accomplished simply by subtracting a baseline scan or mean score; but because the space of covariance matrices is non-Euclidean, the actual computations involved in this subtraction are more complex (see our Methods). However, fundamentally (and conceptually) our procedure is simply ordinary mean-centering, but adapted to this non-Euclidean space. Despite the added complexity, there is considerable evidence that such computations, adapted directly to the geometry of the space of covariance matrices, outperform simpler methods, which treat covariance matrices as arrays of real numbers (e.g. naive subtraction; see Dodero et al. & Ng et al., references below). Moreover, our previous work has found that this procedure works quite well to isolate changes associated with different task conditions (Areshenkoff et al., 2021, Neuroimage; Areshenkoff et al., 2022, eLife).

      Although PCA can be adapted to work well with covariance matrix valued data, it would at best be a less direct solution than simply subtracting subjects' mean connectivity. This is because the top components from applying PCA would be dominated by both subject-specific effects (not of interest here), and by the large-scale connectivity structure typically observed in component based analyses of whole-brain connectivity (i.e. the principal gradient), whereas changes associated with task-condition (the thing of interest here) would be buried among the less reliable components. By contrast, our procedure directly isolates these task changes.
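
      To make the idea of mean-centering in a non-Euclidean space concrete, the following is a minimal sketch and not the procedure from the Methods. It assumes a log-Euclidean mean and a whitening-plus-matrix-log map at each subject's mean; the function names (`spd_function`, `log_euclidean_mean`, `center_subject`, `naive_center`) are hypothetical and introduced only for illustration.

```python
import numpy as np


def spd_function(matrix, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    matrix = (matrix + matrix.T) / 2.0  # guard against round-off asymmetry
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)
    return (eigenvectors * fn(eigenvalues)) @ eigenvectors.T


def log_euclidean_mean(covariances):
    """Mean of SPD matrices taken in the matrix-log domain.

    Assumption: the log-Euclidean mean is used here for brevity; other
    Riemannian means could be substituted.
    """
    logs = [spd_function(c, np.log) for c in covariances]
    return spd_function(np.mean(logs, axis=0), np.exp)


def center_subject(covariances):
    """Express one subject's matrices relative to that subject's own mean M,
    as log(M^{-1/2} C M^{-1/2})."""
    mean = log_euclidean_mean(covariances)
    inv_sqrt = spd_function(mean, lambda v: 1.0 / np.sqrt(v))
    return [spd_function(inv_sqrt @ c @ inv_sqrt, np.log) for c in covariances]


def naive_center(covariances):
    """Element-wise subtraction of the arithmetic mean, for comparison."""
    mean = np.mean(covariances, axis=0)
    return [c - mean for c in covariances]
```

      Under these assumptions, multiplying all of a subject's matrices by a subject-specific gain leaves the output of `center_subject` unchanged, whereas the output of `naive_center` scales with that gain. This is one simple way to see why geometry-aware centering can remove subject-level differences that element-wise subtraction leaves behind.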

      References cited above:

      Dodero, L., Minh, H. Q., San Biagio, M., Murino, V., & Sona, D. (2015, April). Kernel-based classification for brain connectivity graphs on the Riemannian manifold of positive definite matrices. In 2015 IEEE 12th international symposium on biomedical imaging (ISBI) (pp. 42-45). IEEE.

      Ng, B., Dressler, M., Varoquaux, G., Poline, J. B., Greicius, M., & Thirion, B. (2014). Transport on Riemannian manifold for functional connectivity-based classification. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2014: 17th International Conference, Boston, MA, USA, September 14-18, 2014, Proceedings, Part II 17 (pp. 405-412). Springer International Publishing.

      (4e) Seems like a missed opportunity that the authors just use a single, PCA-derived measure to quantify learning, where multiple measures could have been of interest, especially given that the introduction established some interesting learning-related concepts related to exploration and exploitation, which could be conceptualized as movement variability and movement accuracy. It is unclear why the authors designed a task that was this novel and interesting, drawing on several psychological concepts, but then chose to ignore these concepts in the analysis.

      We were disappointed to hear that the reviewers did not appreciate our functional PCA-derived measure to quantify subject learning. This is a novel data-driven analysis approach that we have previously used with success in recent work (e.g., Areshenkoff et al., 2022, eLife) and, from our perspective, we thought it was quite elegant that we were able to describe the entire trajectory of learning across all participants along a single axis that explained the majority (~75%) of the variance in the patterns of behavioral learning data. Moreover, the creation of a single behavioral measure per participant (what we call a ‘Learning score’, see Fig. 6C) helped simplify our brain-behavior correlation analyses considerably, as it provided a single measure that accounts for the natural auto-correlation in subjects’ learning curves (i.e., that subjects who learn quickly also tend to be better overall learners by the end of the learning phase). It also avoids the difficulty (and sometimes arbitrariness) of having to select specific trial bins for behavioral analysis (e.g., choosing the first 5, 10, 20 or 25 trials as a measure of ‘early learning’, and so on). Of course, one of the major alternatives to our approach would have involved fitting an exponential to each subject’s learning curve and taking measures like learning rate etc., but in our experience we have found that these types of models don’t always fit well, or yield robust/reliable parameters at the individual subject level. To strengthen the motivation for our approach, we have now included the following text in our Results:

      “To quantify this variation in subject performance in a manner that accounted for the auto-correlation in learning performance over time (i.e., subjects who learned more quickly tend to exhibit better performance by the end of learning), we opted for a pure data-driven approach and performed functional principal component analysis (fPCA; Shang, 2014) on subjects’ learning curves. This approach allowed us to isolate the dominant patterns of variability in subjects’ learning curves over time (see Methods for further details; see also Areshenkoff et al., 2022).”
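
      As a rough illustration of how a single score per subject can be derived from whole learning curves, the sketch below applies ordinary PCA to curves sampled on a common trial grid. This is a discretized stand-in for fPCA under stated assumptions, not the implementation referred to above (which may involve basis-function smoothing); the function name `learning_scores` and the sign convention are hypothetical.

```python
import numpy as np


def learning_scores(curves):
    """Project learning curves onto their dominant mode of variation.

    curves: array of shape (n_subjects, n_trials), one row per subject.
    Returns (scores, component, explained_ratio).
    """
    curves = np.asarray(curves, dtype=float)
    centered = curves - curves.mean(axis=0)  # remove the group-average curve
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    component = vt[0]                        # dominant pattern over trials
    scores = centered @ component            # one score per subject
    explained = singular_values[0] ** 2 / np.sum(singular_values ** 2)
    # Assumed sign convention: a higher score means higher overall performance.
    if component.sum() < 0:
        component, scores = -component, -scores
    return scores, component, explained
```

      Because the score summarizes the full curve, it does not require choosing a particular trial bin to define ‘early’ or ‘late’ learning.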

      In any case, the reviewers may be pleased to hear that in current work in the lab we are using more model-based approaches to attempt to derive sets of parameters (per participant) that relate to some of the variables of interest described by the reviewers, but that we relate to much more dynamical (shorter-term) changes in brain activity.

      (4f) Overall Changes in Activity: The manuscript should delve into the potential influence of overall changes in brain activity on the results. The choice of using Euclidean distance as a metric for quantifying changes in connectivity is sensitive to scaling in overall activity. Therefore, it is crucial to discuss whether activity in task-relevant areas increases from baseline to early learning and decreases from early to late learning, or if other patterns emerge. A comprehensive analysis of overall activity changes will provide a more complete understanding of the findings.

      These are good questions and we are happy to explore this in the data. However, as mentioned in our response to query 4a above, it is important to note that the timeseries data for each brain region was z-scored prior to analysis, with the aim of removing any mean changes in activity levels (note that this is a standard preprocessing step when performing functional connectivity analysis, given that mean signal changes are not the focus of interest in functional connectivity analyses).
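
      The effect of z-scoring on connectivity estimates can be illustrated with a small sketch. It assumes each regional timeseries is standardized over the whole window being analyzed; the helper names `zscore` and `correlation` are hypothetical and the numbers are made up for illustration.

```python
from statistics import fmean, pstdev


def zscore(series):
    """Standardize a timeseries to zero mean and unit (population) variance."""
    mu = fmean(series)
    sd = pstdev(series)
    return [(x - mu) / sd for x in series]


def correlation(x, y):
    """Pearson correlation, computed as the mean product of z-scores."""
    zx, zy = zscore(x), zscore(y)
    return fmean([a * b for a, b in zip(zx, zy)])


# Two toy regional timeseries (illustrative values only).
region_a = [1.0, 2.0, 4.0, 3.0, 5.0]
region_b = [2.0, 1.0, 5.0, 3.0, 4.0]

# Simulate an overall change in activity level and gain in region A.
region_a_shifted = [3.0 * v + 100.0 for v in region_a]

r_original = correlation(region_a, region_b)
r_shifted = correlation(region_a_shifted, region_b)
```

      In this sketch `r_original` and `r_shifted` agree up to floating-point error, since a shift or rescaling of one region's signal is removed by the standardization before the coupling between regions is measured.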

      To further emphasize these points, we have taken our z-scored timeseries data and calculated the mean signal for each region within each task epoch (Baseline, Early and Late learning, see panel A in figure below). The point of showing this data (where each z-score map looks near identical across the top, middle and bottom plots) is to demonstrate just how minuscule the mean signal changes are in the z-scored timeseries data. This point can also be observed when plotting the mean z-score signal across regions for each epoch (see panel B in figure below). Here we find that Baseline and Early learning have a near identical mean activation level across regions (albeit with slightly different variability across subjects), whereas there is a slight increase during late learning, though it should be noted that our y-axis, which measures in the thousandths, really magnifies this effect.

      To more directly address the reviewers’ comments, using the z-score signal per region we have also performed the same statistical pairwise comparisons (Early > Baseline and Late>Early) as we performed in the main manuscript Fig. 4 (see panel C in Author response image 9 below). In this plot, areas in red denote an increase in activity from Baseline to Early learning (top plot) and from Early to Late learning (bottom plot), whereas areas in blue denote a decrease for those same comparisons. The important thing to emphasize here is that the spatial maps resulting from this analysis are generally quite different from the maps of eccentricity that we report in Fig. 4 in our paper. For instance, in the figure below, we see significant changes in the activity of visual cortex between epochs but this is not found in our eccentricity results (compare with Fig. 4). Likewise, in our eccentricity results (Fig. 4), we find significant changes in the manifold positioning of areas in medial prefrontal cortex (MPFC), but this is not observed in the activation levels of these regions (panel C below). Again, we are hesitant to make too much of these results, as the activation differences denoted as significant in the figure below are likely to be an effect on the order of thousandths of a z-score (e.g., 0.002 > 0.001), but this hopefully assuages reviewers’ concerns that our manifold results are solely attributable to changes in overall activity levels.

      We are hesitant to include the results below in our paper as we feel that they don’t add much to the interpretation (as the purpose of z-scoring was to remove large activation differences). However, if the reviewers strongly believe otherwise, we would consider including them in the supplement.

      Author response image 9.

      Examination of overall changes in activity across regions. (A) Mean z-score maps across subjects for the Baseline (top), Early Learning (middle) and Late learning (bottom) epochs. (B) Mean z-score across brain regions for each epoch. Error bars represent +/- 1 SEM. (C) Pairwise contrasts of the z-score signal between task epochs. Positive (red) and negative (blue) values show significant increases and decreases in z-score signal, respectively, following FDR correction for region-wise paired t-tests (at q<0.05).
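
      For readers unfamiliar with FDR correction across regions, the sketch below shows the Benjamini-Hochberg step-up rule applied to a list of region-wise p-values. Which FDR procedure was used for the figure is not stated here, so Benjamini-Hochberg is an assumption, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking hypotheses rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose p-value satisfies p_(k) <= (k / m) * q.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected
```

      Note that the rule is step-up: a region whose p-value fails its own rank threshold can still be declared significant if a higher-ranked p-value passes, which is why the cutoff is taken as the largest passing rank rather than the first failure.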

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript by Chen et al. entitled, "The retina uncouples glycolysis and oxidative phosphorylation via Cori-, Cahill-, and mini-Krebs-cycle", the authors look to provide insight on retinal metabolism and substrate utilization by using a murine explant model with various pharmacological treatments in conjunction with metabolomics. The authors conclude that photoreceptors, a specific cell within the explant, which also includes retinal pigment epithelium (RPE) and many other types of cells, are able to uncouple glycolytic and Krebs-cycle metabolism via three different pathways: 1) the mini-Krebs-cycle, fueled by glutamine and branched-chain amino acids; 2) the alanine-generating Cahill-cycle; and 3) the lactate-releasing Cori-cycle. While intriguing if determined to be true, these cell-specific conclusions are called into question due to the ex vivo experimental setup with the inclusion of RPE, the fact that the treatments were not cell-specific nor targeted at an enzyme specific to a certain cell within the retina, and no stable isotope tracing nor mitochondrial function assays were performed. Hence, without significant cell-specific methods and future experimentation, the primary claims are not supported.

      Strengths:

      This study attempts to improve on the issues that have limited the results obtained from previous ex vivo retinal explant studies by culturing in the presence of the RPE, which is a major player in the outer retinal metabolic microenvironment. Additionally, the study utilizes multiple pharmacologic methods to define retinal metabolism and substrate utilization.

      Weaknesses:

      A major weakness of this study is the lack of in vivo supporting data. Explant cultures remove the retina from its dual blood supply. Typically, retinal explant cultures are done without RPE. However, the authors included RPE in the majority of experimental conditions herein. However, it is unclear if the metabolomics samples included the RPE or not. The inclusion of the RPE, which is metabolically active and can be altered by the treatments investigated herein, further confounds the claims made regarding the neuroretina. Considering the pharmacologic treatments utilized with the explant cultures are not cell-specific and/or have significant off-target effects, it is difficult to ascertain that the metabolic changes are secondary to the effects on photoreceptors alone, which the authors claim. Additionally, the explants are taken at a very early age when photoreceptors are known to still be maturing. No mention or data is presented on how these metabolic changes are altered in retinal explants after photoreceptors have fully matured. Likewise, significant assumptions are made based on a single metabolomics experiment with no stable isotope tracing to support the pathways suggested. While the authors use immunofluorescence to support their claims at multiple points, demonstrating the presence of certain enzymes in the photoreceptors, many of these enzymes are present throughout the retina and likely the RPE. Finally, the claims presented here are in direct contradiction to recent in vivo studies that used cell-specific methods when examining retinal metabolism. No discussion of this difference in results is attempted.

      Response: We agree with the reviewer that in vivo studies could be very interesting indeed. However, technologically it will be extremely difficult to (repeatedly/continuously) sample the retina of an experimental animal and to combine this with an interventional study, with a subsequent metabolomic analysis. We do not currently have access to such technology nor are we aware of any other lab in the world capable of doing such studies. Moreover, virtually all prior studies on retinal metabolism have been done on explanted retina without RPE. This includes the seminal studies by Otto Warburg in the 1920s. As opposed to this, our retinal samples for all the metabolomic analyses also included the RPE, except for the no RPE condition that was used as a comparator for the earlier investigations.

      We note that our metabolomic analysis was done for all five experimental conditions where each condition included at least five independent samples (each derived from different animals).

      The reviewer is correct to say that our organotypic explant cultures are early post-natal, with explantation performed at post-natal day 9 and culturing until day 15. Since our retinal explant system has been validated extremely well over more than three decades of pertinent research (see for instance: Caffé et al., Curr Eye Res. 8:1083-92, 1989), we are confident that photoreceptors mature in vitro in ways that are very similar to the in vivo situation. As far as studies in adult retina (i.e. three months or older) are concerned, this is indeed an important question that will be addressed in future studies. Studies employing stable isotope labelling may also be very informative and are planned for the future, also in order to properly determine fluxes. This will likely require an extension to our NMR hardware with a 15N channel probe, something that we plan on implementing in the future.

      We are aware that a number of questions relating to retinal metabolism are controversial and that the use of other methodology or experimental systems may lead to alternative interpretations. We have now included citations of other studies that use, for example, conditional and/or inducible knock-outs or in vivo blood sampling (e.g. Wang et al., IOVS 38:48-55, 1997; Yu et al., Invest Ophthalmol Vis Sci. 46:4728-33, 2005; Swarup et al., Am J Physiol Cell Physiol. 316:C121-C133, 2019; Daniele et al., FASEB Journal 36:e22428, 2022) and discuss the pros and cons of such approaches (e.g. in Lines 376-384; 454-472).

      Reviewer #2 (Public Review):

      Summary:

      The authors aim to learn about retinal cell-specific metabolic pathways, which could substantially improve the way retinal diseases are understood and treated. They culture ex vivo mouse retinas for 6 days with 2 - 4 days of various drug treatments targeting different metabolic pathways or by removing the RPE/choroid tissue from the neural retina. They then look at photoreceptor survival, stain for various metabolic enzymes, and quantify a broad panel of metabolites. While this is an important question to address, the results are not sufficient to support the conclusions.

      Strengths:

      The questions the authors are exploring are extremely valuable and I commend the authors on working to learn more about retina metabolism. The different sensitivity of the cones to various drugs is interesting and may suggest key differences between rods and cones. The authors also provide a thoughtful discussion of various metabolic pathways in the context of previous publications.

      Weaknesses:

      As the authors point out, ex vivo culture models allow for control over multiple aspects of the environment (such as drug delivery) not available in vivo. Ex vivo cultures can provide good hints as to what pathways are available between interacting tissues. However, there are many limitations to ex vivo cultures, including shifting to a very artificial culture media condition that is extremely different than the native environment of the retina. It is well appreciated that cells have flexible metabolism and will adapt to the conditions provided. Therefore, observations of metabolic responses obtained under culture conditions need to be interpreted with caution: they indicate what the tissue is doing under those specific conditions (which include cells adapting and dying).

      Chen et al. use pharmacological interventions to assess the impact of various metabolic pathways on photoreceptor survival and "long term" metabolic changes. The dose and timing of these drug treatments are not examined though. It is also hard to know how these drugs penetrate the tissue and it needs to be validated that the intended targets are being accurately hit. These relatively long-term treatments should be causing numerous downstream changes to metabolism, cell function, and survival, which makes looking at a snapshot of metabolite levels hard to interpret. It would be more valuable to look at multiple time points after drug treatment, especially early time points (closer to 1 hr). The authors use metabolite ratios to make conclusions about pathway activity. It would be more valuable to directly measure pathway activity by looking at metabolite production rates in the media and/or with metabolic tracers, again on time scales closer to minutes and hours instead of days.

      It is not clear from the text if the ex vivo samples with RPE/choroid intact are analyzed for metabolomics with the RPE/choroid still intact or if this is removed. If it is not removed, the comparison to the retina without RPE/choroid needs to be re-interpreted for the contribution of metabolites from the added tissue. The composition of the tissue is different and cannot be disentangled from the changes to the neural retina specifically.

      While the data is interesting and may give insights into some rod and cone-specific metabolic susceptibility, more work is needed to validate these conclusions. Given the limitations of the model the authors have over-interpreted their findings and the conclusions are not supported by the results. They need to either dramatically limit the scope of their conclusions or validate these hypotheses with additional models and tools.

      Response: We thank the reviewer for the insightful comments and agree that some of our interpretations may have been phrased too determinedly. We have therefore rephrased and toned down our conclusions in many instances in the text, and changed the manuscript title to now read “Retinal metabolism: Evidence for uncoupling of glycolysis and oxidative phosphorylation via Cori-, Cahill-, and mini-Krebs-cycle”.

      Nevertheless, when considering the major known metabolic pathways and their possible impact on metabolite patterns after the experimental manipulations used here, we believe our interpretations to be consistent with the data obtained. Conversely, the previously suggested retinal aerobic glycolysis cannot explain most of the data we have obtained. Furthermore, a predominant use of the classical “full” Krebs-cycle/OXPHOS would also not explain the metabolite patterns found (e.g. alanine, N-acetylaspartate (NAA)). While this does not in itself mean that our interpretations are all correct, they seem plausible in view of the data at hand and will hopefully stimulate further research on retinal energy metabolism using complementary technologies that were not available to us for the purpose of this study.

      We comment that our organotypic retinal explant cultures, while they do contain their very own, native RPE, do not comprise the choroidal vasculature (in our explantation procedure the RPE readily detaches from the choroid).

      As far as the drugs used on retinal explants are concerned, we note that:

      (1) all three compounds used are extremely well validated, with literally thousands of studies and decades of research to their credit (i.e., 1,9-dideoxyforskolin: >270 publications since 1984; Shikonin: >1000 publications since 1977; FCCP: >2800 publications since 1967),

      (2) all experimental conditions show clear and differential drug effects, as shown, for instance, by the principal component analysis in Figure 1I and the cluster analysis in Figure 2A,

      (3) the response patterns observed for key metabolites match the anticipated drug effects (e.g. decreased glucose consumption with 1,9-dideoxyforskolin; decreased lactate levels with Shikonin; lactate accumulation with FCCP).

      One can therefore be reasonably certain that these drugs did penetrate the explanted retina and that their respective drug targets were hit. Assessing dose-responses would certainly be interesting; however, the aim of this initial study was not pharmacodynamics but a general manipulation of energy metabolism. Moreover, given the extensive validation of these drugs, off-target effects seem unlikely at the concentrations used.

      We agree with the reviewer that using a longitudinal, time-series type of analysis could give additional insights. We note that each additional time-point will require retinae from 25 animals and a very resource-intensive and time-consuming metabolomic analysis, together with a significantly more complex multivariate analysis (metabolite, experimental condition, time). This is a completely new undertaking that is simply not feasible as an extension of the present study.

      Looking at pathway activity in more direct ways is a very good idea. To this end, we aim to implement in the future an idea put forward by the reviewers, namely 13C-labeling and additionally 15N-labeling and tracing for specific metabolic fuels (e.g. glucose, lactate and anaplerotic amino acids such as glutamate and branched chain amino acids).

      The reviewer is of course correct to say that the culture condition is somewhat artificial and that this may have introduced changes in the metabolism. However, as noted above in the first response to reviewer #1, the organotypic retinal culture system, using a defined medium, free of serum and antibiotics, has been extremely well studied and validated for decades (cf. Caffé et al., Curr Eye Res. 8:1083-92, 1989). Importantly, this system allows retinal viability, histotypic organization, and function to be maintained over many weeks in culture. Moreover, most previous studies on retinal metabolism have also used explanted retina – acute or cultured – i.e. experimental approaches that are similar to what we have used and that may be liable to their own artefactual changes in metabolism. This includes the seminal, 1920s studies by Otto Warburg, or the 1980s studies by Barry Winkler, the results of which the reviewers do not seem to doubt.

      We further agree that studying retinal metabolism in a situation closer to in vivo conditions would be thrilling; however, to our knowledge, there is to date no retina model that fully mimics the complex interplay of the blood metabolome with metabolic tissue activity. This likely means that for each metabolic condition to study (e.g. hyperglycemia, cachexia, etc.), a fairly large number of animals will need to be sacrificed for the molecular investigation of ex vivo retinal biopsies, which would mean a tremendous animal burden.

      We hope the reviewer will appreciate that the revised manuscript now includes numerous improvements, along with new, additional datasets and figures, references to further relevant literature, and – as mentioned above – a more cautious phrasing of our interpretations and conclusions, including a more careful wording for the manuscript title.

      Reviewer #3 (Public Review):

      Summary:

      The neural retina is one of the most energetically active tissues in the body and research into retinal metabolism has a rich history. Prevailing dogma in the field is that the photoreceptors of the neural retina (rods and cones) are heavily reliant on glycolysis, and as oxygen tension at the level of photoreceptors is very low, these specialized sensory neurons carry out aerobic glycolysis, akin to the Warburg effect in cancer cells. It has been found that this unique metabolism changes in many retinal diseases, and targeting retinal metabolism may be a viable treatment strategy. The neural retina is composed of 11 different cell types, and many research groups over the past century have contributed to our current understanding of cell-specific metabolism of retinal cells. More recently, it has been shown in mouse models and co-culture of the mouse neural retina with human RPE cultures that photoreceptors are reliant on the underlying retinal pigment epithelium for supplying nutrients. Chen and colleagues add to this body of work by studying an ex vivo culture of the developing mouse retina that maintained contact with the retinal pigment epithelium. They exposed such ex vivo cultures to small molecule inhibitors of specific metabolic pathways, performing targeted metabolomics on the tissue and staining the tissue for key metabolic enzymes to lay the groundwork for what metabolic pathways may be active in particular cell types of the retina. The authors conclude that rod and cone photoreceptors are reliant on different metabolic pathways to maintain their cell viability - in particular, that rods rely on oxidative phosphorylation and cones rely on glycolysis. Further, their data support multiple mechanisms whereby glycolysis may occur simultaneously with anaplerosis to provide abundant energy to photoreceptors. The data from metabolomics revealed several novel findings in retinal metabolism, including the use of glutamine to fuel the mini-Krebs cycle, the utilization of the Cahill cycle in photoreceptors, and a taurine/hypotaurine shuttle between the underlying retinal pigment epithelium and photoreceptors to transfer reducing equivalents from the RPE to photoreceptors. In addition, this study provides robust quantitative metabolomics datasets that can be compared across experiments and groups. The use of this platform will allow for rapid testing of novel hypotheses regarding the metabolic ecosystem in the neural retina.

      Strengths:

      The data on differences in the susceptibility of rods and cones to mitochondrial dysfunction versus glycolysis provides novel hypothesis-generating conjectures that can be tested in animal models. The multiple mechanisms that allow anaplerosis and glycolysis to run side-by-side add significant novelty to the field of retinal metabolism, setting the stage for further testing of these hypotheses as well.

      Weaknesses:

      Almost all of the conclusions from the paper are preliminary, based on data showing enzymes necessary for a metabolic process are present and the metabolites for that process are also present. However, to truly prove whether these processes are happening, C13 labeling or knock-out or over-expression experiments are necessary. Further, while there is good data that RPE cultures in vitro strongly recapitulate RPE phenotypes in vivo, ex vivo neural retina cultures undergo rapid death. Thus, conclusions about metabolism from explants should either be well correlated with existing literature or lead to targeted in vivo studies. This paper currently lacks both.

      Response: As mentioned above in the first answers to reviewers #1 and #2, we think of our study as a starting point that may provide novel directions for a whole series of investigations into retinal energy metabolism. In particular, the use of novel technologies may in the future make it possible to decipher the different metabolic phenotypes of the 100+ distinct retinal cell types by in situ spatial metabolomics and lipidomics. Currently, we still have to limit the scope of our studies to only certain aspects of this topic. We thus agree that some of our interpretations need to be formulated more carefully and we have done so in the revised version of our manuscript. We also agree with the reviewer that carbon (13C) labelling and tracing studies will be very informative, and we will engage in such studies in the future. Besides 13C, we aim to further employ 15N labelled substrates, which are especially suitable for studying the fate of amino acids.

      As far as our organotypic retinal explant system is concerned, it is arguably one of the best-validated systems of its kind (see responses to reviewers #1 and #2). While the reviewer is correct to say that the neuroretina without RPE degenerates relatively quickly in vitro, in our system, with the neuroretina and its native RPE cultured together, we can routinely culture the retina for four weeks or more, without major cell loss (Söderpalm et al., IOVS 35:3910-21, 1994; Belhadj et al., JoVE 165, 2020). Thus, our retinal cultures with RPE do not undergo rapid death. Within the time-frame of the present study (6 days in vitro) culturing-induced cell death is minimal and unlikely to influence our analyses. For further, more detailed answers to the reviewers’ questions please see our point-by-point response below.

      We agree with the reviewer that eventually in vivo studies will be important to confirm our interpretations. As mentioned in our initial response to reviewer #1, such studies will be very challenging and new technologies may need to be developed before in vivo investigations can deliver the answers to the questions at hand (see answer to question Rev#3.17 below), especially if the interplay between substrate availability from the blood metabolome and the retinal metabolic pathway activity is to be studied.

      Recommendations For The Authors

      Reviewer #1 (Recommendations For The Authors):

      Rev#1.1. The animals should be screened for and lack rd8.

      Response: This is a pertinent question from the reviewer. Ever since we first became aware of the presence of rd8 mutations in certain mouse lines from major vendors (e.g. Charles River, Jackson Labs) around 2010, we have set up regular screening of all our mouse lines for this Crb1 mutation. Accordingly, the mouse lines used in this study were confirmed to be free of the rd8 / Crb1 mutation. A corresponding remark has now been inserted into the SI materials and methods section (Lines 37-38).

      Rev#1.2. GLUT1 looks significantly different from in vivo to in vitro. Recommend co-staining with RHO and cone markers (PNA or CAR) to further delineate where it is being expressed. The in vitro cultures appear to have much shorter outer segments (OS). Considering OS biosynthesis is thought to drive a good deal of metabolic adaptations, how relevant is the in vitro model system to what is truly occurring in vivo?

      Response: The GLUT1 staining shown in Figure 1 displays the in vivo situation. Since this may not have been entirely clear from the previous figure legend, we have now labelled this as “in vivo retina” and distinguish it from “in vitro” samples in the legend to Figure 1 (Lines 774-778). As far as the comparison of GLUT1 staining in vivo (Figure 1A3) vs. in vitro (Figure S1C3) is concerned, in both situations a strong RPE labelling is clearly visible, with essentially no GLUT1 label within the neuroretina.

      Nevertheless, to better delineate the expression of GLUT1 in the outer retina, we have now performed an additional co-staining with rhodopsin (RHO) as rod marker and peanut agglutinin (PNA) as cone marker, as suggested by the reviewer (new supplemental Figure S1). In brief, this co-staining confirms the strong expression of GLUT1 in the RPE, while there is essentially no GLUT1 detectable in rod or cone photoreceptors.

      Retinal explants in long-term cultures do indeed have somewhat shorter outer segments compared to same age in vivo counterparts (Caffe et al., Curr Eye Res. 8:1083-1092, 1989). However, in the short-term cultures (6 DIV) and at the age studied here (P15) outer segments have only just started to grow out and are around 10 - 12 µm long, both in vitro and in vivo (cf. LaVail, JCB 58:650-661, 1978). Thus, the metabolism required for outer segment synthesis should be equivalent when in vitro and in vivo situations are compared. For considerations on outer segments in retinal explant cultures see also Rev#3.2 and Rev#3.29.

      Rev#1.3. Also, recent publications have shown that GLUT1 is expressed in the neuroretina including rods, cones, and Müller glia. Was GLUT1 not appreciated in these cells in your ex vivo samples and if so, why? Likewise, these same studies previously demonstrated GLUT1 resulted in rod degeneration but not cone. The results presented here differ significantly. Why the difference in results and is it secondary to the in vitro vs. in vivo setting? Furthermore, the authors state that they thought the no RPE situation would be similar to the GLUT1 inhibitor experimental condition but instead, they were vastly different. Is this secondary to the fact that GLUT1 is expressed outside the RPE?

      Response: We are aware that there is a controversy regarding GLUT1 expression in the neuroretina; please see also our response to question Rev#3.1 below. As far as our immunostaining for GLUT1 on in vivo retina is concerned, we find an unambiguous and very marked expression of GLUT1 in RPE cells, at both basal and apical sides. Compared to the RPE, the neuroretina appears devoid of GLUT1 staining. However, at very high gamma values a faint staining in the neuroretina becomes visible, a staining which from its appearance – processes spanning the entire width of the retina – is most compatible with Müller glial cells. Under normal circumstances we would have dismissed such a faint staining as background and false positive. Given the sometimes contradictory reports in the literature, we cannot fully exclude a weak expression of GLUT1 also in cells other than the RPE, with Müller glial cells perhaps being the most likely candidate. At any rate, GLUT1 expression in the neuroretina can only be much weaker than in the RPE, making its relevance for overall retinal metabolism unclear.

      As far as recent publications studying GLUT1 in the retina are concerned, we know of the study by Daniele et al. (FASEB Journal 36:e22428, 2022), which used a rod-specific, conditional knock-out of GLUT1 and found a relatively slow rod degeneration. We are not aware of a selective GLUT1 knock-out in cones, nor are we aware of conditional GLUT3 knock-outs in the retina. For further discussion of the Daniele et al. study please see Rev#3.13.

      The reviewer is right: initially we thought that, since GLUT1 was expressed only (predominantly) in RPE, the metabolic response to GLUT1 inhibition should look similar to the no RPE situation. However, this initial hypothesis did not consider a key fact: the RPE builds the blood-retinal barrier, and the tight-junction-coupled RPE cells are a barrier to any larger molecule, including glucose. Removing the barrier by removing the RPE dramatically increases the availability of glucose to the retina, a phenomenon that is likely exacerbated by the expression of the high affinity/high capacity GLUT3 on photoreceptors (cf. Figure S1A). In other words, when the RPE is removed the outer retina is “flooded” with glucose and we believe that this is probably the main factor that explains why the metabolic response to GLUT1 inhibition (1,9-DDF group) is so different from the no RPE condition.

      We have now included an additional corresponding explanation in the discussion (Lines 422-429). Furthermore, we have added an entire new subchapter to the discussion to debate the expression of glucose transporters in the outer retina (Lines 454-472).

      Rev#1.4. Shikonin's mechanism of action via protein aggregation and lack of specificity for PKM2 vs PKM1 at 4uM is an experimental limitation that needs to be taken into account. All treatments utilized are not cell-specific.

      Response: While the reviewer is correct to say that Shikonin may have multiple cellular targets and a diverse range of possible applications as an anti-inflammatory, antimicrobial, or anticancer agent (cf. Guo et al., Pharmacol. Res. 149:104463, 2019), numerous studies support its specificity for PKM2 over PKM1, at concentrations ranging from 1 – 10 µM (Chen et al., Oncogene 30:4297-306, 2011; Zhao et al., Sci. Rep. 8:14517, 2018; Traxler et al., Cell Metab. 34:1248-1263, 2022). We settled on 4 µM as an intermediate concentration, considering its effectiveness and specificity in previous studies. We have now inserted references detailing the specificity and concentration range of Shikonin into the SI Materials and Methods section (Line 62).

      The concern that “all treatments” are not cell-specific is debatable. Certainly, any given compound may have off-target effects. Yet, since the compounds we used in our study have all been studied for decades (see above, initial response to Reviewer #2), their off-target profile is well established and unlikely to play an important role here. Moreover, in our study the cell specificity does not come from the compounds used but from where their targets are expressed. As shown in Figure 1A and in Figure S1C, Shikonin’s target PKM2 is almost exclusively expressed in photoreceptor inner segments. Hence, it seems very reasonable to expect that the vast majority of the metabolomic changes observed after Shikonin treatment are related to photoreceptors. We note that this assertion would still be true even if there were a low-level expression of PKM2 in other retinal cell types and/or if Shikonin had moderate off-target effects on other enzymes, since the bulk of the effect on the quantitative metabolomic dataset would still originate from PKM2 inhibition in photoreceptors.

      Rev#1.5. What was the method of cone counting in Figure 1?

      Response: Cones were counted per 100 µm of retinal circumference based on an arrestin-3 staining (cone arrestin, CAR).

      This information is now included in the SI Materials and Methods section under “Microscopy, cell counting, and statistical analysis” (Lines 99-100).
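
      As a minimal sketch of the normalization described above, the snippet below converts a raw cone count into cones per 100 µm of retinal circumference. The function name and the example numbers are hypothetical illustrations and are not taken from the manuscript; how the section length was measured is likewise an assumption.

```python
def cones_per_100_um(cone_count: int, section_length_um: float) -> float:
    """Normalize a raw cone count to cones per 100 um of retinal circumference.

    cone_count: number of arrestin-3 (CAR) positive cells counted in a section.
    section_length_um: measured length of retinal circumference, in micrometres.
    Both inputs are hypothetical placeholders for illustration.
    """
    if cone_count < 0:
        raise ValueError("cone_count must be non-negative")
    if section_length_um <= 0:
        raise ValueError("section_length_um must be positive")
    # Multiply before dividing so that simple inputs give exact results.
    return 100.0 * cone_count / section_length_um


if __name__ == "__main__":
    # Hypothetical example: 30 cones counted along 500 um of retina.
    print(cones_per_100_um(30, 500.0))
```

      Expressing counts per unit length makes sections of different size comparable, which is the point of the normalization.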

      Rev#1.6. How do you know that FCCP is not altering RPE ox phos, disrupting the outer retinal microenvironment and leading to cell death, and therefore, the effects seen are not photoreceptor-specific but rather downstream from the initial insult in RPE?

      Response: We propose that FCCP will be acting on both photoreceptors and RPE cells (and all other retinal cell types) at essentially the same time, over the experimental time-frame. Thus, OXPHOS should be inhibited in all cells simultaneously. However, FCCP will primarily affect cells that actually use OXPHOS to a large extent, while cells relying on other metabolic pathways (e.g. glycolysis) will hardly be affected.

      We believe the very strong effect of FCCP, seen exclusively in rod photoreceptors, to be a direct drug effect. While we cannot fully exclude an indirect effect via the RPE – as proposed by the reviewer – we think this to be unlikely because:

      (1) RPE viability was not compromised by FCCP treatment.

      (2) If the reviewer’s hypothesis were correct, then cone photoreceptors should also have been affected (e.g. because now the RPE consumes all glucose, leaving nothing for cones). However, cones were essentially unaffected by the FCCP treatment, making a dependence on RPE OXPHOS unlikely. This is especially so because blocking GLUT1 and glucose import on the RPE with 1,9-DDF had only relatively minor effects on rod photoreceptor viability but strongly affected cones. This indicates that the RPE is mainly shuttling glucose through to photoreceptors, especially to cones, and this function does not seem to be impaired by FCCP treatment.

      (3) We found that enzymes required for Krebs-cycle and OXPHOS activity (i.e. citrate synthase, fumarase, ATP synthase γ) are predominantly expressed in photoreceptors but virtually absent from RPE (Figure 3D, see also answer to following question).

      (4) The density of mitochondria (i.e. the target for FCCP) is far lower in RPE than in photoreceptors, as evidenced also by the COX staining shown in Figure 1A. Hence, photoreceptors are far more likely to be hit by FCCP treatment than RPE cells.

      To accommodate the reviewer’s concern, we have now added a further comment into the discussion (Lines 440-442).

      Rev#1.7. While Figure 3D is interesting, it offers no significant insight into mechanisms as the enzyme levels are not being compared to control nor is mitochondrial fitness in these conditions being assessed, which would provide greater insight than just showing that these enzymes are present in the inner segments, which are known to be rich in mitochondria. Additionally, stating that the low ATP is secondary to decreased Krebs cycle activity and ox phos based on merely ATP levels is not supported by metabolite levels minus citrate nor ox phos enzyme levels or oxygen consumption. Also, citrate is purported to be decreased in the table in Figure 2 in the no RPE condition; however, Supplemental Figure 2 demonstrates this change is not significant then the same data is presented in Supplemental Figure 3 and it is statistically significant again. Why the difference in data and why is the same data being shown multiple times?

      Response: The immunostaining shown in Figure 3D shows the in vivo retina, or in other words the localization of enzymes in the native situation. Since this may not have been obvious in the previous manuscript version, we have added a corresponding comment to the legend of Figure 3 (Line 806). The localization of the Krebs-cycle/OXPHOS enzymes citrate synthase, fumarase, and ATP synthase mainly to photoreceptors, but not (or much less) to RPE, is another piece of evidence supporting the idea that OXPHOS is predominantly performed by photoreceptors (see also answer to previous question Rev#1.6).

      The decreased ATP levels (together with citrate, aspartate, NAA) shown in Figure 3 in the no RPE group are an indication that photoreceptor Krebs-cycle activity may be decreased but not abolished in the absence of RPE. Importantly, GTP levels are not reduced in the no RPE group (Figure 2). Since large amounts of GTP can only be synthesized by either SUCLG-1 in the Krebs-cycle or by NDK-mediated exchange with ATP, the most plausible interpretation is that Krebs-cycle dependent ATP-synthesis was decreased in the no RPE situation, but that the (mini) Krebs-cycle or Cahill-cycle, notably the step from succinyl-CoA to succinate, was running. Since there is no RPE in this group, this strongly suggests important Krebs-cycle/OXPHOS activity in photoreceptors where the majority of the corresponding enzymes are located (see above).

      We thank the reviewer for pointing out that the information on group comparisons may not have been presented with sufficient clarity. In the figures mentioned by the reviewer the data is shown and compared in different contexts: the table in Figure 2B and the data in Figure S3 (now renumbered to Figure S5) refer to two-way comparisons of treatment condition to control, to elucidate individual treatment effects. Meanwhile Figure S2 (now supplementary Figure S3) refers to a 5-way comparison for a general overview that puts all five groups in context with each other. These differences in comparisons and normalization to the respective common standards entail the use of different statistical tools, resulting in different p-values. The statistical testing approaches and thresholds are now disclosed in the figure legends, and additionally in the SI Materials and Methods section (Lines 145-155).

      Rev#1.8. When were the ex vivo samples taken for metabolomics, and if taken when significant TUNEL staining and cell death have occurred, are the changes in metabolism due to cell death or a true indication of differential metabolism? Furthermore, it is unclear if the metabolomics samples included the RPE or not. Considering these treatments will affect most cells in the retina and the RPE, which is included in the ex vivo samples, it is difficult to ascertain that these changes are secondary to the effects on photoreceptors alone.

      Response: The samples for metabolomics included the RPE (except for the no RPE condition) and were taken at the same time as the tissues for histological preparations and TUNEL assays, i.e. they were all taken at post-natal day 15. This has now been clarified in the SI Materials and Methods section (Lines 108-110).

      We cannot entirely exclude an effect of ongoing cell death caused by the different drug treatments on the retinal metabolome. However, since in the experimental treatments cell death was still comparatively low (even in the FCCP condition, overall cell death was only around 10% of the total retina), and the metabolomic analysis considered the entire tissue, the impact of cell death per se on the total metabolome will be comparatively minor (≤ 10%, i.e. within the typical error margin of the metabolomic analysis).

      As mentioned above, the drug treatments should in principle affect all retinal cells at the same time. However, only cells that express the drug targets (i.e. 1,9-DDF targets GLUT1 in RPE cells, Shikonin targets PKM2 in photoreceptors; cf. Figure 1A) should react to the treatment. Even FCCP, in the paradigm employed, will only affect those cells that rely heavily on OXPHOS. Our data indicates that this is almost certainly the case for rods, whereas cones, RPE cells, and essentially all of the inner retina are not affected by FCCP treatment, strongly suggesting that OXPHOS is of minor importance for these cell populations.

      Rev#1.9. Why were the FCCP and no RPE groups compared? If they have similar metabolite patterns as noted in Figure 2, would that suggest that FCCP's greatest effect is on the ox phos of RPE and the metabolite patterns are secondary to alterations in RPE metabolism? Also, the increase in citrate and decrease in NAD may be related to effects on RPE mitochondrial metabolism when comparing these groups, and the disruption of RPE metabolism may then result in PARP staining of photoreceptors.

      Response: The reason for the pair-wise comparison of the no RPE and FCCP groups initially was indeed the similarity in metabolite patterns. This has now been rephrased accordingly in the results section “Photoreceptors use the Krebs-cycle to produce GTP” (Lines 218-219). The interpretation that the reviewer proposed here is interesting, but is not consistent with the data analysis of this and other group comparisons.

      Instead, the similarity between the metabolic patterns found in the no RPE and FCCP groups further supports the idea that a lack of RPE decreases retinal OXPHOS and increases glycolysis. This interpretation is based on the following observations:

      (1) Mitochondrial density in the RPE is far lower than in photoreceptors (see COX staining in Figure 1A), thus quantitatively the metabolite pattern caused by a disruption of OXPHOS (via FCCP treatment) will be dominated by metabolites generated by photoreceptors. For the same reason the depletion of retinal NAD+, and the concomitant increase in photoreceptor PAR accumulation after FCCP treatment, is unlikely to be due to changes in RPE.

      (2) Similarly, citrate synthase (CS) was found to be almost exclusively expressed in photoreceptor inner segments, with little expression in RPE (Figure 3D). Hence, the quantitative increase of citrate levels after FCCP treatment can only originate in photoreceptors.

      (3) The comparison of the control (with RPE) against the no RPE group suggested an increase in (aerobic) glycolysis in the absence of RPE, evidenced notably by a retinal accumulation of lactate, BCAAs, and glutamate (Figure 3A). The very same metabolite pattern is seen for the FCCP treatment (Figure 1B), indicating a marked upregulation of glycolysis (Figure 6C). The latter observation suggests that photoreceptors, after disruption of OXPHOS, switch to an exclusively glycolytic metabolism, which, however, rods cannot sustain (Figure 1C, D).

      (4) Glucose consumption and lactate release are increased in the no RPE group vs. control (new Supplementary Figure 4). A similar increase in glucose consumption and lactate production is seen in the FCCP group, suggesting that the no RPE situation also disrupts OXPHOS in photoreceptors.

      Rev#1.10. The conclusions being reached are difficult to interpret secondary to the experimental procedures and the fact that the treatments are not cell-specific and RPE is included with the neuroretina as well. Likewise, stating FCCP is altering the Krebs cycle in the neuroretina is difficult to believe as there are no changes in the Krebs cycle when compared to the control, which also has RPE.

      Response: We agree with the reviewer that some of the conclusions may have been somewhat speculative. Accordingly, we have toned down our conclusions in several instances in the text, notably in the abstract, introduction, and discussion.

      When it comes to Krebs cycle intermediates a key limitation of our study is indeed the lack of carbon-tracing and metabolic flux analysis as noted by the reviewers, a limitation that we now highlight more strongly in the discussion of the revised manuscript (Lines 545-549). While it is highly probable that the flux of Krebs cycle intermediates is altered by FCCP, our steady-state data does not show significant changes in the metabolites citrate, fumarate, and succinate. However, our study does show a highly significant decrease in GTP levels, which as explained above, is a key indicator of Krebs cycle activity/inactivity. Moreover, while GTP levels were reduced also in the no RPE group, GTP was still significantly higher in the no RPE group compared to the FCCP treatment. Our interpretation of this finding is that there is Krebs-cycle/OXPHOS activity in the neuroretina, which is abolished by FCCP.

      Rev#1.11. Supplemental Figure 4C and D states that GAC inhibition affected only photoreceptors, but GAC is expressed throughout the retina and so the inhibition is altering glutamine-glutamate homeostasis throughout the retina. Clearly, based on histology, one can see that the architecture of the retina, especially at the highest dose, is lost likely because all cells are being affected. So it is not photoreceptor-specific and even at low doses one can see that the inner retina is edematous. Moreover, with such a high amount of TUNEL staining in the ONL, are rods more affected than cones?

      Response: In our hands the immunostaining for Glutaminase C (GAC) labelled predominantly cone inner segments, the OPL, and perhaps bipolar cells (Figure S1A). The deleterious effects mentioned by the reviewer are only seen at the highest concentration of the GAC inhibitor compound 968. This concentration (10 µM) is 100-fold higher than the dose that produces a significant loss of cones in the outer retina (0.1 µM). We therefore think that this data points to the extraordinary reliance of cones on glutamine and glutamate. As can be seen from the images (Figure S4C) illustrating the effects of 0.1 and 1 µM Compound 968 treatment, the ONL thickness is not significantly reduced by the GAC inhibitor. This strongly indicates that at these doses the rods are not affected by GAC inhibition.

      Rev#1.12. The no RPE vs 1,9 DDF data may be interpreted as preventing glucose transport in the RPE increases BCAA catabolism by the RPE, which has been shown to utilize BCAA in culture systems. To this end, when the RPE is not present, the BCAA is increased as compared to the control with RPE.

      Response: Our original interpretation of this data was that after GLUT1 inhibition and a correspondingly reduced retinal glucose uptake, the retina switched to an increasing use of anaplerotic substrates, including BCAAs. This is supported by the concomitant upregulation of the Cahill-cycle product alanine and the mini-Krebs-cycle product N-acetylaspartate (NAA). Yet, we agree with the reviewer that BCAAs could also be consumed by the RPE. We have now changed our conclusion at the end of the results chapter “Reduced retinal glucose uptake promotes anaplerotic metabolism” to also highlight this possibility (Lines 261-262).

      Rev#1.13. It is unclear why so much effort is comparing the no RPE group to the treatment groups and not comparing the control group to the different treatment groups.

      Response: Previous studies – including the seminal studies of Otto Warburg from the early 1920s – had always used retina without RPE. This “no RPE” situation is therefore something of a reference for our entire study, which is why we dedicated more effort to its analysis. We have now inserted a corresponding remark into the manuscript (Lines 182-184).

      Rev#1.14. The conclusions are significantly overstated especially with regards to rods versus cones as these are not cell-specific treatments. For example, the control vs 1,9 DDF vs FCCP clearly shows that there is mitochondrial dysfunction due to decreased NAD, increased AMP/ATP ratio, decreased Asp but increased Gln, and a compensatory increase in lactate production.

      Response: We agree with the reviewer and have tried to phrase our statements in a more measured fashion. Notably, we have toned down our statements in the title, abstract, results, discussion, and several of the subchapter headings.

      Rev#1.15. While metabolic conclusions are drawn on serine/lactate ratio, this ratio is driven by the drastic changes in lactate and not so much serine in the treatment conditions as it was rather stable. Likewise, substrates beyond glucose have the potential to fuel the TCA cycle and make GTP via SUCLG1, such as fatty acids, other AAs, etc. Therefore, this ratio may not tell the entire story about anaplerotic metabolism. Furthermore, knowing that RPE utilize BCAAs to fuel their TCA cycle, the no RPE condition may simply have increased BCAAs due to lack of metabolism by the RPE, which drives the GTP/BCAA ratio. To state that the neuroretina was utilizing BCAAs for anaplerosis is not well supported based on the current data. Similarly, what is to say that the GTP/lactate ratio in the no RPE situation is not driven by the fact that the RPE is no longer present to act as acceptor of retinal lactate production or that more glucose is reaching the retina since the RPE is not present to accept and utilize that produced. Glucose uptake was not assessed to further address these issues.

      Response: We agree with the reviewer that metabolite ratios may not tell the full story underlying retinal metabolism. However, because they are based on quantitative and highly reproducible NMR data, they are an important part of the metabolomics toolbox. The reviewer correctly observed that the changes in lactate levels are more dramatic than in serine. Still, serine was also significantly increased in the no RPE, 1,9-DDF, and Shikonin groups. Together with the lactate changes (same or opposite direction) the resulting serine/lactate ratios display marked alterations.

      When it comes to the supply of other potential energy substrates mentioned by the reviewer, i.e. fatty acids or amino acids other than BCAAs, these are only supplied in minimal amounts in the defined, serum free R16 medium (Romijn, Biology of the Cell, 63, 263-268, 1988) and – if used to any important extent – would be rapidly depleted by the retina. Thus, for a culture period of 2 days in vitro between medium changes these energy sources are not available and thus cannot be used by the retina.

      Our conclusion that the retina is using anaplerosis is based not only on the observations made in the no RPE group but also on, for instance, the metabolite ratios seen in the 1,9-DDF treatment group. In this group decreased glycolytic activity may correspond to increased serine synthesis and anaplerosis.

      As far as glucose uptake is concerned, we have analysed the medium samples at P15 (equivalent to the retina tissue collection time point) and now present data that addresses this question more directly via the consumption of glucose from and release of lactate to the culture medium (New Supplementary Figure 4C, D). This new dataset provides another independent observation showing that:

      (1) Glucose consumption/lactate release (i.e. aerobic glycolysis) is high in the no RPE situation but low in the control situation. In other words, retinal aerobic glycolysis is most likely stimulated by the absence of RPE.

      (2) 1,9-DDF treatment decreases glucose consumption/lactate release as would be expected from a GLUT1 blocker. Since ATP and GTP production are high nonetheless, this indicates that other substrates (i.e. anaplerosis) were used for retinal energy production, in agreement with the analysis shown in Figure 6C.

      (3) The FCCP treatment, which disrupts oxidative ATP-production, increases glucose consumption/lactate release in a way similar to the no RPE situation. Yet, the no RPE retina can still generate sizeable amounts of GTP but not ATP. Together, this provides further evidence that neuroretinal OXPHOS is decreased in the absence of RPE.

      Rev#1.16. The evidence for the mini-Krebs cycle is intriguing but weak considering it is based on certain enzymes being expressed in the photoreceptors, which had already been shown to be present in other publications, and a single ratio of metabolites that is increased in FCCP. One would expect this ratio to be increased under FCCP regardless. There is no stable isotope tracing with certain fuels to confirm the existence of the mini-Krebs cycle.

      Response: We thank the reviewer for this suggestion. We agree that our evidence for the mini-Krebs-cycle (and the Cahill-cycle) may be to some extent circumstantial and additional technologies would help to obtain further supportive data. Still, here we would like to invite the reviewer to a thought experiment in which they try to interpret our data without considering the Cahill- or the mini-Krebs-cycle. When we ourselves engaged in such thought experiments, we were unable to explain the data observed without these alternative energy-producing cycles. Most notably, we were unable to explain the strong accumulation of either alanine or N-acetyl-aspartate (NAA) when only considering glycolysis and (full) Krebs-cycle metabolism. Of course, this may still be considered “weak” evidence, and we expect that future studies including complementary technologies will either confirm or expand our interpretation of the existing data set.

      The suggestion to perform stable isotope-labelled tracing with potential alternative fuels (e.g. glutamate, glutamine, pyruvate, etc.) is very attractive indeed. While such studies are likely to shed further light on the metabolic pathways proposed, they will entail very extensive experimental work, with multiple different conditions and concentrations and a variety of analysis methods (e.g. a 1.7 mm NMR probe equipped with a 15N channel), which is currently not feasible as an extension of the present manuscript. Nevertheless, we will certainly consider this approach for future follow-up studies once such techniques are available and will look for suitable collaboration partners. A corresponding comment on such future possibilities has now been inserted into the discussion (Lines 545-549).

      Rev#1.17. The discussion does not mention how this data contradicts a recent in vivo study looking at Glut1 knockout in the retina (Daniele et al. FASEB. 2022) or previous in vivo studies that suggest cones may be less sensitive to changes in glucose levels (Swarup et al. 2019). This is a key oversight.

      Response: We thank the reviewer for pointing this out. We have now included these studies in the revised discussion in a new subchapter on the expression of glucose transporters in the outer retina (Lines 454-472). For a critical review of the Daniele et al., 2022 study please also see our more detailed response to question Rev#3.13 below.

      Rev#1.18. GAC is expressed in more than just cones so making cell-specific statements regarding fuel utilization is not well supported.

      Response: Our immunostaining for GAC revealed a strong expression in cone inner segments (Figure S1A3). While this does not exclude (relatively minor) expression in other retinal cell types, cones are likely to be more reliant on GAC activity than other cell types. See also answer above.

      Rev#1.19. Suggesting that rods utilize the mini-Krebs cycle based on AAT2 being seen in the inner segments without at least co-staining for RHO or PNA is weak evidence for such a cycle. AAT looks to be expressed in the inner segments of all photoreceptors.

      Response: We have taken up this suggestion from the reviewer and now provide an additional co-staining for AAT1 and AAT2 with rhodopsin. Note that in response to a pertinent comment from Reviewer #3 we have changed the abbreviation for aspartate aminotransferase from “AAT” to the more commonly used “AST” throughout the manuscript.

      New images showing a co-staining for AST1 and AST2 with rhodopsin now replace the former image set in Figure 7D. In brief, the new images show the expression of both AST1 and AST2 across the retina, with, notably, expression in the inner segments of photoreceptors but not in the outer segments, where rhodopsin is expressed.

      Reviewer #3 (Recommendations For The Authors):

      Rev#3.1. The staining for the glucose transporters GLUT1 and GLUT3 does not reflect what has previously been published by two different groups that were validated by cell-specific knockout mice. As mentioned by the author GLUT1 and GLUT3 have differences in transport kinetics, which would affect their metabolism. Therefore, the lack of GLUT1 in photoreceptors would suggest that photoreceptor metabolism is not faithfully replicated in this system. This difference from the previous literature should be discussed in the discussion.

      Response: As the reviewer pointed out, the expression of GLUT1 in the retina is somewhat controversial, with much older literature showing expression on the RPE, while some more recent studies claim GLUT1 expression in photoreceptors. For a brief discussion of our GLUT1 immunostaining please see also our answer to question Rev#1.3 above.

      Although the retinal expression of GLUT1 was beside the focus of our study, we feel we must address this point in more detail: In the brain the generally accepted setup for GLUT1 and GLUT3 expression is that low-affinity GLUT1 (Km = 6.9 mM) is expressed on glial cells, which contact blood vessels, while high-affinity GLUT3 (Km = 1.8 mM) is expressed on neurons (Burant & Bell, Biochemistry 31:10414-20, 1992; Koepsell, Pflügers Archiv 472, 1299–1343, 2020). This setup matches decreasing glucose concentration with increasing transporter affinity, for an efficient transport of glucose from blood vessels, to glial cells, to neurons. In the retina, the cells that contact the choroidal blood vessels are the tight-junction-coupled RPE cells. As shown by us and many others, RPE cells strongly express GLUT1 (cf. Figure 1A-3.). To warrant an efficient glucose transport from the RPE to photoreceptors, photoreceptors must express a glucose transporter with higher glucose affinity than GLUT1. We show that this is indeed the case with photoreceptors expressing GLUT3 (cf. Supplemental Figure 1-5.). While a partly teleological explanation does not per se prove that our data are correct, our data are at least plausible. In contrast, the glucose transporter setup sometimes claimed in the literature is biochemically implausible, since it would require the flow of metabolites (glucose) to go against a gradient of transporter affinities, and we are not aware of an example of such a setup occurring anywhere in nature.
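      The affinity argument above can be illustrated with a minimal sketch, assuming simple Michaelis-Menten transport kinetics. The Km values are those quoted above; the glucose concentrations and the function name are hypothetical and chosen only for illustration, not taken from our measurements.

```python
# Sketch only: fractional saturation v/Vmax = S / (Km + S) for a transporter,
# assuming simple Michaelis-Menten kinetics (an assumption for illustration).

KM_GLUT1_MM = 6.9  # low-affinity GLUT1, Km as quoted in the response
KM_GLUT3_MM = 1.8  # high-affinity GLUT3, Km as quoted in the response


def fractional_saturation(glucose_mM: float, km_mM: float) -> float:
    """Return v/Vmax for a given glucose concentration and transporter Km."""
    if glucose_mM < 0 or km_mM <= 0:
        raise ValueError("glucose must be >= 0 and Km must be > 0")
    return glucose_mM / (km_mM + glucose_mM)


# Hypothetical glucose concentrations (mM): higher near the blood supply,
# lower further downstream in the tissue.
for glucose in (5.0, 1.0):
    glut1 = fractional_saturation(glucose, KM_GLUT1_MM)
    glut3 = fractional_saturation(glucose, KM_GLUT3_MM)
    print(f"{glucose:.1f} mM glucose: GLUT1 {glut1:.2f}, GLUT3 {glut3:.2f}")
```

      Under these assumptions, the lower-Km transporter operates closer to its maximal rate at any given glucose concentration, which is why it is better suited to the compartment where glucose is scarcer.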

      However, at this point we cannot exclude low levels of GLUT1 expression on Müller glia cells or even photoreceptors. This expression could, for instance, be relevant in cases where cells were shuttling excess glucose – perhaps produced through gluconeogenesis – onwards to other retinal cells. Still, GLUT1 expression can only be minor when compared to RPE since a major expression would destroy the glucose affinity gradient (see above) required for efficient glucose shuttling into the energy hungry photoreceptors.

      To address this request by the reviewer (and also reviewer #1) we now discuss the question of glucose transporter expression in the outer retina in a new subchapter of the discussion (Lines 454-472).

      Rev#3.2. Photoreceptor metabolism and aerobic glycolysis are tied to photoreceptor function, as demonstrated by Dr. Barry Winkler. The authors should provide data or mention (if previously published) about photoreceptor OS growth and function in this system.

      Response: The studies of Barry Winkler (e.g. Winkler, J Gen Physiol. 77, 667-692, 1981) confirmed the original work of Otto Warburg and expanded on the idea that the neuroretina was using aerobic glycolysis. Importantly, Winkler used a very similar experimental setup to the one Warburg had used, namely explanted rat retina without RPE. In light of our data where we compare metabolism of mouse retina with and without RPE – where retina cultured without RPE confirms the data of Warburg and Winkler – it appears most likely that the purported aerobic glycolysis occurs mostly in the absence of RPE but only to a lower extent in the native retina.

      Photoreceptor outer segment outgrowth is somewhat slower in the organotypic retinal explant cultures compared to the in vivo situation (cf. Caffe et al., Curr Eye Res. 8:1083-1092, 1989 with LaVail, JCB 58:650-661, 1978; see also answer to reviewer #1). Importantly, organotypic retinal explant cultures and their photoreceptors are fully functional and remain so for extended periods in culture (Haq et al., Bioengineering 10:725, 2023; Tolone et al., IJMS 24:15277, 2023). This information has now been added to the manuscript discussion section, into the new subchapter “The retina as an experimental system for studies into neuronal energy metabolism” (Lines 367-395).

      Rev#3.3. It is unclear from the description of the experiment in both the results and methods if 1,9DDF, Shikonin, and FCCP were added to both apical and basal media compartments or one or the other and should be specified. The details of what was on the apical compartment would be helpful, as the model is supposed to allow for only nutrients from the basal compartment (as indicated by the authors themselves). Is the apical compartment just exposed to air? How does this affect survival?

      Response: In organotypic retinal explant cultures the RPE rests on the permeable culturing membrane such that the basal side is in contact with the membrane and the medium below (for a schematic drawing see Figure S1B), while the apical side is covered by a thin film of medium created by the surface tension of water (Caffe et al., Curr Eye Res. 1989; Belhadj et al., JoVE, 2020). This thin liquid film ensures sufficient oxygenation and is an important factor that allows the retinal explant to remain viable for several weeks in culture. If the retinal cultures were submerged by the medium, their viability – especially that of the photoreceptors – would drop dramatically and would typically be below 3-5 days. Therefore, in the retinal organotypic explant cultures used here, the nutrients and the drugs applied do indeed reach the outer retina from the basal side, i.e. in a similar way as they would in vivo.

      To address this question from the reviewer, corresponding clarifications have been inserted into the SI Materials and Methods section (Lines 64-66).

      Rev#3.4. As the metabolomic data obtained was quantitative, several metabolites discussed should be analyzed in terms of ratios, for example, Glutathione and glutathione disulfide should be reported as a ratio. In addition, as ATP, ADP, and AMP were measured, they can be used to calculate the energy charge of the tissue.

      Response: We thank the reviewer for these suggestions and have created corresponding graphs for GSH / GSSG ratio and energy charge. These new graphs have now been added to the SI datasets, to the new Supplementary Figure 4. To accommodate other requests from the Reviewers, this new Figure also contains additional new datasets on glucose and lactate concentrations (see further comments above and below). Please note that all later SI Figures have been renumbered accordingly.

      In brief, the ratios for GSH/GSSG show no significant changes between control and the different experimental groups. Meanwhile, the adenylate energy charge of the retinal tissues shows a significant decrease for the Shikonin group and the FCCP group. Note that in the new Supplementary Figure 4A, the dotted lines indicate the energy charge window typical for most healthy cells (0.7 – 0.95).
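      For readers who wish to reproduce these two derived quantities, a minimal sketch is given here. It assumes the standard definition of the adenylate energy charge, (ATP + 0.5 × ADP) / (ATP + ADP + AMP), and a plain concentration ratio for GSH/GSSG; the function names and the example concentrations are hypothetical and are not values from our dataset.

```python
# Sketch only: derived metabolite quantities from quantitative concentrations.
# All inputs must share the same unit (e.g. nmol per mg tissue).


def adenylate_energy_charge(atp: float, adp: float, amp: float) -> float:
    """Standard adenylate energy charge: (ATP + 0.5*ADP) / (ATP + ADP + AMP)."""
    total = atp + adp + amp
    if min(atp, adp, amp) < 0 or total <= 0:
        raise ValueError("concentrations must be non-negative and not all zero")
    return (atp + 0.5 * adp) / total


def gsh_gssg_ratio(gsh: float, gssg: float) -> float:
    """Ratio of reduced (GSH) to oxidized (GSSG) glutathione."""
    if gsh < 0 or gssg <= 0:
        raise ValueError("GSH must be >= 0 and GSSG must be > 0")
    return gsh / gssg


# Hypothetical example concentrations, for illustration only.
charge = adenylate_energy_charge(atp=8.0, adp=1.0, amp=1.0)
in_healthy_window = 0.7 <= charge <= 0.95  # window quoted in the response
print(f"energy charge = {charge:.2f}, within 0.7-0.95: {in_healthy_window}")
print(f"GSH/GSSG = {gsh_gssg_ratio(gsh=50.0, gssg=2.0):.1f}")
```

      The energy charge is bounded between 0 (all AMP) and 1 (all ATP), which makes it comparable across samples with different total adenylate pools.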

      Rev#3.5. I think a missed opportunity when discussing the possible taurine/hypotaurine shuttle would be the impact on the osmosis of the subretinal space as taurine has been hypothesized as a major osmolyte.

      Response: This is another interesting recommendation from the reviewer. To address this point, we have now introduced a corresponding paragraph and references in the discussion of the manuscript (Lines 503-504; 512-514).

      Rev#3.6. In Figure 3, the distribution of these enzymes should also be studied under the no RPE condition as the culture treatment took several days for these metabolic changes to occur.

      Response: The images shown in Figure 3D are from the in vivo retina. Since this may not have been very clear in the previous manuscript version, we have now added a corresponding explanation to the legend of Figure 3. As far as we can tell, the expression and localization of neuroretinal enzymes does not change in cultured retina, during the culture period (compare Figure 1A with Supplementary Figure S1C). However, when it comes to the metabolite taurine its production (localization) changes dramatically in the no RPE situation where taurine is essentially undetectable by immunostaining (not shown but see metabolite data in Figure 2A, Figure 3A).

      Rev#3.7. In Figures 4 and 5, it is unclear why the experimental groups were not compared to the control and requires further explanation. Furthermore, the authors should justify the concentrations of drugs used as the cell death could have risen from toxicity to the drugs and not due to disruption of metabolism.

      Response: The reviewer is right, the rationale for these comparisons may not have been laid out with sufficient clarity. In Figure 4 the no RPE and FCCP groups are compared because both groups showed similar metabolite changes relative to the control situation. The no RPE to FCCP comparison thus focussed on the details of the – at first seemingly minor – differences between these two groups. This has now been clarified in the corresponding part of the results (Lines 218-219).

      In Figure 5A, B we compare the no RPE and 1,9-DDF groups with each other, notably because the data obtained seemingly contradicted our initial expectation that these two groups should show similar metabolite patterns. Also here, we have now inserted an additional explanation for this choice of comparisons (Lines 252-253).

      In Figure 5C, D we compare the Shikonin and FCCP groups with each other. The idea behind this comparison was that in the first group glycolysis was blocked while in the second group OXPHOS was inhibited; in other words, we compared what happened when the two opposing ends of energy metabolism were manipulated in opposite directions. This reasoning is now given in the results section (Lines 265-268).

      As far as the choice of drugs and concentrations is concerned, we used only compounds that have been extremely well validated through up to five decades of scientific research (see initial response to Reviewer #2 above). We therefore are confident that at the concentrations employed the results obtained stem from drug effects on metabolism and not from generic, off-target toxicity. Then again, as we show, prolonged (i.e. 4 days) block of energy metabolism pathways does cause cell death.

      Rev#3.8. In line 203, the authors discuss GTP as being primarily a mitochondrial metabolite, however, photoreceptors would require a localized source of GTP synthesis in the outer segments as part of phototransduction, and therefore GTP in photoreceptors cannot be a mitochondrial-specific reaction in photoreceptors. Furthermore, the authors mentioned NDK as being a possible source of GTP, but they do not show NDK localization despite it being reported in the literature to be localized in the OS.

      Response: The question as to the source of GTP in photoreceptor outer segments is indeed highly relevant. For GTP production in mitochondria see the answer to the next question below (Rev#3.9). An early study showed nucleoside-diphosphate kinases (NDK) to be expressed on the rod outer segments of bovine retina (Abdulaev et al., Biochemistry 37:13958-13967, 1998). More recently NDK-A was shown to be strongly expressed in photoreceptor inner segments (Rueda et al., Molecular Vision 22:847-885, 2016). We now refer to both studies in the results section of the manuscript (Lines 227-228).

      Rev#3.9. In the "Impact on glycolytic activity, serine synthesis pathway, and anaplerotic metabolism" section, the authors claim in the no RPE group glycolytic activity was higher due to a depressed GTP-to-lactate ratio. However, this reviewer is under the impression that GTP production in photoreceptors is not mitochondrial specific, so this ratio doesn't make sense (I could be mistaken, however). A better ratio would have been pyruvate/lactate or glucose/lactate when discussing increased glucose consumption.

      Response: We appreciate the reviewer’s comment, yet we do indeed believe we can show that GTP-production in our experimental context is mainly mitochondrial. As explained in the manuscript results section (“Photoreceptors use the Krebs-cycle to produce GTP”), there are essentially only two possibilities for a photoreceptor to produce sizeable amounts of GTP: in the mitochondria via SUCLG1 – i.e. an enzyme highly expressed in photoreceptor inner segments (Figure 5D) – and in the cytoplasm via NDK from excess ATP. The claim about the depressed GTP-to-lactate ratio in the no RPE situation takes this into account. Importantly, since in the no RPE situation ATP-levels are significantly lower than GTP, here GTP can only be produced via SUCLG1 and OXPHOS. Moreover, this contrasts with the FCCP group where mitochondrial OXPHOS is disrupted and both ATP and GTP are depleted.

      As far as ratios with pyruvate and glucose are concerned, we agree that these could potentially be very interesting to analyse. Unfortunately, in our 1H-NMR spectroscopy-based metabolomics analysis of retinal tissue, the levels of both pyruvate and glucose were below the detection limits (cf. table S1). While this likely reflects the rapid turnover and marked consumption of these metabolites within the tissue, it does not allow us to calculate the suggested ratios to lactate. Then again, in the supernatant medium, which was collected at the same time point as the retinal tissue, we can readily detect glucose and lactate levels; for these data please see the new Supplementary Figure 4.

      Rev#3.10. Aspartate aminotransferase should be abbreviated as AST, as it is more commonly noted.

      Response: In response to this comment from the reviewer, we have changed the abbreviation for aspartate aminotransferase from AAT to AST throughout the manuscript.

      Rev#3.11. In the discussion the assumptions of the ex vivo culture systems should be clearly stated. One that was not mentioned, but affects the implications of the data, is that the retinas used in this study are from the developing mouse eye. Another important assumption that was made in this paper was that the changes in retinal metabolism were due to photoreceptors even though the whole neural retina was included.

      Response: The reviewer is correct; we have added these two points to the discussion section of the manuscript. Notably, we now included a new subchapter “The retina as an experimental system for studies into neuronal energy metabolism” (Lines 367-395) to present different in vitro and in vivo test systems.

      Rev#3.12. Starting at line 347: As the authors know, the RPE has been shown to be highly reliant on mitochondrial function, and disruption of RPE mitochondrial metabolism leads to photoreceptor degeneration (numerous papers have shown this). Furthermore, the lower levels of lactate detected in their explants when RPE was present suggests that lactate is actively transported out of the neural retina by the RPE.

      Response: The reviewer is right about lactate being exported from the retina to the blood stream in vivo, or, in our in vitro study, to the culture medium. In the new dataset showing glucose and lactate concentrations in the culture medium (new Supplementary Figure 4C, D), we show that without RPE (no RPE group) the retina releases significantly more lactate into the medium than control retina with RPE. At the same time the no RPE retina consumes more glucose than control retina.

      Rev#3.13. Line 360: Again, in mouse photoreceptors (by bulk RNAseq and scRNAseq), there is no GLUT3 expression (encoded by slc2a3). It was also recently shown by Dr. Nancy Philp's lab that rod photoreceptors express GLUT1, encoded by slc2a1 (PMCID: PMC9438481). The differences reported in this study and previous studies should be discussed.

      Response: Although this comment may not make us very popular, we are somewhat sceptical of RNAseq data (especially single cell RNAseq) since the underlying methodology – at the current level of technological development – is notoriously unreliable when it comes to the assessment of low abundance transcripts and suffers from poor batch reproducibility, compared to NMR-based metabolomics. Due to methodological constraints, RNAseq data have a propensity to display erroneously high or low expression. Moreover, and perhaps even more important, dissociated cells in scRNAseq studies undergo rapid gene expression changes that can significantly falsify the image obtained (Rajala et al., PNAS Nexus 2:1-12, 2023). Finally, it cannot be emphasized enough that mRNA expression profiles DO NOT equate to protein expression and there are numerous examples of divergent expression profiles when mRNA and protein are compared.

      The Daniele et al. study (FASEB Journal 36:e22428, 2022; PMCID: PMC9438481) used in situ hybridization to study the mRNA expression of GLUT1 (slc2a1) and GLUT3 (slc2a3). In line with our comment just above, the Daniele et al. study may provide for an example of divergence between mRNA and protein expression, since it seemingly showed only minor expression of GLUT1/slc2a1 in the RPE, i.e. precisely in the one cell type that is well-known for its very strong GLUT1 protein expression.

      Furthermore, Daniele et al. used a conditional GLUT1 knock-out in photoreceptors induced by repeated Tamoxifen injections. The photoreceptor GLUT1 knock-out led to a relatively mild phenotype with only about 45% of the outer nuclear layer lost over a 4-month time course. This is in stark contrast with the FCCP or the 1,9-DDF treatment, which would ablate nearly all rod photoreceptors in under one or two weeks, respectively.

      As a side note, Tamoxifen is an oestrogen receptor antagonist (with partial agonistic behaviour) with a long history of causing retinal and photoreceptor damage. Notably, oestrogen receptor signalling is important for maintaining photoreceptor viability (Nixon & Simpkins, IOVS 53:4739-47, 2012; Xiong et al., Neuroscience 452:280-294, 2021). Therefore, the relatively minor effects of the conditional GLUT1 KO in photoreceptors found in Daniele et al. may have been confounded by direct tamoxifen photoreceptor toxicity. On a wider level, this possible confounding factor related to the use of Tamoxifen points to general problems associated with certain forms of genetic manipulations.

      We now mention the controversy around the expression of glucose transporters in the retina, including the Daniele et al. study, in a new subchapter of the discussion on “Expression of glucose transporters in the outer retina” (Lines 454-472).

      Rev#3.14. Lines 370-372: FCCP caused a strong cell death phenotype in rods, however under stress rods upregulate the secretion of RdCVF, which leads to cone photoreceptor survival by the upregulation of aerobic glycolysis in cones. The data should be re-interpreted in the context of this previous literature.

      Response: We thank the reviewer for this comment; however, we could not find a reference that would state that “…under stress rods upregulate the secretion of RdCVF”. What we did find was a reference stating that similar factors such as thioredoxins (TRX80) are secreted from blood monocytes under stress (Sahaf & Rosén, Antioxid Redox Signal 2:717-26, 2000). However, we consider these cells to be too dissimilar to rod photoreceptors to warrant a corresponding comment. Moreover, the research group who discovered RdCVF originally showed that rod-secreted RdCVF cannot prevent cone degeneration if the corresponding Nxnl1 gene is knocked-out in cones, arguing for a cell-autonomous mechanism of RdCVF-dependent cone protection (Mei et al., Antioxid Redox Signal. 24:909-23, 2016).

      Since it is very possible that we may have missed the correct reference(s), we would welcome further guidance by the reviewer.

      Rev#3.15. Line 374: 1,9-DDF caused a 90% loss of cones, however, previous studies by Dr. Nancy Philp have shown glucose deprivation in the outer retina affects primarily rod photoreceptors. The differences should be discussed.

      Response: We thank the reviewer for directing us to these studies. As mentioned above (Rev#3.13.) the Daniele et al. 2022 study yielded only relatively mild effects for a rod-specific conditional GLUT1 KO on photoreceptor viability. Similarly, in an earlier study (Swarup et al., Am J Physiol Cell Physiol. 316: C121–C133, 2019) the Philp group found that a GLUT1 KO in the RPE also caused only a minor phenotype in the photoreceptor layer. We would argue that if glucose, and by extension aerobic glycolysis, were indeed of major importance for (rod) photoreceptor survival, the degenerative effect of these genetic GLUT1 ablations should have been devastating and should have destroyed most of the outer retina in a matter of days. The fact that this was not seen in either study is another piece of independent evidence that rod photoreceptors do not rely to any major extent on glycolytic metabolism.

      The two studies from the Philp lab (Swarup et al., 2019; Daniele et al., 2022) are now cited in the discussion (Lines 417-419 and 458-460).

      Rev#3.16. Line 375: Yes Dr. Claudio Punzo and Dr. Leveillard Thierry along with other groups have shown glycolysis is required to maintain cone survival when under stress, however, the authors should emphasize that it is under stress that this is observed.

      Response: In response to this comment we have now specifically extended our corresponding remark in the discussion of the manuscript (Lines 446-447).

      Rev#3.17. The section "Cone photoreceptors use the Cahill-cycle". The presence of ALT in photoreceptors was surprising and suggests alternatives to the Cori reaction. However, previous measurements of glucose and lactate from localized in vivo cannulation of animal eyes suggest the majority of glucose taken up by the retina is released back to the blood as lactate. Again, this section should discuss this idea in terms of the previous literature.

      Response: Here, we believe the reviewer is referring to studies performed in the late 1990s where, in anaesthetized cats, the lactate concentration in blood samples obtained from choroidal vein cannulation was compared against that in blood samples obtained from femoral arteries (Wang et al., IOVS 38:48-55, 1997). We note that a more relevant in vivo measurement of retinal glucose consumption and lactate production would likely require the simultaneous cannulation of the central retinal artery (CRA) and the central retinal vein (CRV). This would need to be combined with repeated (online) blood sampling, drug applications, and subsequent metabolomic analysis. We are not aware of any in vivo studies where such procedures have been successfully performed and further miniaturization and increased sensitivity of metabolomic analytic equipment will likely be required before such an undertaking may become feasible. Even so, such studies may not be feasible in small rodents (mice, rats) and may instead require larger animal species (e.g. dog, monkey) to overcome limitations in eye and blood sample size.

      We have now extended the discussion of our manuscript with a new subchapter on “The retina as an experimental system for studies into neuronal energy metabolism”. Within this new subchapter we now present two different in vivo experimental approaches that addressed retinal energy metabolism (Lines 376-384). Moreover, we now present new data on retinal lactate release to the culture medium, showing, for instance, a strong increase in lactate release in the no RPE condition compared to control (new Supplementary Figure 4).

      Rev#3.18. Lines 431-433: The study cited suggested that the mitochondrial AST was detected in other cells, in agreement with the data shown. However, the authors' statements in this section are misleading as they do not take into consideration the contribution of AST from other cell types.

      Response: The reviewer is right, we found both AST1 and AST2 to be expressed not only in photoreceptor inner segments but also in the inner retina, especially in the inner plexiform layer (new Figure 6D). Since this might indicate mini-Krebs-cycle activity also in retinal synapses, we have added a corresponding comment to the discussion (Lines 540-543).

      Grammatical and wording fixes:

      Rev#3.19. Line 98 - "the recycling of the photopigment, retinal."

      Response: We have inserted a comma after “photopigment”.

      Rev#3.20. Results section and Figure 1 start without providing context for the model system where staining is being done.

      Response: We have added this information to the beginning of the results section (Lines 105-106).

      Rev#3.21. Supplementary Figure 2 is not mentioned in the main text - there is no context for this figure.

      Response: Supplementary Figure 2 was originally referenced in the legend to Figure 2. We now mention supplementary figure 2 (now renumbered to supplementary figure S3) also in the main text, in the results section under “Experimental retinal interventions produce characteristic metabolomic patterns” (Line 148).

      Rev#3.22. Volcano plot in Supplementary Figures 3, 5, 6, 7, and 8 don't indicate what Log2(FC) is in reference to.

      Response: The log2 fold change (FC) is calculated as follows: log2 (fold change) = log2 (mean metabolite concentration in condition A) - log2 (mean metabolite concentration in condition B) where condition A and condition B are two different experimental groups being compared. This is now explained in the SI Materials and Methods (Lines 145-147) and indicated in abbreviated form in the figure legends. Please note that supplemental figures have now been renumbered due to the insertion of an additional, new Figure.
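      The calculation described above can be written out as a minimal sketch. The function name and the example concentrations are hypothetical and serve only to illustrate the formula; they are not values from our dataset.

```python
# Sketch only: log2 fold change between two experimental groups, computed as
# log2(mean concentration in condition A) - log2(mean concentration in condition B).
import math
from statistics import mean


def log2_fold_change(condition_a, condition_b):
    """Return log2(mean of A) - log2(mean of B) for two lists of concentrations."""
    mean_a = mean(condition_a)
    mean_b = mean(condition_b)
    if mean_a <= 0 or mean_b <= 0:
        raise ValueError("mean concentrations must be positive to take log2")
    return math.log2(mean_a) - math.log2(mean_b)


# Hypothetical replicate concentrations of one metabolite in two groups.
group_a = [4.2, 3.8, 4.0]
group_b = [2.1, 1.9, 2.0]
print(f"log2(FC) A vs. B = {log2_fold_change(group_a, group_b):.2f}")
```

      With this sign convention, a positive value means the metabolite is more abundant in condition A than in condition B, and swapping the two groups simply flips the sign.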

      Rev#3.23. Line 331 - "and allowed to analyze the..." is incorrect phrasing.

      Response: This phrasing was changed.

      Rev#3.24. Line 343 "cycled"

      Response: This phrasing was changed.

      Rev#3.25. Line 446 is misworded.

      Response: This phrasing was changed.

      Technical questions:

      Rev#3.26. At what point after explant was the IHC done in Supplemental Figure 1? If early, but experiments are done later, there's a chance things are more disorganized at the end of the experiment.

      Response: Staining and metabolomics analysis were both done at the end of each experiment, at the same time, at P15. This is now mentioned in the SI materials and methods section (Lines 67, 108-110).

      Rev#3.27. FCCP affects plasma membrane permeability, which is particularly critical in neurons that undergo repolarization and depolarization - how do we know FCCP on cell death via metabolism? See: https://www.sciencedirect.com/science/article/pii/S2212877813001233

      Response: The reviewer is correct, a significant permeabilization of cell membranes in general would likely cause extensive neuronal cell death, unrelated to a disruption of OXPHOS. However, the FCCP concentration used here (5 µM) is at the lower end of what was used in the mentioned Kenwood et al. study (Mol Metab. 3:114-123, 2014) and the effect on cell membrane permeability in tissue culture is likely to be rather small, as opposed to what was seen by Kenwood et al. in cultures of individual cells. This view is supported by the fact that in our FCCP treatments, we did not observe any significant increases of cell death in any retinal cell type (including RPE) other than in rod photoreceptors. Together with the fact that only photoreceptors strongly express Krebs-cycle/OXPHOS related enzymes, this strongly suggests that the FCCP effects seen by us were due to disruption of OXPHOS.

      Rev#3.28. Numerous metabolite comparisons are being made throughout the manuscript – what type of multiple hypothesis testing corrections are utilized? Only certain figures mention multiple hypothesis testing (e.g. Figure 6).

      Response: In general, in this manuscript we used two different statistical methods: 1) For two-group comparisons, we used an unpaired, two-tailed t-test, which reports a p-value with 95% confidence interval, without additional correction for multiple hypothesis testing (e.g. in Figure 2, Suppl. Figures 4, 6, 7, 8). 2) For multiple group comparisons we used a one-way ANOVA analysis with Tukey’s multiple comparisons post-hoc test (except Suppl. Figure 9, where Fisher’s LSD post-hoc test was used). The information on which statistical test was used for which dataset is now given in the figure legends and in the SI Material and Methods section.

      Rev#3.29. For Figure 3, how do we know that the removal of RPE is causing the metabolite changes due to RPE-PR coupling? How do you rule out the fact that it isn’t just: I – a thicker physical barrier between media and the neural retina that is causing the changes, or II – removal of RPE from PR causes OS shearing and a stress response that alters metabolism?

      Response: We believe these concerns can be ruled out: The RPE cells are linked by tight junctions and are not “just a thicker barrier” but a barrier that is almost impermeable for most metabolites unless they are carried by specific transporters. Outer segment shearing via RPE removal would indeed be a concern if we had used adult retina. However, we explanted the retina at P9, when it does not yet possess any sizeable outer segments. As a matter of fact, photoreceptors grow outer segments only after P9.

      Rev#3.30. While 1,9-dideoxyforskolin blocks GLUT1, it is known to have other effects, including on potassium channels. How do we know the effects of 1,9-dideoxyforskolin are specific to GLUT1? Utilizing a GLUT1 KO and showing no additional effects when adding 1,9-dideoxyforskolin would be helpful as a control.

      Response: This is a good suggestion from the reviewer. We note that this is technically not easy to achieve as it would require an RPE-specific knock-out that should be inducible at a given experimental time-point, in a quantitative manner. The study by Swarup et al. (see above Rev#3.13.) used an RPE specific knock-out that was, however, not inducible. Moreover, if the corresponding inducible knock-out animals could be generated, then the stochastic nature of the inducing treatment would probably affect only a limited number of cells within a given cell population. In our experimental context, a less than quantitative knock-out would significantly complicate interpretation of results, even to the point that no additional insight might be gained.

      Rev#3.31. The analysis in Figure 6, even with attempts to control drug treatments, is highly speculative. One really needs animals with predominately cones vs. predominately rods to do this analysis (e.g. with NRL mice).

      Response: The reviewer is right, the analysis shown in Figure 6 was an explorative approach to try and deduce features of rod and cone metabolism. This is now mentioned in the results section (Lines 282-284). Since the experiments were not initially intended to address such questions, by necessity the interpretations remain speculative. The comparison of mouse mutants in which there are either no cones (e.g. cpfl1 mouse) or no rods (e.g. NRL knock-out mouse) may make it possible to disentangle the metabolic contributions of rods and cones. We appreciate the suggestion from the reviewer and have now inserted a related suggestion for future studies into the discussion section (Lines 450-452).

      Rev#3.32. Overall, much of the paper suggests intriguing pathways, but without C13 tracing or relevant genetic knock-outs, the pathways would have to be speculative rather than definitive.

      Response: We agree with the reviewer that further research, including 13C and 15N-tracing studies, will be necessary to evaluate which pathway(s) are used by what retinal cell type under what condition. Still, the high robustness and quantitative nature of the NMR metabolomics data allows us to draw pathway conclusions based on metabolites that are unique to specific pathways/cell types or using ratios. We now refer to the advantages of such carbon-tracing studies in the discussion of the manuscript (Lines 545-549).

      Stylistic suggestions:

      Rev#3.33. This is a very dense paper to read. It would be helpful for each figure to have a summary diagram of the relevant metabolite changes and how they fit together. Further, for those not metabolism-inclined, defining the mini-Kreb’s, Cahill, and Cori cycles and their brief implications at some point early in the manuscript would be helpful.

      Response: We have thought a lot about how we could add the suggested summary diagrams to each figure. Unfortunately, every idea we contemplated would have significantly increased the complexity of the figures, while the actual benefit in terms of improved understandability was unclear.

      However, we did adopt the suggestion from the reviewer to present the terms Cori-, Cahill-, and mini-Krebs-cycle already in the introduction, and we hope that this has improved the understandability of the manuscript overall (Lines 79-92).

      Rev#3.34. More discussion about the step-by-step ways that the mini-Kreb’s reaction “uncouples” glycolysis from the Kreb’s cycle would be helpful. What do you mean by “uncouple” in this context?

      Response: We thank the reviewer for this suggestion. Uncoupling in this context means that glycolysis and Krebs cycle are not metabolically coupled to each other via pyruvate. Instead both pathways can run independently from each other and in parallel, as long as the Krebs-cycle uses glutamate, BCAAs or other amino acids as fuels. We now also address this point already in the introduction of the manuscript (Lines 87-90).

      Conceptual questions:

      Rev#3.35. As the proposal that PR undergo heavy amounts of OXPHOS is controversial, it would be helpful for the authors to review the literature on lactate production by the retina and what studies have shown previously about retina use of lactate, specifically lactate making its way into TCA cycle intermediates, suggesting OXPHOS, in PRs.

      Response: In response to this question we have added several new references to the introduction and discussion of the manuscript. The question of lactate production (aerobic glycolysis) vs. the use of OXPHOS is now discussed in Lines 77-81, Lines 367-384.

      Rev#3.36. Why would cones die more in the no RPE condition? The authors suggest this has something to do with GLUT1 expression on RPE and the transport of glucose to cones. Even if we accept that cones are highly glycolytic, loss of RPE should expose the neural retina to even more glucose in your experimental set-up.

      Response: This is a very interesting question from the reviewer. Indeed, loss of the RPE and blood-retinal barrier function should increase photoreceptor access to glucose, even more so if they are expressing high affinity GLUT3. In the discussion (Lines 420-424), we speculate that this may trigger the Crabtree effect, shutting down OXPHOS and causing the cells to exclusively rely on glycolysis. This, however, will likely not yield sufficient ATP to maintain their viability, so that they “starve” to death even in the presence of ample glucose. Since cones require at least twice as much ATP as rods, they may be more sensitive to a Crabtree-dependent shut-down of OXPHOS. However, if this speculation was correct then the question remains why the FCCP treatment, which abolishes OXPHOS more directly, does not cause cone death. Here, we again can only speculate that high glucose may have additional toxic effects on cones that are independent of OXPHOS. We now try to present this reasoning in the discussion (Lines 426-429).

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for their comments, as well as for the time dedicated to making useful suggestions that have contributed to improving the manuscript. We have responded to the concerns raised by the reviewers, and after that, we have also responded to the different points highlighted in the Recommendations for the authors:

      Reviewer #1

      While in vivo injury was used to assess regeneration from subsets of PNS neurons, different in vitro neurite growth or explant assays were used for further assessments. However, the authors did not assess whether the differential "regenerative" responses in vivo could be recapitulated in vitro. Such results will be important in interpreting the results.

      We included a supplementary figure evaluating the neurite extension in vitro and updated the text accordingly.

      Intriguingly, even in individual groups of PNS neurons, not all neurons regenerate to the same extent. It is known that the distance between the cell body and the lesion site affects neuronal injury responses. It would be interesting to test this in the observed regeneration.

      Although it is true that the distance can affect the outcome, here we used a physiological model where all neurons are lesioned at the same point in the nerve. Not only is the distance different for motoneurons, but so is the microenvironment surrounding their somas, and therefore the direct comparison of these neurons with sensory neurons is limited. We extended the discussion on this matter in the new manuscript.

      Fig 1: The authors quantified the number of regenerating axons at two different time points. However, the total numbers of neurons/axons in each subset are different. The authors should use these numbers to normalize their regenerative axons.

      Figure 1D shows the normalization of data from figure 1C (normalized against the number of control axons in each neuron type). This has been clarified in the text.
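
      As a minimal sketch of the normalization described above (regenerating axons expressed relative to the number of control axons in each neuron type), the following example uses hypothetical counts. The neuron-type labels, the numbers, and the function name `normalize_to_control` are illustrative assumptions, not values or code from the study.

```python
# Hypothetical axon counts per neuron type (illustrative values only,
# not data from the study).
control_axons = {"motoneurons": 200, "proprioceptors": 150, "nociceptors": 400}
regenerating_axons = {"motoneurons": 50, "proprioceptors": 30, "nociceptors": 240}


def normalize_to_control(regenerating, control):
    """Express regenerating axons as a percentage of control axons per neuron type."""
    return {
        neuron_type: 100.0 * regenerating[neuron_type] / control[neuron_type]
        for neuron_type in regenerating
    }


normalized = normalize_to_control(regenerating_axons, control_axons)
for neuron_type, percent in normalized.items():
    print(f"{neuron_type}: {percent:.1f}% of control")
```

      Normalizing in this way makes neuron types with very different total axon numbers comparable on the same scale.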

      Fig 2-5: In explaining differential regeneration of individual groups of neurons, there are at least two possibilities: (1). Each group of neurons has different injury/regenerative responses; (2). The same set of injury/regenerative responses are differentially activated. Some data in this manuscript suggested the latter possibility. But some other data point in the opposite direction. It would be informative for the authors to analyze/discuss this further.

      From our point of view, these two options can be considered differential response to injury and could be potentially used for the modulation of regeneration. However, if the second possibility is correct, the regenerative program could be more influenced by the time chosen to study the response. Given the importance of this, we added some discussion about this topic.

      Fig 6: Is it possible to assess the regenerative effects of knockdown Med12 after in vivo injury?

      It is possible, but it is out of the scope of this work. Here, we aimed to describe the regenerative response and validate our data by testing a potential target for specific regeneration. Future studies will focus on the modulation of this specific regeneration both in vitro and in vivo.

      Reviewer #2

      It seems that the most intriguing outcome of this paper revolves around the role of Med12 in nerve regeneration. The authors should prioritize this finding. Drawing a conclusion regarding Med12's role in proprioceptor regeneration based solely on this in vitro model may be insufficient. This noteworthy result requires further investigation using more animal models of nerve regeneration.

      The main goal of this work was to compare the regenerative responses of different neuron subpopulations. We modulated Med12 to validate our data and the potential of our findings. Unfortunately, investigating in depth the role of Med12 in regeneration is out of the scope of this paper. For this reason, we did not prioritise this finding here. As this finding was striking, we strongly agree that the next step should be studying how it modulates regeneration.

      One critique revolves around the authors' examination of only a single time point within the dynamic and continuously evolving process of regeneration/reinnervation. Given that this process is characterized by dynamic changes, some of which may not be directly associated with active axon growth during regeneration, and encompasses a wide range of molecular alterations throughout reinnervation, concentrating solely on a single time point could result in the omission of critical molecular events.

      We agree that this is probably the main limitation of this study, as we discussed in the text. We chose 7 days postinjury as a standard time point widely described in the literature and to have a correlate with our histological data. Although the main aim was to compare populations, analyzing an additional time point after injury could add valuable information.

      Reviewer #3

      No concerns were expressed by that reviewer.

      Recommendations for the authors:

      The authors should assess whether the differential "regenerative" responses in vivo could be recapitulated in vitro.

      We included a supplementary figure evaluating the neurite extension in vitro and updated the text accordingly.

      Optional:

      It will be interesting to test if the distances between the cell body and the lesion site contribute to the observed differences in individual subsets of PNS neurons.

      Figure 1D shows the normalization of data from figure 1C (normalized against the number of control axons in each neuron type). This has been clarified in the text.

      Fig 2-5: In explaining differential regeneration of individual groups of neurons, there are at least two possibilities: (1). Each group of neurons has different injury/regenerative responses; (2). The same set of injury/regenerative responses are differentially activated. Some data in this manuscript suggested the latter possibility. But some other data point in the opposite direction. At least the authors should discuss these.

      From our point of view, these two options can be considered differential response to injury and could be potentially used for the modulation of regeneration. However, if the second possibility is correct, the regenerative program could be more influenced by the time chosen to study the response. Given the importance of this, we added some discussion about this topic.

      While the paper is technically well-executed, the conclusions and some of the findings appear to be incomplete and challenging to draw meaningful conclusions from. This manuscript presents some interesting findings, but the title is quite broad and may suggest that the authors have unveiled fundamental mechanisms explaining the varying regenerative abilities of peripheral axons. However, the results do not substantiate such a conclusion. Further comments and suggestions follow.

      We eliminated the word “regenerative (response)” from the title, as it could lead readers to think that all changes seen in these neurons are related only to regeneration. We think that “Neuron-specific RNA-sequencing reveals different responses in peripheral neurons after nerve injury” highlights the differences between neurons that we found without misleading readers into thinking that we described regenerative mechanisms in all neurons.

      What's notably absent here is the validation of certain genes found with the ribosomes, especially those highlighted in the subsequent figures. The question arises as to whether the changes depicted in the figures align with changes in the DRGs in vivo. Is there concordance between the presence of these genes and their transcriptional changes? It would greatly enhance the study's value if the authors could show evidence of upregulation or downregulation of certain genes over time in tissue sections, utilizing techniques such as in situ hybridization or immunocytochemistry.

      We selected some factors that were specifically upregulated in subsets of neurons to corroborate these findings by immunohistochemistry. Changes in the immunofluorescence of P75 in motoneurons and ATF2 in cutaneous mechanoreceptors were evaluated in controls and in animals that had received a nerve crush one week before. Supplementary figures with the images have been added.

      The authors discovered intriguing distinctions, such as the presence of specific signaling pathways unique to neurons projecting to muscle as opposed to those projecting to the skin. Among these pathways were those associated with receptor tyrosine kinases like VEGF, erbB, and neurotrophin signaling among others. The question now arises: do these pathways play a role in natural peripheral regeneration processes? To answer this, it is imperative to conduct in vivo studies. However, the authors employed an in vitro DRG neurite outgrowth assay to demonstrate that various types of neurons exhibit different responses to the presence of different neurotrophins. This does not reflect what actually happens in vivo. While neurotrophins indeed play a role in neuron survival and axon extension during development, their role in postnatal periods changes over time, and it remains unclear whether they play any role in the natural regenerative processes of the peripheral nerve. Therefore, this experiment may not be directly relevant in this case, especially during the early axon extension period of the regenerating axons. If the authors aim to establish a causal link with neurotrophin signaling, it becomes crucial to conduct in vivo experiments by manipulating the expression of key molecules like the receptors.

      It has been widely described that different types of peripheral neurons have a differential expression of Trk receptors, even in the adult, and that these respond differentially to neurotrophins. In our study, we do not establish a causal relationship between the expression of Trk and neurite extension, but instead we show (as many others have) that distinct neurons respond differentially to these neurotrophins. The fact that in vivo studies fail to show a clear effect does not necessarily mean that neurotrophins are not specific. It might mean that their effect is not strong enough to be a useful guide in the complex microenvironment found after an injury. For instance, NGF acts on TrkA (present in some neurons), but in vivo it has been shown to accelerate the clearance of myelin debris in Schwann cells (Li et al., 2020), which could facilitate regeneration of all types of axons, masking any potential specific effect on the subtypes of neurons expressing TrkA. In contrast, in an in vitro setting on neuronal cultures, the specific neuronal effect can be more evident.

      Additionally, it's worth noting that another paper utilizing the same methodology and experimental setup (PMID: 29756027, "Translatome Regulation in Neuronal Injury and Axon Regrowth" by Rozenbaum et al.) exists. Are there any significant differences or shared findings with that study?

      This study shows the transcriptomic response at 4, 12 and 24 hours after an injury in a very similar experimental setup. They focus on comparing the neuronal vs the glial response to the injury, using a Ribotag line that tags ribosomes from all neurons in the DRG rather than specific neuron subtypes. As the time postinjury (24h vs 7 days) and the cell types studied are different, we could not directly compare our results. We did see an upregulation in both datasets of previously described growth-associated genes (Jun, Atf3, Sox11, Sprr1a, Gal…). We included the article in the references for its relevance to the topic.

      It would be helpful for readers to illustrate the finding of the fastest axon regeneration of nociceptors by showing fluorescence micrographs of the nerve samples in addition to the graphs shown in Fig. 1 C/D.

      In figure 1B, we show fluorescence micrographs of the nerves 7 days postinjury. As explained in the results, we counted the number of axons at 2 distances from the injury; we did not analyse the fastest axon. This is due to technical reasons: 7 days after the injury the fastest axon has surpassed our evaluation point, which was the furthest distance that we could assess in our experimental setting in a consistent manner. If the reviewer thinks that we need to include more images from our evaluations (from 9 dpi for example), we could prepare a new figure.

      The labeling in Fig. 2B is confusing. Is the CHAT immunoreactivity shown in the last panel illustrated by green or red signals? Is the red signal counterstaining with beta-tubulin?

      The labelling was changed in the figure to increase clarity.

      The references to the supplementary data throughout the manuscript are confusing. For example, where can the "Supp data 2" be found? (mention on p. 14 in the merged pdf file). Are they referring to the Excel spreadsheets?

      We divided the supplementary material into supplementary figures/table (found in the pdf) and supplementary data. Supplementary data refers to Excel spreadsheets found outside the pdf file. We hope this will be clearer after the final formatting of the article.

      What does the following statement on p. 14 mean?: "The caveat in these analyses was that molecular classification by these approaches may be arbitrary, and not reflective of protein repurposing." This reviewer notes that these databases consider the fact that components participate in different pathways.

      Indeed, we aimed to explain that many proteins participate in different pathways, and this is a limitation of the enrichment analysis. We modified the sentence in the text.

      First paragraph on p. 15: The PPAR and AMPK pathways have much broader roles, and are not only "related to fatty acid metabolism". This factual inaccuracy should be corrected in the manuscript.

      The sentence has been corrected.

      The authors should consider showing increased TGF-beta signaling in their neurons after downregulation of Med12 given the previous implication of TGF-beta signaling in axon regeneration.

      We tried to demonstrate the effect of our knockdown in TGF-beta pathway by analyzing the expression of typical targets from this pathway by qPCR in our cultures. However, we could not detect any difference. We think that this can have two explanations: (1) as only a few cells upregulate Med12 whereas many cells downregulate it, the effect is masked (presumably only proprioceptors will have a significant difference in this pathway and, thus, it would be very difficult to see the effect), or (2) Med12 is not exerting its effect through this pathway. We added a supplementary figure with these data and discussed it in the manuscript.

      It would be helpful to eliminate typos and improve syntax/grammar/style.

      We revised the text to improve style.

    1. Author Response

      Public Reviews:

      Reviewer #1

      Strengths:

      Overall, the work is novel and moves the field of Alzheimer's disease forward in a significant way. The manuscript reports a novel concept of aberrant activity in VIP interneurons during the early stages of AD thus contributing to dysfunctions of the CA1 microcircuit. This results in the enhancement of the inhibitory tone on the primary cells of CA1. Thus, the disinhibition by VIP interneurons of Principal Cells is dampened. The manuscript was skillfully composed, and the study was of strong scientific rigor featuring well-designed experiments. Necessary controls were present. Both sexes were included.

      We express our gratitude to the reviewer for their keen appreciation of our efforts and their enthusiasm for the outcomes of this research.

      Limitations:

      (1) The authors attributed aberrant circuit activity to the accumulation of "Abeta intracellularly" inside IS-3 cells. That is problematic. 6E10 antibody recognizes amyloid plaques in addition to Amyloid Precursor Protein (APP) as well as the C99 fragment. There are no plaques at the ages 3xTg mice were examined. Thus, the staining shown in Figure 1a is of APP/C99 inside neurons, not abeta accumulations in neurons. At the ages of 3-6 months, 3xTg starts producing abeta oligomers and potentially tau oligomers as well (Takeda et al., 2013 PMID: 23640054; Takeda et al., 2015 PMID: 26458742 and others). Emerging literature suggests that abeta and tau oligomers disrupt circuit function. Thus, a more likely explanation of abeta and tau oligomers disrupting the activity of VIP neurons is plausible.

      The Reviewer correctly points out that 3xTg-AD mice typically do not exhibit plaques before 6 months of age, with limited amounts even up to 12 months, particularly in the hippocampus. To the best of our knowledge, the 6E10 antibody binds to an epitope in APP (682-687) that is also present in the Abeta (3-8) peptide. Consequently, 6E10 detects full-length APP, α-APP (soluble alpha-secretase-cleaved APP), and Abeta (LaFerla et al., 2007). Nonetheless, we concur with the Reviewer's observation that the detected signal includes Abeta oligomers and the C99 fragment, which is currently considered an early marker of AD pathology (Takasugi et al., 2023; Tanuma et al., 2023). Studies have demonstrated intracellular accumulation of C99 in 3-month-old 3xTg mice (Lauritzen et al., 2012), and its binding to the Kv7 potassium channel family, which results in inhibiting their activity (Manville and Abbott, 2021). If a similar mechanism operates in IS-3 cells, it could explain the changes in their firing properties observed in our study. Consequently, we will revise the manuscript to include this crucial information in both the Results and Discussion sections.

      (2) Authors suggest that their animals do not exhibit loss of synaptic connections and show Figure 3d in support of that suggestion. However, imaging with confocal microscopy of 70micron thick sections would not allow the resolution of pre- and post-synaptic terminals. More sensitive measures such as electron microscopy or array tomography are the appropriate techniques to pursue. It is important for the authors to either remove that data from the manuscript or address the limitations of their technique in the discussion section. There is a possibility of loss of synaptic connections in their mouse model at the ages examined.

      We appreciate the Reviewer’s perspective on the techniques used for imaging synaptic connections. While we acknowledge the limitations of confocal microscopy for resolving pre- and post-synaptic structures in thick sections, we respectfully disagree regarding the exclusive suitability of electron microscopy (EM). Our approach involved confocal 3D image acquisition using a 63x objective at 0.2 µm lateral resolution and 0.25 Z-step, providing valuable quantitative insights into synaptic bouton density. Despite the challenges posed by thick sections, this method together with automatic analysis allows for careful quantification. Although EM offers unparalleled resolution, it presents challenges in quantification. We will make sure to include the important details regarding image acquisition and analysis in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The submitted manuscript by Michaud and Francavilla et al., is a very interesting study describing early disruptions in the disinhibitory modulation exerted by VIP+ interneurons in CA1, in a triple transgenic model of Alzheimer's disease. They provide a comprehensive analysis at the cellular, synaptic, network, and behavioral level on how these changes correlate and might be related to behavioral impairments during these early stages of the disease.

      Main findings:

      3xTg mice show early Aß accumulation in VIP-positive interneurons.

      3xTg mice show deficits in a spatially modified version of the novel object recognition test.

      3xTg mice VIP cells present slower action potentials and diminished firing frequency upon current injection.

      3xTg mice show diminished spontaneous IPSC frequency with slower kinetics in Oriens / Alveus interneurons.

      3xTg mice show increased O/A interneuron activity during specific behavioral conditions.

      3xTg mice show decreased pyramidal cell activity during specific behavioral conditions.

      Strengths:

      This study is very important for understanding the pathophysiology of Alzheimer's disease and the crucial role of interneurons in the hippocampus in healthy and pathological conditions.

      We are thankful to the reviewer for their insightful recognition of our efforts and their enthusiasm for the results of this research.

      Weaknesses:

      Although results nicely suggest that deficits in VIP physiological properties are related to the differences in network activity, there is no demonstration of causality.

      RE: We completely agree with the reviewer's observation regarding the lack of demonstration of causality in our results. Investigating causality in the relationship between deficits in VIP physiological properties and differences in network activity is indeed a crucial aspect of this project. However, achieving this goal will require a significant amount of time and dedicated manipulations in a new mouse model (VIP-Cre-3xTg). We appreciate the importance of this line of investigation and consider it as a priority for our future research endeavors.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful to the reviewers for their constructive comments. The following are our point-by-point responses.

      Reviewer #1 (Recommendations For The Authors):

      Point 1- Abstract: advanced morning peak « opposite » to pdf/pdfr mutants. To my knowledge, the alteration of PDF/PDFR suppresses the morning peak. I am not sure that an advance of the peak is « opposite » to its inhibition?

      Mutants with disruptions in CNMa or CNMaR display advanced morning activity, indicating an enhanced state. Mutants with disruptions in Pdf or Pdfr exhibit no morning anticipation, suggesting a promoting role of these genes in morning anticipation. Therefore, our revised version is: “Specific elimination of each from clock neurons revealed that loss of the neuropeptide CNMa in two posterior dorsal clock neurons (DN1ps) or its receptor (CNMaR) caused advanced morning activity, indicating a suppressive role of CNMa-CNMaR on morning anticipation, opposite to the promoting role of PDF-PDFR on morning anticipation.” (Line 43-51)

      Point 2- Fig 1K-L: the authors should show the sleep phenotype of the homozygous nAChRbeta2 mutant (if not lethal) for a direct comparison with the FRT/FLP genotype and thus evaluate the efficiency of the system.

      We have incorporated sleep profiles of nAChRbeta2 mutant and W1118 into Fig 1K-L. nAChRbeta2 mutants (red) exhibited a sleep amount comparable to that of pan-neural nAChRbeta2 knockout flies (dark red), as shown below.

      Author response image 1.

      Point 3- Dh31-EGFP-FRT expression patterns look different in figS1 A (or fig1 H) and J. why that?

      We re-examined the original data. Both samples (with R57C10-GAL4; Fig. S1A, right, and Fig. S1J, left) are Dh31EGFP.FRT samples, displayed below, and they show consistent primary expression subsets. Any observed disparities in region "e" could potentially be attributed to variations during dissection.

      Author response image 2.

      Point 4- The knockdown experiments with the elav-switch (RU486) system (fig S2) do not seem to be as efficient as the HS-FLP system (fig 1H-J). The conclusions on the efficiency should be toned down.

      We have revised accordingly: "Near Complete Disruption of Target Genes by GFPi and Flp-out Based cCCTomics" (Line 130): "Knocking out at the adult stage using either hsFLP driven Flp-out (Golic and Lindquist, 1989) (Fig. 1H-1J) or neural (elav-Switch) driven shRNAGFP (Nicholson et al., 2008; Osterwalder et al., 2001) (Fig. S2A-S2I), also resulted in the elimination of most, though not all, GFP signals." (Line 145-149)

      Point 5- Fig 2H-J: the LD behavioral phenotype of pdfr pan-neuronal CRISPR does not seem to correspond to what is described in the literature for the pdfr mutant (han), see hyun et al 2005 (no morning anticipation and advanced evening peak). I understand that the activity index is lower than controls but fig2H shows a large anticipatory activity that seems really unusual, and no advanced evening peak is observed. I think that the authors should show the CRISPR flies and pdfr mutants together, to better compare the phenotypes.

      Thank you for pointing out that, in flies with pan-neuronal knockout of PDFR by unmodified Cas9 (Fig. 2H-2I of the previous version), morning anticipation still exists (Fig. 2H of the previous manuscript), and that the significant decrease in the morning anticipation index (Fig. 2I of the previous manuscript) and the advanced evening activity are not as pronounced as observed in han5304 (Fig. 3C in Hyun et al., 2005).

      First, we have separated the activity plots of Fig. 2H of the previous manuscript, as shown below. The activity from ZT18 to ZT24 tends to decrease from ZT18 to ZT21 and to increase from ZT21 to ZT24. The lowest activity before dawn during ZT18 to ZT24 occurs at about ZT21, and the activity at ZT18 is comparable to the activity at ZT24. This is significantly different from the two control groups, whose activity tends to increase from ZT18 to ZT24 with an activity peak at ZT24.

      The activity from ZT6 to ZT12 increased much faster in Pdfr knockout flies and reached a plateau at about ZT11, whereas the two control groups showed a slower increase in activity from ZT6 to ZT12, with no plateau but an activity peak at ZT12.
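
      The pre-dawn pattern described above (a trough at about ZT21 with similar activity at ZT18 and ZT24, versus a steady rise towards ZT24 in controls) can be illustrated with a small sketch. The hourly activity values and the helper name `describe_predawn_activity` are hypothetical and chosen only to mimic the two shapes; they are not measurements from the study.

```python
def describe_predawn_activity(activity_by_zt, start=18, end=24):
    """Return (ZT of lowest activity, activity at start, activity at end) for ZT start..end."""
    window = {zt: activity_by_zt[zt] for zt in range(start, end + 1)}
    trough_zt = min(window, key=window.get)
    return trough_zt, window[start], window[end]


# Hypothetical hourly activity values (arbitrary units), keyed by ZT hour.
knockout_like = {18: 10, 19: 8, 20: 6, 21: 4, 22: 6, 23: 8, 24: 10}
control_like = {18: 4, 19: 5, 20: 6, 21: 8, 22: 10, 23: 13, 24: 16}

ko_summary = describe_predawn_activity(knockout_like)
ctrl_summary = describe_predawn_activity(control_like)
print("knockout-like (trough ZT, ZT18, ZT24):", ko_summary)
print("control-like  (trough ZT, ZT18, ZT24):", ctrl_summary)
```

      In the knockout-like trace the trough falls mid-window and the two ends are equal, whereas in the control-like trace the minimum sits at the start of the window and activity peaks at ZT24.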

      Author response image 3.

      Second, we have incorporated the phenotype of Pdfr mutants we previously generated (Pdfr-attpKO; Deng et al., 2019) alongside that of Pdfr pan-neuronal knockout by Cas9.HC. This mutant lacks all seven transmembrane regions of Pdfr (a). The phenotypes are very similar between Pdfr-attpKO flies and Pdfr pan-neuronal knockout flies. In this experimental repeat, we observed a much more obvious advanced evening activity peak in both pan-neuronal knockout flies and Pdfr-attpKO flies.

      To further analyze the phenotypes of Pdfr pan-neuronal knockout flies by Cas9.HC, we referred to the literature. The activity pattern at ZT18 to ZT24 (activity tends to decrease from ZT18 to ZT21 and tends to increase from ZT21 to ZT24, with the lowest activity before dawn occurring at about ZT21, and activity at ZT18 comparable to activity at ZT24) is also reported in Pdfr knockout flies, such as in Fig. 3C and 3H in Hyun et al., 2005, Fig. 2B in Lear et al., 2009, Fig. 3B in Zhang et al., 2010, Fig. 5A in Guo et al., 2014, and Fig. 5B in Goda et al., 2019. Additionally, the less pronounced advanced evening activity peak compared to han5304 (Fig. 3C in Hyun et al., 2005) is also reported in Fig. 2B in Lear et al., 2009, Fig. 3B in Zhang et al., 2010, and Fig. 5B in Goda et al., 2019. We consider that this difference is more likely to be caused by environmental conditions or recording strategies (DAM system vs. video tracing).

      Therefore, we revised the text to: “Pan-neuronal knockout of Pdfr resulted in a tendency towards advanced evening activity and weaker morning anticipation compared to control flies (Fig. 2H-2I), which is similar to Pdfr-attpKO flies. These phenotypes were not as pronounced as those reported previously, when han5304 mutants exhibited a more obvious advanced evening peak and no morning anticipation (Hyun et al., 2005)”.

      Author response image 4.

      Point 6-The authors should provide more information about the DD behavior (power is low, but how about the period of rhythmic flies, which is shortened in pdf (renn et al) and pdfr (hyun et al) mutants).

      We have incorporated period data into Fig. 2I. Indeed, conditional knockout of Pdfr by Cas9.HC driven by R57C10-GAL4 shortens the period length, as shown below (previous data) and in Fig. 2I of the revised version.

      In the revised Fig. 2I, we tested 45 Pdfr-attpKO flies under DD conditions (3 out of 48 flies died during video tracing in DD), and only one fly was rhythmic. In contrast, 9 out of 48 Pdfr pan-neuronal knockout flies were rhythmic.

      Author response image 5.

      Point 7- P15 and fig6. The authors indicate that type II CNMa neurons do not show advanced morning activity as type I do, but Figs 6 I and K seem to show some advance although less important than type I. I am not sure that this supports the claim that type I is the main subset for the control of morning activity. This should be toned down.

      We have re-organized Fig. 6 and revised the summary of these results as: “However, Type II neurons-specific CNMa knockout (CNMa ∩ GMR91F02) showed weaker advanced morning activity without advanced morning peak (Fig. 6N), while Type I neurons-specific CNMa knockout did (Fig. 6J), indicating a possibility that these two type I CNMa neurons constitute the main functional subset regulating the morning anticipation activity of fruit fly”. (Line 400-405)

      Point 8- Figs 6M and N: is power determined from DD data? if yes, how about the period and arrhythmicity? Please also provide the LD activity profiles for the mutants and rescued pdfr genotypes.

      Yes, the power was determined from the DD data. In the new version of the manuscript, we have included the activity plots for the LD phase in supplementary Fig S13, as well as shown below (A, B), and the period and arrhythmicity data for the DD phase in Fig. 6S and Table S7. We have also refined the related description as follows: “Moreover, knocking out Pdfr by GMR51H05, GMR79A11 and CNMa GAL4, which cover type I CNMa neurons, decreased morning anticipation of flies (Fig. 6T, Fig. S13B). However, the decrease in morning anticipation observed in the Pdfr knockout by CNMa-GAL4 was not as pronounced as with the other two drivers. Because the presumptive main subset of functional CNMa is also PDFR-positive, there is a possibility that CNMa secretion is regulated by PDF/PDFR signal”. (Line 413-419)

      Author response image 6.

      Point 9- Fig 7: does CNMaR affect DD behavior? This should be tested.

      We analyzed CNMaR-/- activity in the dark-dark (DD) condition over a span of six days. The results revealed a higher power in CNMaR mutants than in control flies (Power: 93.5±41.9 (CNMaR-/-, n=48) vs 47.3±31.6 (w1118, n=47); Period: 23.7±0.3 h (CNMaR-/-, n=46) vs 23.7±0.3 h (w1118, n=47); arrhythmic rate 2/48 (CNMaR-/-) vs 0/47 (w1118)). Because mutating CNMa had no obvious effect on DD behavior, any effect of CNMaR on DD behavior cannot be attributed to the CNMa signal; we therefore did not repeat or further analyze the DD behavior of CNMaR mutants. We believe this raises another question beyond the scope of our current discussion.

      Reviewer #2 (Recommendations For The Authors):

      Point 1-One major concern is the apparent discrepancies in clock network gene expression using the Flp-Out and split-LexA approaches compared to what is known about the expression of several transmitter and peptide-related genes. For example, it is well established that the 5th-sLNv expresses ChAT (along with a single LNd), yet there appears to be no choline acetyltransferase (ChAT) signal in the 5th-sLNv as assayed by the Split-LexA approach (Fig. 4). This approach also suggests that DH31 is expressed in the s-LNvs, which, as one of the most intensely studied clock neuron classes, are known to express PDF and sNPF, but not DH31. The results also suggest that the sLNvs express ChAT, which they do not. Remarkably, PDF is not included in the expression analysis; this peptide is well known to be expressed in only two subgroups of clock neurons, and would therefore be an excellent test case for the expression analysis in Fig. 4. PDF should therefore be added to the analysis shown in Fig. 4. Another discrepancy is PdfR, which split LexA suggests is expressed in the Large LNvs but not the small LNvs, the opposite of what has been shown using both reporter expression and physiology. The authors do acknowledge that discrepancies exist between their data and previous work on expression within the clock network (lines 237 and 238). However, the extent of these discrepancies is not made clear and calls into question the accuracy of Flp-Out and Split LexA approaches.

      The concerns mentioned above are:

      (1) sLNvs express PDF and sNPF but not Dh31;

      (2) ChAT is present in the 5th-sLNv and one LNd but not in other sLNvs;

      (3) PDFR is present in sLNvs but not l-LNvs;

      (4) PDF is not included in the analysis.

      To verify the accuracy of these intersection analyses, all of which relate to PDF-positive neurons (except the 5th s-LNv and LNds), we stained for PDF and examined the co-localization between PDF-positive LNvs and the respective drivers ChAT-KI-LexA, Pdfr-KI-LexA, Dh31-KI-LexA, and Pdf-KI-LexA.

      First, Dh31-KI-LexA labeled four s-LNvs, as shown below (also in Fig. S9A). Therefore, the results of the intersection analysis of Dh31-KI-LexA with Clk856-GAL4 are correct. We attribute the difference from the previous literature to Dh31-KI-LexA labeling different neurons than the previous driver or antibody.

      Second, no s-LNv was labeled by ChAT-KI-LexA, as shown below. We rechecked our intersection data and found that, of the 10 ChAT-KI-LexA∩Clk856-GAL4 brains we analyzed, only two showed positive s-LNvs. To enhance the accuracy of the intersection analysis results, we marked all positive signal records in which positive subsets were found in fewer than 1/3 of the total analyzed brains (Table S4).

      Third, one l-LNv and at least two s-LNvs were labeled by Pdfr-KI-LexA, as shown below (also in Fig. S9B). Fourth, Pdf-KI-LexA labels all PDF-positive neurons, but the intersection analysis with Pdf-KI-LexA and Clk856-GAL4 showed only scattered signals, as shown below (D, also in Fig. S9C). In these cases, some expected positive signals were not observed in our dissection. A possible reason is the inefficiency of LexAop-FRT-myr::GFP driven by LexA. Our intersection results must therefore miss some positive signals.

      Author response image 7.

      Finally, we revised the text to (Line 286-317):

      To assess the accuracy of expression profiles using CCT drivers, we compared our dissection results with previous reports. Initially, we confirmed the expression of CCHa1 in two DN1s (Fujiwara et al., 2018), sNPF in four s-LNvs and two LNds (Johard et al., 2009), and Trissin in two LNds (Ma et al., 2021), aligning with previous findings. Additionally, we identified the expression of nAChRα1, nAChRα2, nAChRβ2, GABA-B-R2, CCHa1-R, and Dh31-R in all or subsets of LNvs, consistent with suggestions from studies using ligands or agonists in LNvs (Duhart et al., 2020; Fujiwara et al., 2018; Lelito and Shafer, 2012; Shafer et al., 2008) (Table S4).

      Regarding previously reported Nplp1 in two DN1as (Shafer et al., 2006), we found approximately five DN1s positive for Nplp-KI-LexA, indicating a broader expression than previously reported. A similar pattern emerged in our analysis of Dh31-KI-LexA, where four DN1s, four s-LNvs, and two LNds were identified, contrasting with the two DN1s found in immunocytochemical analysis (Goda et al., 2016). Colocalization analysis of Dh31-KI-LexA and anti-PDF revealed labeling of all PDF-positive s-LNvs but not l-LNvs (Fig S9A), suggesting that the differences may arise from the broader labeling of 3' end knock-in LexA drivers or the amplitude effect of the binary expression system. The low protein levels might go undetected in immunocytochemical analysis. This aligns with transcriptome analysis findings showing Nplp1 positive in DN1as, a cluster of CNMa-positive DN1ps, and a cluster of DN3s (Ma et al., 2021), which is more consistent with our dissection.

      Despite the well-known expression of PDF in LNvs and PDFR in s-LNvs (Renn et al., 1999; Shafer et al., 2008), we did not observe stable positive signals for both in Flp-out intersection experiments, although both Pdf-KI-LexA and Pdfr-KI-LexA label LNvs as expected (Fig S9B-S9C). We also noted fewer positive neurons in certain clock neuron subsets compared to previous reports, such as NPF in three LNds and some LNvs (Erion et al., 2016; He et al., 2013; Hermann et al., 2012; Johard et al., 2009; Lee et al., 2006) and ChAT in four LNds and the 5th s-LNv (Johard et al., 2009; Duhart et al., 2020) (Table S4). We attribute this limitation to the inefficiency of LexAop-FRT-myr::GFP driven by LexA, acknowledging that our intersection results may miss some positive signals.

      Point 2-Related to this, the authors rather inaccurately suggest that the field's understanding of PdfR expression within the clock neuron network is "inconsistent" and "variable" (lines 368-377). This is not accurate. It is true that the first attempts to map PdfR expression with antisera and GAL4s were inaccurate. However, subsequent work by several groups has produced strong convergent evidence that, with the exception of the l-LNvs after several days post-eclosion, PdfR is expressed in the Cryptochrome-expressing subset of the clock neuron network. This section of the study should be revised.

      We thank the reviewer for pointing this out. As we have already addressed and revised the related part in the RESULTS section (Line 308-317), we have now removed this part from the DISCUSSION section of the revised version.

      Point 3-One minor issue that would avoid unnecessary confusion by readers familiar with the circadian literature is the way that activity profiles are plotted in the study. The authors have centered their averaged activity profiles on the 12h of darkness. This is the opposite of the practice of the field, and it leads to some initial confusion in the examination of the morning and evening peak data. The authors may wish to avoid this by centering their activity plots on the 12h light phase, which would put the morning peak on the left and the evening peak on the right. This is the way the field is accustomed to examining locomotor activity profiles.

      We centered the averaged activity profiles on the 12 h of darkness to highlight the phenotype of advanced morning activity. To prevent any confusion among readers, we have included a sentence in the figure legend explaining how our activity profiles differ from those in the previous literature: "Activity profiles were centered of the 12 h darkness in all figures with evening activity on the left and morning activity on the right, which is different from general circadian literatures." (Fig. 2H legend, Line 957-959)

      Point 4-The authors conclude that the loss of PDF and CNMa have opposite effects on the morning peak of locomotor activity (line 392). But they also acknowledge, briefly, that things are not that simple: loss of CNMa causes a phase advance, but loss of PDF causes a loss or reduction in the anticipatory peak. It is still significant to find a peptide transmitter within the clock neuron network that regulates morning activity, but the authors should revise their conclusion regarding the opposing actions of PDF and CNMa, which is not well supported by the data.

      We have revised the relevant parts.

      ABSTRACT: “Specific elimination of each from clock neurons revealed that loss of the neuropeptide CNMa in two posterior dorsal clock neurons (DN1ps) or its receptor (CNMaR) caused advanced morning activity, indicating a suppressive role of CNMa-CNMaR on morning anticipation, opposite to the promoting role of PDF-PDFR on morning anticipation.” (Line 43-48)

      DISCUSSION: “Furthermore, given that the morning anticipation vanishing phenotype of Pdf or Pdfr mutant indicates a promoting role of PDF-PDFR signal, while the enhanced morning anticipation phenotype of CNMa mutant suggests an inhibiting role of CNMa signal, we consider the two signals to be antagonistic.” (Line 492-495)

      Point 5-The authors should acknowledge, cite, and incorporate the substantive discussion of CNMa peptide and the DN1p neuronal class in Reinhard et al. 2022 (Front Physiol. 13: 886432).

      We have revised the text accordingly and cited this paper: “Type I with two neurons whose branches projecting to the anterior region, as in CNMa∩GMR51H05, CNMa∩Pdfr, and CNMa∩GMR79A11 (Fig. 6E, 5G, 6H), and type II with four neurons branching on the posterior side with few projections to the anterior region, as in CNMa∩GMR91F02 (Fig. 6F). These two types of DN1ps’ subsets were also reported and profound discussed previously (Lamaze et al., 2018; Reinhard et al., 2022)”. (Line 393-397)

      Reviewer #3 (Recommendations For The Authors):

      Point 1-Throughout the manuscript figure legends (axis, genotypes, etc) are too small to be appreciated. Fig. 1. Panel A. The labels are very difficult to read.

      We have attempted to enlarge the font as much as possible in the revised version.

      Point 2-Fig. 1. H-J Why is efficiency not mentioned in all the examples?

      The results of Fig. 1H-1J are now discussed in the revised manuscript (Line 145-147). We did not calculate the exact efficiency because the GFP intensity is not stable enough: it may change with dissection, mounting, or laser intensity during our experimental process. Therefore, in all results related to GFP signal (Fig. 1B-1J, Fig. S1, Fig. S2, Fig. 2B-2D), we relied on qualitative judgment rather than quantitative judgment, unless the GFP signal was easily quantifiable (such as in cases with limited cells or no GFP signal in the experimental group).

      Point 3-Fig. 1. Panel L, left (light phase): the statistical comparisons are not clearly indicated (the same happens in Figs 3Q and 3R).

      We have now re-arranged Fig. 1L and Fig. 3Q-3R to make the statistical comparisons clear in the new version.

      Point 4-Line 792. Could induced be introduced?

      Yes, we have now corrected this typo.

      Point 5-Fig. S1. Check labels for consistency. GMR57C10 Gal4 driver is most likely R57C10.

      We have now revised the labels (Fig. S1).

      Point 6-Fig. S2. If the experiments were repeated and several brains were observed, the authors should include the efficiency and the number of flies as reported in Fig. S1.

      We have now added the number of flies in Fig. S2 as reported in Fig. S1. As mentioned in our response to Point 2, the instability of the GFP signal prevents us from providing a quantitative efficiency in this context.

      Point 7-Fig S4. The fig legend describes panels I-J which are not shown in the current version of the manuscript.

      We have now deleted them.

      Point 8-Fig 2I. Surprising values for morning anticipation indexes even for controls (0.5 would indicate “no anticipation”; in controls, the expected values would be >>0.5, as most of the activity is concentrated right before the transition). Could the authors explain this unexpected result?

      We have revised the description of the calculation in the methods section (Line 612). We first calculate the ratio of the activity in the last three hours to the activity in the total six hours, and then subtract 0.5 from this ratio. The index is therefore ≤0.5, and an index of 0 indicates no morning anticipation.
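
      As an illustration only, the calculation described above can be sketched as follows. The function name, the equal-width binning, and the assumption that the six-hour window ends at the light transition are ours for this sketch and are not taken from the analysis code used in the manuscript.

```python
def anticipation_index(activity_6h):
    """Sketch of the index described in the response.

    activity_6h: activity counts for the six hours before the transition,
    in equal-width time bins ordered from earliest to latest (assumed layout).
    Returns (activity in last 3 h / activity in all 6 h) - 0.5.
    """
    n = len(activity_6h)
    if n == 0 or n % 2 != 0:
        raise ValueError("expected an even, non-zero number of equal-width bins")
    total = sum(activity_6h)
    if total == 0:
        # Assumption: a fly with no activity in the window has no defined index.
        raise ValueError("no activity in the six-hour window")
    last_three_hours = sum(activity_6h[n // 2:])
    return last_three_hours / total - 0.5


# Hypothetical hourly counts for illustration.
flat = anticipation_index([5, 5, 5, 5, 5, 5])      # evenly spread activity
ramping = anticipation_index([0, 0, 0, 4, 4, 4])   # all activity in last 3 h
```

      Because the ratio lies between 0 and 1, the index lies between -0.5 and 0.5: evenly spread activity gives 0, and activity confined to the last three hours gives the maximum of 0.5.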

      Point 9-Fig 2K/L. The authors mention that not all genes are effectively knocked out with their strategy. Could this be accounted for the specific KD strategy, its duration, or the promotor strength? It is surprising no explanation is provided in the text (page 9 line 179).

      In our pursuit of a broadly effective method for gene editing, Fig. 2H-2L and Fig. 2D revealed that previous attempts fell short of this objective. The observed inefficiency may be attributable to the strength of the promoter, resulting in inadequate expression. Alternatively, an insufficient duration of manipulation may contribute to the lack of success. However, in sleep and rhythm research, the age at which flies are tested is typically fixed, which limits the potential to enhance efficiency by extending the manipulation time. Moreover, increasing the expression level may cause cytotoxicity, as reported previously (Port et al., 2014). We refrain from offering specific explanations in the text because we lack a definitive plan and cannot provide additional robust evidence to support these speculations. Consequently, in our ongoing efforts, we aim to enhance the efficiency of the tool system while operating within the current constraints.

      Point 10-Page 9, line 179. Can the authors include a brief description of the reason for the different modifications? Only one was referenced.

      We have revised the related part of the manuscript (Line 223-231):

      Cas9.M9: We fused a chromatin-modulating peptide (Ding et al., 2019), HMGN1 (High mobility group nucleosome binding domain 1), at the N-terminus of Cas9 and HMGB1 (High mobility group protein B1) at its C-terminus with a GGSGP linker, termed Cas9.M9.

      Cas9.M6: We also obtained a modified Cas9.M6 with HMGN1 at the N-terminus and an undefined peptide (UDP) at the C-terminus. (Note: the UDP was obtained by accident.)

      Cas9.M0: We replaced the STARD linker between Cas9 and NLS in Cas9.HC with the GGSGP linker (Zhao et al., 2016), termed Cas9.M0.

      Point 11-The authors tested the impact of KO nAChR2 across the different versions of conditional disruption (Fig 1K-L, Fig 2L, Fig 3R). It is surprising they observe a difference in daytime sleep upon knocking down with Cas9.HC (2L) but not with Cas9.M9 (3R) and the reverse is seen for night-time sleep. Could the authors provide an explanation? Efficiency is not the issue at stake, is it?

      In Fig. 2K, the day sleep of flies (R57C10-GAL4/UAS-sgRNAnAChRbeta2; UAS-Cas9/+) was significantly decreased compared to flies (R57C10-GAL4/UAS-sgRNAnAChRbeta2; +/+), but not when compared to flies (R57C10-GAL4/+; UAS-Cas9/+). Our criterion for asserting a difference is that the experimental group must show a significant distinction from both control groups. Therefore, we concluded that there was no significant difference between the experimental group and the control groups in Fig. 2K.
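
      The decision rule stated above can be sketched in a few lines. This is only an illustration: the function name, the use of p-values as inputs, and the 0.05 threshold are assumptions made for the sketch, not details taken from the manuscript.

```python
def differs_from_both_controls(p_vs_gal4_control, p_vs_sgrna_control, alpha=0.05):
    """Return True only if the experimental group differs significantly
    from BOTH control groups (assumed threshold: alpha)."""
    return p_vs_gal4_control < alpha and p_vs_sgrna_control < alpha


# Hypothetical p-values: significant against one control only, so no difference is claimed.
one_control_only = differs_from_both_controls(0.30, 0.001)
# Hypothetical p-values: significant against both controls.
both_controls = differs_from_both_controls(0.01, 0.001)
```

      Under this rule, a genotype that differs from only one of the two controls, as described for Fig. 2K, is not reported as different.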

      Point 12-Fig. 4. Which of the two strategies described in A-B was employed to assemble the expression profile of CCT genes in clock neurons shown in C? This information should be part of the fig legend.

      We have now revised the legend as follows: “(A-B) Schematic of intersection strategies used in Clk856 labelled clock neurons dissection, Flp-out strategy (A) and split-LexA strategy (B). The exact strategy used for each gene is annotated in Table S5.”

      Point 13-Similarly, how many brains were analyzed to give rise to the table shown in C?

      We have now revised the legend of Table S4 to address this concern, as follows: “The largest N# for each gene in Table S4 is the brain number analyzed for each gene”.

      Point 14-Finally, the sentence “The figure is...” requires revision.

      We have now revised it: “The exact cell number for each subset is annotated in Table S4”.

      Point 15-Legend to Table S3. The authors have done an incredible job testing many gRNAs for each gene potentially relevant for communication. However, there is very little information to make the most out of it; for instance, the legend does not inform why many of the targeted genes do not appear to have been tested any further. It would be useful to the reader to discern whether despite being the 3 most efficient gRNAs, they were still not effective in targeting the gene of interest, or whether they showed off-targets, or it was simply a matter of testing the educated guesses. This information would be invaluable for the reader.

      First, we designed and generated transgenic UAS-sgRNA fly lines for all these sgRNAs. We randomly selected 14 receptor genes, known for their difficulty in editing based on our experience, to assess the efficiency of our strategy, as depicted in Fig. 3M-3P, Fig. S5, and Fig. S6. We believe these results are representative and indicative of the efficiency of sgRNAs designed using our process and applied with the modified Cas9.

      Secondly, we acknowledge your valid concern. While we selected sgRNAs with no predicted off-target effects through various prediction models (outlined in the Methods under C-cCCTomics sgRNA design), we did not conduct whole-genome sequencing. Consequently, we can only assert that the off-target possibility is relatively low. To address potential misleading effects arising from off-target concerns, it is essential to validate these results through mutants, RNAi, or alternative UAS-sgRNAs targeting the same gene.

      Point 16-Table S4. Some of the data presented derives from observations made in 1-2 brains for a specific cluster; isn't it too little to base a decision on whether a certain gene is (or not) expressed? It is surprising since the same CCT line was observed/analysed in more brains for other clusters. Can the authors explain the rationale?

      N# represents the GFP-positive number, and we have revised the legend of Table S4 accordingly. The largest N# denotes the total number of brains analyzed for a specific CCT line. It is possible that, due to variations in our dissection or mounting process, some clusters were observed in only 1-2 of the brains analyzed. To enhance the accuracy of the intersection analysis results, we marked all positive signal records in which positive subsets were found in fewer than 1/3 of the total analyzed brains (Table S4).
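
      The marking rule can be sketched as follows, under stated assumptions: the function name, the dictionary layout, and the cluster counts are hypothetical, and N# is treated here as the number of brains in which a cluster was GFP-positive, with the largest N# taken as the total number of brains analyzed, as described above.

```python
from fractions import Fraction


def flag_sparse_clusters(positive_counts, threshold=Fraction(1, 3)):
    """positive_counts maps a clock-neuron cluster to its N# (assumed meaning:
    number of brains in which the cluster was GFP-positive).
    Returns the clusters that were positive in fewer than `threshold`
    of the analyzed brains."""
    total_brains = max(positive_counts.values(), default=0)
    if total_brains == 0:
        return {}
    return {
        cluster: n
        for cluster, n in positive_counts.items()
        if n > 0 and Fraction(n, total_brains) < threshold
    }


# Hypothetical counts: s-LNvs positive in 2 of 10 brains are flagged.
flagged = flag_sparse_clusters({"s-LNv": 2, "LNd": 10, "DN1": 4, "DN3": 0})
```

      Exact fractions are used so that a count of exactly one third of the brains is not flagged because of floating-point rounding.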

      Point 17-The paragraph describing this data in the results section needs revising (lines 233-243).

      We have now revised this. (Line 286-317)

      Point 18-While it is customary for authors to attempt to improve the description of the activity patterns by introducing new parameters (i.e. MAPI and EAPI, lines 253-258) it would be interesting to understand the difference between the proposed method and the one already in use (which compares the same parameter, i.e., the slope, defined as “the slope of the best-fitting linear regression line over a period of 6 h prior to the transition”; e.g., Lamaze et al. 2020 and many others). Is there a need to introduce yet another one?

      This approach is necessary. The slope defined by Lamaze et al. utilizes data from only 2 time points, which may not accurately capture the pattern within a period before lights-on or lights-off. Linear regression is not well suited to a single fly because activity varies greatly at each time point, making it challenging to fit the model at the individual level. The parameters we have introduced (MAPI and EAPI) in this paper are concise and can be applied at the individual level, effectively reflecting the morning or evening anticipation characteristics of each fly.

      As an alternative, the activity plot of a certain fly line could be represented by an average of all flies' activity in one experiment. This would make linear regression easier to fit. However, several independent experiments are required for statistical robustness, necessitating the inclusion of hundreds of flies for each strain in a single analysis.

      Point 19-In general, the legends of supplementary figures are a bit too brief. S7 and S8: it is not clear which of the two intersectional strategies were used (it would benefit whoever is interested in replicating the experiments). Legend to Fig S8 should read “similar to Fig S7”.

      We have now revised the legend and included “The exact strategy used for each gene is annotated in Table S5” in the legend.

      Point 20-The legend in Table S6 should clearly state the genotypes examined. What does the marking in bold refer to?

      We have now revised the annotation of Table S6. Bold marking refers to results that fall more than one SD from the control group.

      Point 21-Line 314. The sentence needs revision.

      We have revised these sentences.

      Point 22-Line 391 (and also in the results section). The authors attempt to describe the CNMa phenotype as the opposite of pdf/pdfr mutant phenotypes. However, no morning anticipation/advanced morning anticipation are not necessarily opposite phenotypes.

      We have revised the related description.

      ABSTRACT: “Specific elimination of each from clock neurons revealed that loss of the neuropeptide CNMa in two posterior dorsal clock neurons (DN1ps) or its receptor (CNMaR) caused advanced morning activity, indicating a suppressive role of CNMa-CNMaR on morning anticipation, opposite to the promoting role of PDF-PDFR on morning anticipation.” (Line 43-48)

      DISCUSSION: “Furthermore, given that the morning anticipation vanishing phenotype of Pdf or Pdfr mutant indicates a promoting role of PDF-PDFR signal, while the enhanced morning anticipation phenotype of CNMa mutant suggests an inhibiting role of CNMa signal, we consider the two signals to be antagonistic.” (Line 492-495)

      References

      Deng, B., Li, Q., Liu, X., Cao, Y., Li, B., Qian, Y., Xu, R., Mao, R., Zhou, E., Zhang, W., et al. (2019). Chemoconnectomics: mapping chemical transmission in Drosophila. Neuron 101, 876-893.e874.

      Ding, X., Seebeck, T., Feng, Y., Jiang, Y., Davis, G.D., and Chen, F. (2019). Improving CRISPR-Cas9 genome editing efficiency by fusion with chromatin-modulating peptides. CRISPR J 2, 51-63.

      Duhart, J.M., Herrero, A., de la Cruz, G., Ispizua, J.I., Pírez, N., and Ceriani, M.F. (2020). Circadian Structural Plasticity Drives Remodeling of E Cell Output. Curr Biol 30, 5040-5048.e5045.

      Erion, R., King, A.N., Wu, G., Hogenesch, J.B., and Sehgal, A. (2016). Neural clocks and Neuropeptide F/Y regulate circadian gene expression in a peripheral metabolic tissue. eLife 5, e13552.

      Fujiwara, Y., Hermann-Luibl, C., Katsura, M., Sekiguchi, M., Ida, T., Helfrich-Förster, C., and Yoshii, T. (2018). The CCHamide1 neuropeptide expressed in the anterior dorsal neuron 1 conveys a circadian signal to the ventral lateral neurons in Drosophila melanogaster. Front Physiol 9, 1276.

      Goda, T., Tang, X., Umezaki, Y., Chu, M.L., Kunst, M., Nitabach, M.N.N., and Hamada, F.N. (2016). Drosophila DH31 neuropeptide and PDF receptor regulate night-onset temperature preference. J Neurosci 36, 11739-11754.

      Goda, T., Umezaki, Y., Alwattari, F., Seo, H.W., and Hamada, F.N. (2019). Neuropeptides PDF and DH31 hierarchically regulate free-running rhythmicity in Drosophila circadian locomotor activity. Sci Rep 9, 838.

      Guo, F., Cerullo, I., Chen, X., and Rosbash, M. (2014). PDF neuron firing phase-shifts key circadian activity neurons in Drosophila. eLife 3.

      He, C., Cong, X., Zhang, R., Wu, D., An, C., and Zhao, Z. (2013). Regulation of circadian locomotor rhythm by neuropeptide Y-like system in Drosophila melanogaster. Insect Mol Biol 22, 376-388.

      Hermann, C., Yoshii, T., Dusik, V., and Helfrich-Förster, C. (2012). Neuropeptide F immunoreactive clock neurons modify evening locomotor activity and free-running period in Drosophila melanogaster. J Comp Neurol 520, 970-987.

      Hyun, S., Lee, Y., Hong, S.T., Bang, S., Paik, D., Kang, J., Shin, J., Lee, J., Jeon, K., Hwang, S., et al. (2005). Drosophila GPCR Han is a receptor for the circadian clock neuropeptide PDF. Neuron 48, 267-278.

      Johard, H.A., Yoshii, T., Dircksen, H., Cusumano, P., Rouyer, F., Helfrich-Förster, C., and Nässel, D.R. (2009). Peptidergic clock neurons in Drosophila: ion transport peptide and short neuropeptide F in subsets of dorsal and ventral lateral neurons. J Comp Neurol 516, 59-73.

      Lamaze, A., Krätschmer, P., Chen, K.F., Lowe, S., and Jepson, J.E.C. (2018). A Wake-Promoting Circadian Output Circuit in Drosophila. Curr Biol 28, 3098-3105.e3093.

      Lear, B.C., Zhang, L., and Allada, R. (2009). The neuropeptide PDF acts directly on evening pacemaker neurons to regulate multiple features of circadian behavior. PLoS Biol 7, e1000154.

      Lee, G., Bahn, J.H., and Park, J.H. (2006). Sex- and clock-controlled expression of the neuropeptide F gene in Drosophila. Proc Natl Acad Sci USA 103, 12580-12585.

      Lelito, K.R., and Shafer, O.T. (2012). Reciprocal cholinergic and GABAergic modulation of the small ventrolateral pacemaker neurons of Drosophila's circadian clock neuron network. J Neurophysiol 107, 2096-2108.

      Ma, D., Przybylski, D., Abruzzi, K.C., Schlichting, M., Li, Q., Long, X., and Rosbash, M. (2021). A transcriptomic taxonomy of Drosophila circadian neurons around the clock. eLife 10.

      Port, F., Chen, H.M., Lee, T., and Bullock, S.L. (2014). Optimized CRISPR/Cas tools for efficient germline and somatic genome engineering in Drosophila. Proc Natl Acad Sci USA 111, E2967-2976.

      Reinhard, N., Schubert, F.K., Bertolini, E., Hagedorn, N., Manoli, G., Sekiguchi, M., Yoshii, T., Rieger, D., and Helfrich-Förster, C. (2022). The Neuronal Circuit of the Dorsal Circadian Clock Neurons in Drosophila melanogaster. Front Physiol 13, 886432.

      Renn, S.C., Park, J.H., Rosbash, M., Hall, J.C., and Taghert, P.H. (1999). A pdf neuropeptide gene mutation and ablation of PDF neurons each cause severe abnormalities of behavioral circadian rhythms in Drosophila. Cell 99, 791-802.

      Shafer, O.T., Helfrich-Förster, C., Renn, S.C., and Taghert, P.H. (2006). Reevaluation of Drosophila melanogaster's neuronal circadian pacemakers reveals new neuronal classes. J Comp Neurol 498, 180-193.

      Shafer, O.T., Kim, D.J., Dunbar-Yaffe, R., Nikolaev, V.O., Lohse, M.J., and Taghert, P.H. (2008). Widespread receptivity to neuropeptide PDF throughout the neuronal circadian clock network of Drosophila revealed by real-time cyclic AMP imaging. Neuron 58, 223-237.

      Zhang, L., Chung, B.Y., Lear, B.C., Kilman, V.L., Liu, Y., Mahesh, G., Meissner, R.A., Hardin, P.E., and Allada, R. (2010). DN1(p) circadian neurons coordinate acute light and PDF inputs to produce robust daily behavior in Drosophila. Curr Biol 20, 591-599.

      Zhao, P., Zhang, Z., Lv, X., Zhao, X., Suehiro, Y., Jiang, Y., Wang, X., Mitani, S., Gong, H., and Xue, D. (2016). One-step homozygosity in precise gene editing by an improved CRISPR/Cas9 system. Cell Res 26, 633-636.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This fundamental study provides an unprecedented understanding of the roles of different combinations of NaV channel isoforms in nociceptors' excitability, with relevance for the design of better strategies targeting NaV channels to treat pain. Although the experimental combination of electrophysiological, modeling, imaging, molecular biology, and behavioral data is convincing and supports the major claims of the work, some conclusions need to be strengthened by further evidence or discussion. The work may be of broad interest to scientists working on pain, drug development, neuronal excitability, and ion channels.

      Reviewer #1 (Public Review):

      Summary:

      In this work, Xie, Prescott, and colleagues have reevaluated the role of Nav1.7 in nociceptive sensory neuron excitability. They find that nociceptors can make use of different sodium channel subtypes to reach equivalent excitability. The existence of this degeneracy is critical to understanding neuronal physiology under normal and pathological conditions and could explain why Nav subtype-selective drugs have failed in clinical trials. More concretely, nociceptor repetitive spiking relies on Nav1.8 at DIV0 (and probably under normal conditions in vivo), but on Nav1.7 and Nav1.3 at DIV4-7 (and after inflammation in vivo).

      The conclusions of this paper are mostly well supported by data, and these findings should be of broad interest to scientists working on pain, drug development, neuronal excitability, and ion channels.

      Strengths:

      (1.1) The authors have employed elegant electrophysiology experiments (including specific pharmacology and dynamic clamp) and computational simulations to study the excitability of a subpopulation of DRGs that would very likely match with nociceptors (they take advantage of using transgenic mice to detect Nav1.8-expressing neurons). They make a strong point showing the degeneracy that occurs at the ion channel expression level in nociceptors, adding this new data to previous observations in other neuronal types. They also demonstrate that the different Nav subtypes functionally overlap and are able to interchange their "typical" roles in action potential generation. As Xie, Prescott, and colleagues argue, the functional implications of the degenerate character of nociceptive sensory neuron excitability need to be seriously taken into account regarding drug development and clinical trials with Nav subtype-selective inhibitors.

      Weaknesses:

      (1.2) The next comments are minor criticisms, as the major conclusions of the paper are well substantiated. Most of the results presented in the article have been obtained from experiments with DRG neuron cultures, and surely there is a greater degree of complexity and heterogeneity about the degeneracy of nociceptor excitability in the "in vivo" condition. Indeed, the authors show in Figures 7 and 8 data that support their hypothesis and an increased influence of Nav1.7 on nociceptor excitability after inflammation, but also a higher variability in the nociceptor spiking responses. On the other hand, DRG neurons targeted in this study (YFP (+) after crossing with Nav1.8-Cre mice) are >90% nociceptors, but not all nociceptors express Nav1.8 in vivo. As shown by Li et al., 2016 ("Somatosensory neuron types identified by high-coverage single-cell RNA-sequencing and functional heterogeneity"), there is a high heterogeneity of neuron subtypes within sensory neurons. Therefore, some caution should be taken when translating the results obtained with the DRG neuron cultures to the more complex "in vivo" panorama.

      We agree that most but not all Nav1.8+ DRG cells are nociceptors and that not all nociceptors express Nav1.8. We targeted small neurons that also express (or at some point expressed) Nav1.8, thus excluding larger neurons that express Nav1.8. This allowed us to hone in on a relatively homogeneous set of neurons, which is crucial when testing different neurons to compare between conditions (as opposed to testing longitudinally in the same neuron, which is not feasible). We expect all neurons are degenerate but likely on the basis of different ion channel combinations. Indeed, even within small Nav1.8+ neurons, other channels that we did not consider likely contribute to the degenerate regulation (as now better reflected in the revised Discussion).

      That said, there are multiple sources of heterogeneity. We suspect that heterogeneity increases more after inflammation than after axotomy because all DRG neurons experience axotomy when cultured whereas neurons experience inflammation differently in vivo depending on whether their axon innervates the inflamed area (now explained on lines 214-215). This is not so much about whether the insult occurs in vivo or in vitro, but about how homogeneously neurons are affected by the insult. Granted, neurons are indeed more likely to be heterogeneously affected in vivo since conditions are more complex. But our goal in testing PF-71 in behavioral tests (Fig. 8) was to show that changes observed in nociceptor excitability in Figure 7, despite heterogeneity, were predictive of changes in drug efficacy. In short, we establish Nav interchangeability by comparing neurons in culture (Figs 1-6), but we then show that similar Nav shifts can develop in vivo (Fig 7) with implications for drug efficacy (Fig 8). Such results should alert readers to the importance of degeneracy for drug efficacy (which is our main goal) even without a complete picture of nociceptor degeneracy or DRG neuron heterogeneity. Additions to the Discussion (lines 248-259, 304-308) are intended to highlight these considerations.

      (1.3) Although the authors have focused their attention on Nav channels, it should be noted that degeneracy concerning other ion channels (such as potassium ion channels) could also impact the nociceptor excitability. The action potential AHP in Figure 1, panel A is very different comparing the DIV0 (blue) and DIV4-7 examples. Indeed, the conductance density values for the AHP current are higher at DIV0 than at DIV7 in the computational model (supplementary table 5). The role of other ion channels in order to obtain equivalent excitability should not be underestimated.

      We completely agree. We focused on Nav channels because of our initial observation with TTX and because of industry’s efforts to develop Nav subtype-selective inhibitors, whose likelihood of success is affected by the changes we report. But other channels are presumably changing, especially given observed changes in the AHP shape (now mentioned on lines 304-308). Investigation should be expanded to include these other channels in future studies.

      Reviewer #2 (Public Review):

      Summary:

      The authors have noted in preliminary work that tetrodotoxin (TTX), which inhibits NaV1.7 and several other TTX-sensitive sodium channels, has differential effects on nociceptors, dramatically reducing their excitability under certain conditions but not under others. Partly because of this coincidental observation, the aim of the present work was to re-examine or characterize the role of NaV1.7 in nociceptor excitability and its effects on drug efficacy. The manuscript demonstrates that a NaV1.7-selective inhibitor produces analgesia only when nociceptor excitability is based on NaV1.7. More generally and comprehensively, the results show that nociceptors can achieve equivalent excitability through changes in differential NaV inactivation and NaV expression of different NaV subtypes (NaV 1.3/1.7 and 1.8). This can cause widespread changes in the role of a particular subtype over time. The degenerate nature of nociceptor excitability shows functional implications that make the assignment of pathological changes to a particular NaV subtype difficult or even impossible.

      Thus, the analgesic efficacy of NaV1.7- or NaV1.8-selective agents depends essentially on which NaV subtype controls excitability at a given time point. These results explain, at least in part, the poor clinical outcomes with the use of subtype-selective NaV inhibitors and therefore have major implications for the future development of Nav-selective analgesics.

      Strengths:

      (2.1) The above results are clearly and impressively supported by the experiments and data shown. All methods are described in detail, presumably allow good reproducibility, and were suitable to address the corresponding question. The only exception is the description of the computer model, which should be described in more detail.

      We failed to report basic information such as the software, integration method and time step in the original text. This information is now provided on lines 476-477. Notably, the full code is available on ModelDB plus all equations including the values for all gating parameters are provided in Supplementary Table 5 and values for maximal conductance densities for DIV0 and DIV7 models are provided in Supplementary Table 6. Changes in conductance densities to simulate different pharmacological conditions are reported in the relevant figure legends (now shown in red). We did not include model details in the main text to avoid disrupting the flow of the presentation, but all the model details are reported in the Methods, tables and/or figure legends.

      (2.2) The results showing that nociceptors can achieve equivalent excitability through changes in differential NaV inactivation and expression of different NaV subtypes are of great importance in the fields of basic and clinical pain research and sodium channel physiology and pharmacology, but also for a broad readership and community. The degenerate nature of nociceptor excitability, which is clearly shown and well supported by data has large functional implications. The results are of great importance because they may explain, at least in part, the poor clinical outcomes with the use of subtype-selective NaV inhibitors and therefore have major implications for the future development of Nav-selective analgesics.

      In summary, the authors achieved their overall aim to enlighten the role of NaV1.7 in nociceptor excitability and the effects on drug efficacy. The data support the conclusions, although the clinical implications could be highlighted in a more detailed manner.

      Weaknesses:

      As mentioned before, the results that nociceptors can achieve equivalent excitability through changes in differential NaV inactivation and NaV expression of different NaV subtypes are impressive. However, there is some "gap" between the DRG culture experiments and acutely dissociated DRGs from mice after CFA injection. In the extensive experiments with cultured DRG neurons, different time points after dissociation were compared. Although it would have been difficult for functional testing to examine additional time points (besides DIV0 and DIV4-7), at least mRNA and protein levels should have been determined at additional time points (DIV) to examine the time course or whether gene expression (mRNA) or membrane expression (protein) changes slowly and gradually or rapidly and more abruptly.

      Characterizing the time course of NaV expression changes is worthwhile but, insofar as such details are not necessary to establish that excitability is degenerate, it was not included in the current study. Furthermore, since mRNA levels do not parallel the functional changes in Nav1.7 (Figure 6A), we do not think it would be helpful to measure mRNA levels at intermediate time points. Measuring protein levels would be more informative; however, as now explained on lines 362-369, neurons were recorded at intermediate time points in initial experiments and showed a lot of variability. Methods that could track fluorescently-tagged NaV channels longitudinally (i.e. at different time points in the same cell) would be well suited for this sort of characterization, but will invariably lead to more questions about membrane trafficking, phosphorylation, etc. We agree that a thorough characterization would be interesting but we think it is best left for a future study.

      It would also be interesting to clarify whether the changes that occur in culture (DIV0 vs. DIV4-7) are accompanied by (pro-)inflammatory changes in gene and protein expression, such as those known for nociceptors after CFA injection. This would better link the following data demonstrating that in acutely dissociated nociceptors after CFA injection, the inflammation-induced increase in NaV1.7 membrane expression enhances the effect of (or more neurons respond to) the NaV1.7 inhibitor PF-71, whereas fewer CFA neurons respond to the NaV1.8 inhibitor PF-24.

      These are some of the many good questions that emerge from our results. We are not particularly keen to investigate what happens over several days in culture, since this is not so clinically relevant, but it would be interesting to compare changes induced by nerve injury in vivo (which usually involves neuroinflammatory changes) and changes induced by inflammation. Many previous studies have touched on such issues but we are cautious about interpreting transcriptional changes, and of course all of these changes need to be considered in the context of cellular heterogeneity. It would be interesting to decipher if changes in NaV1.7 and NaV1.8 are directly linked so that an increase in one triggers a decrease in the other, and vice versa. But of course many other channels are also likely to change (as discussed above), and they too warrant attention, which makes the problem quite difficult. We look forward to tackling this in future work.

      The results shown explain, at least in part, the poor clinical outcomes with the use of subtype-selective NaV inhibitors and therefore have important implications for the future development of Nav-selective analgesics. However, this point, which is also evident from the title of the manuscript, is discussed only superficially with respect to clinical outcomes. In particular, the promising role of NaV1.7, which plays a role in nociceptor hyperexcitability but not in "normal" neurons, should be discussed in light of clinical results and not just covered with a citation of a review. Which clinical results of NaV1.7-selective drugs can now be better explained and how?

      We wish to avoid speculating on which particular clinical results are better explained because our study was not designed for that. Instead, our take-home message (which is well supported; see Discussion on lines 309-321) is that NaV1.7-selective drugs may have a variable clinical effect because nociceptors’ reliance on NaV1.7 is itself variable – much more than past studies would have readers believe. At the end of the results (line 235), which is, we think, what prompted the reviewer’s comment, we point to the Discussion. The corollary is that accounting for degeneracy could help account for variability in drug efficacy, which would of course be beneficial. The challenge (as highlighted in the Abstract, lines 21-22) is that identifying the dominant Nav subtype to predict drug efficacy is difficult. We certainly don’t have all the answers, but we hope our results will point readers in a new direction to help answer such questions.

      Another point directly related to the previous one, which should at least be discussed, is that all the data are from rodents, or in this case from mice, and this should explain the clinical data in humans. Even if "impediment to translation" is briefly mentioned in a slightly different context, one could (as mentioned above) discuss in more detail which human clinical data support the existence of "equivalent excitability through different sodium channels" also in humans.

      We are not aware of human data that speak directly to nociceptor degeneracy but degeneracy has been observed in diverse species; if anything, human neurons are probably even more degenerate based on progressive expansion of ion channel types, splice variants, etc. over evolution. Of course species differences extend beyond degeneracy and are always a concern for translation, because of a species difference in the drug target itself or because preclinical pain testing fails to capture the most clinically important aspects of pain (which we mention on line 35). Line 39 now reiterates that these explanations for translational difficulties are not mutually exclusive, but that degeneracy deserves greater consideration than it has hitherto received. Indeed, throughout our paper we imply that degeneracy may contribute to the clinical failure of Nav subtype-specific drugs, but those failures are certainly not evidence of degeneracy. In the Discussion (lines 320-321), we now cite a recent review article on degeneracy in the context of epilepsy, and point out how parallels might help inform pain research. We wish we had a more direct answer to the reviewer’s request; in the absence of this, we hope our results motivate readers to seek out these answers in future research.

      Although speculative, it would be interesting for readers to know whether a treatment regimen based on "time since injury" with NaV1.7 and NaV1.8 inhibitors might offer benefits. Based on the data, could one hypothesize that NaV1.7 inhibitors are more likely to benefit (albeit in the short term) in patients with neuropathic pain with better patient selection (e.g., defined interval between injury and treatment)?

      We like that our data prompt this sort of prediction. However, this is potentially complicated since the injury may be subtle, which is to say that the exact timing may not be known. There are scenarios (e.g. postoperative pain) where the timing of the insult is known, but in other cases (e.g. diabetic neuropathy) the disease process is quite insidious, and different neurons might have progressed through different stages depending on how they were exposed to the insult. Our own experiments with CFA are a case in point. Notwithstanding the potential difficulties about gauging the time course, any way of predicting which Nav subtype is dominant could help more strategically choose which drug to use.

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors used patch-clamp to characterize the implication of various voltage-gated Na+ channels in the firing properties of mouse nociceptive sensory neurons. They report that depending on the culture conditions NaV1.3, NaV1.7, and NaV1.8 have distinct contributions to action potential firing and that similar firing patterns can result from distinct relative roles of these channels. The findings may be relevant for the design of better strategies targeting NaV channels to treat pain.

      Strengths:

      The paper addresses the important issue of understanding, from an interesting perspective, the lack of success of therapeutic strategies targeting NaV channels in the context of pain. Specifically, the authors test the hypothesis that different NaV channels contribute in a plastic manner to action potential firing, which may be the reason why it is difficult to target pain by inhibiting these channels. The experiments seem to have been properly performed and most conclusions are justified. The paper is concisely written and easy to follow.

      Weaknesses:

      (1) The most critical issue I find in the manuscript is the claim that different combinations of NaV channels result in equivalent excitability. For example, in the Abstract it is stated that: "...we show that nociceptors can achieve equivalent excitability using different combinations of NaV1.3, NaV1.7, and NaV1.8". The gating properties of these channels are not identical, and therefore their contributions to excitability should not be the same. I think that the culprit of this issue is that the authors reach their conclusion from the comparison of the (average) firing rate determined over 1 s current stimulation in distinct conditions. However, this is not the only parameter that determines how sensory neurons convey information. For instance, the time dependence of the instantaneous frequency, the actual firing pattern, may be important too. Moreover, the use of 1 s of current stimulation might not be sufficient to characterize the firing pattern if one wants to obtain conclusions that could translate to clinical settings (i.e., sustained pain). A neuron in which NaV1.7 is the main contributor is expected to have a damping firing pattern due to cumulative channel inactivation, whereas another depending mainly on NaV1.8 is expected to display more sustained firing. This is actually seen in the results of the modelling.

      This concern seems to boil down to how equivalent is equivalent? The spike shape or the full input-output curve for a DIV0 neuron (Nav1.8-dominant) is never equivalent to what’s seen in a DIV4-7 neuron (Nav1.7-dominant), but nor are any two DIV0 neurons strictly equivalent, and likewise for any two DIV4-7 neurons. Our point is that DIV0 and DIV4-7 neurons are far more similar (less discriminable) in their excitability than expected from the qualitative difference in their TTX sensitivity (and from repeated claims in the literature that Nav1.7 is necessary for spike generation in nociceptors). Nav isoforms need not be identical to operate similarly; for instance, Nav1.8 tends to activate at “suprathreshold” voltages, but this depends on the value of threshold; if threshold increases, Nav1.8 can activate at subthreshold voltages (see Fig 5). We have modified lines 155-175 to help clarify this.

      We completely agree that firing rate is not the only way to convey sensory information, and of course injecting current directly into the cell body via a patch pipette is not a natural stimulus. These are all factors to keep in mind when interpreting our data. Nonetheless, our data show that excitability is similar between DIV0 and DIV4-7, so much so that data from any one neuron (without pharmacological tests or capacitance measurements) would likely not reveal if that cell is DIV0 or DIV4-7; this “indiscriminability” qualifies as “equivalent” for our purposes, and is consistent with phrasing used by other authors studying degeneracy. Notably, not every DIV4-7 neuron exhibits spike height attenuation (see Fig. 1A), likely because of concomitant changes in the AHP that were not captured in our computer model or directly tested in our experiments. This highlights that other channel changes may also contribute to degeneracy and the maintenance of repetitive spiking.

      (2) In Fig. 1, is 100 nM TTX sufficient to inhibit all TTX-sensitive NaV currents? More common in literature values to fully inhibit these currents are between 300 to 500 nM. The currents shown as TTX-sensitive in Fig. 1D look very strange (not like the ones at Baseline DIV4-7). It seems that 100 nM TTX was not enough, leading to an underestimation of the amplitude of the TTX-sensitive currents.

      As now summarized in Supplementary Table 3 (which is newly added), 100 nM TTX is >20x the EC50 for Nav1.3 and Nav1.7 (but is still far below the EC50 for Nav1.8). Based on this, TTX-sensitive channels are definitely blocked in our TTX experiments.
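
      To put ">20x the EC50" in quantitative terms, the sketch below estimates the fraction of channels blocked at a given TTX concentration. It assumes simple one-to-one binding (a Hill coefficient of 1), which is an assumption made here for illustration and is not stated in the response. The function name and the two half-maximal concentrations are hypothetical placeholders, not values taken from Supplementary Table 3.

```python
def fraction_blocked(conc_nm: float, ec50_nm: float, hill: float = 1.0) -> float:
    """Fraction of channels blocked at a blocker concentration (Hill equation)."""
    return conc_nm**hill / (conc_nm**hill + ec50_nm**hill)


ttx_nm = 100.0

# Hypothetical upper bound: the largest EC50 consistent with
# "100 nM is >20x the EC50" is 100 / 20 = 5 nM.
ec50_sensitive_nm = ttx_nm / 20.0

# Hypothetical EC50 for a TTX-resistant channel, far above 100 nM
# (illustrative placeholder only).
ec50_resistant_nm = 50_000.0

print(f"TTX-sensitive: {fraction_blocked(ttx_nm, ec50_sensitive_nm):.3f}")
print(f"TTX-resistant: {fraction_blocked(ttx_nm, ec50_resistant_nm):.3f}")
```

      Under these assumptions, a concentration at 20x the EC50 blocks about 95% of the sensitive channels, and any smaller EC50 gives a larger fraction, while a channel whose EC50 lies far above 100 nM is almost unaffected. A Hill coefficient other than 1 would change these numbers.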

      (3) Page 8, the authors conclude that "Inflammation caused nociceptors to become much more variable in their reliance of specific NaV subtypes". However, how did the authors ensure that all neurons tested were affected by the CFA model? It could be that the heterogeneity in neuron properties results from distinct levels of effects of CFA.

      We agree with the reviewer. We also believe that variable exposure to CFA is the most likely explanation for the heightened variability in TTX-sensitivity reported in Figure 7 (now more clearly explained on lines 214-215). One could try co-injecting a retrograde dye with the CFA to label cells innervating the injection site, but differential spread of the CFA and dye is liable to preclude any good concordance. Alternatively, a pain model involving more widespread (systemic) inflammation might cause a more homogeneous effect. But our main goal with CFA injections was to show that a Nav1.8→Nav1.7 switch can occur in vivo (and is therefore not unique to culturing), and that demonstration is true even if some neurons do not switch. Subsequent testing in Figure 8 shows that enough neurons switch to have a meaningful effect in terms of the behavioral pharmacology. So, notwithstanding tangential concerns, we think our CFA experiments succeeded in showing that Nav channels can switch in vivo and that this impacts drug efficacy.

      Recommendations for the authors:

      All reviewers agreed that these results are solid and interesting. However, the reviewers also raised several concerns that should be addressed by the authors to improve the strength of the evidence presented. Revisions considered to be essential include:

      (1) Discuss how degeneracy concerning other ion channels (such as potassium ion channels) could also impact nociceptor excitability (reviewer #1). Additionally, the translation of results from DRG neuron cultures to "in vivo" nociceptors should be better discussed.

      We have added a new paragraph to the Discussion (lines 248-259) to remind readers that despite our focus on Nav channels, other ion channels likely also change (and that these changes involve diverse regulatory mechanisms that require further investigation). Likewise, despite our focus on the changes caused by culturing neurons, we remind readers that subtler, more clinically relevant in vivo perturbations can likewise cause a multitude of changes. We end that paragraph by emphasizing that although accounting for all the contributing components is required to fully understand a degenerate system, meaningful progress can be made by studying a subset of the components. We want to emphasize this because there is some middle ground between focusing on one component at a time (which is the norm) vs. trying to account for everything (which is an infeasible ideal). Additional text on lines 304-308 also addresses related points.

      (2) Discuss how different combinations of NaV channels result in equivalent excitability, in the context of the experimental conditions used (see main comment by reviewer #3). It should also be discussed in more detail which human clinical data support the existence of "equivalent excitability through different sodium channels" also in humans (reviewer #2).

      Regarding the first part of this comment, reviewer 3 wrote in the public review that “The gating properties of these channels are not identical, and therefore their contributions to excitability should not be the same.” Differences in gating properties are commonly used to argue that different Nav subtypes mediate different phases of the spike, for example, that Nav1.7 initiates the spike whereas Nav1.8 mediates subsequent depolarization because Nav1.7 and Nav1.8 activate at perithreshold and suprathreshold voltages, respectively (see lines 134-135, now shown in red). But such comparison is overly simplistic insofar as it neglects the context in which ion channels operate. For instance, if Nav1.7 is not expressed or fully inactivates, voltage threshold will be less negative, enabling Nav1.8 to contribute to spike initiation; in other words, previously “suprathreshold” voltages become “perithreshold”. Figure 5 is dedicated to explaining this context-sensitivity; specifically, we demonstrate with simulations how Nav1.8 takes over responsibility for initiating a spike when Nav1.7 is absent or inactivated. Text on lines 155-184 has been edited to help clarify this. Regarding the second part of this comment, we are not aware of any direct evidence from human sensory neurons that different sodium channels produce equivalent excitability, but that is certainly what we expect. We suggest that failure of Nav subtype-specific drugs is, at least in part, because of degeneracy, but such failures do not demonstrate degeneracy unless other contributing factors can be excluded (which they can’t). Recognizing degeneracy is difficult, and so variability that might be explained by degeneracy will go unexplained or attributed to other factors unless, by design or serendipity, experiments quantify the effects of degeneracy (as we have attempted to do here). We now cite a recent review article on degeneracy and epilepsy (line 320), which addresses relevant themes that might help inform pain research; for instance, most existing antiseizure medications act on multiple targets whereas more recently developed single-target drugs have proven largely ineffective. This is similar to but better documented than for analgesics. With this in mind, we revised the text to emphasize the circumstantial nature of existing evidence and the need to test more directly for degeneracy (lines 320-323).

      (3) Extend the discussion about the poor clinical outcomes with the use of subtype-selective NaV inhibitors. In particular, the promising role of NaV1.7, which plays a role in nociceptor hyperexcitability but not in "normal" neurons, should be discussed in light of clinical results and not just covered with a citation of a review. Which clinical results of NaV1.7-selective drugs can now be better explained and how? (reviewer #2)

      As discussed above, we are cautious to avoid speculating on which clinical results are attributable to degeneracy. Instead, our take-home message (see Discussion, lines 309-323) is that NaV1.7-selective drugs may have a variable clinical effect because nociceptors’ reliance on NaV1.7 is itself variable, much more than past studies would have readers believe. The corollary is that accounting for degeneracy could help account for variability in drug efficacy, which would of course be beneficial. The challenge (as highlighted in the Abstract, lines 21-22) is that identifying the dominant Nav subtype to predict drug efficacy is not trivial. Interpreting clinical data is also complicated by the fact that we are either dealing with genetic mutations (with unclear compensatory changes) or pharmacological results (where NaV1.7-selective drugs have a multitude of problems that might contribute to their lack of efficacy, separate from effects of degeneracy). We have striven to contextualize our results (e.g. last paragraph of results, lines 222-235). We think this is the most we can reasonably say based on the limitations of existing clinical data.

      (4) Provide a clearer and more detailed description of the computational model (reviewers #2 and #3).

      We added important details on lines 476-477 but, in our honest opinion, we think our computational model is thoroughly explained. The issue seems to boil down to whether details are included in the Results vs. being left for the Methods, tables and figure legends. We prefer the latter.

      (5) Better clarify the effects of the CFA model, to provide further evidence relating inflammation with nociceptors variability (reviewers #2 and #3)

      As explained in response to a specific point by reviewer #3, we believe that variable exposure to CFA explains the heightened variability in TTX-sensitivity reported in Figure 7 (now explained on lines 214-215). One could try co-injecting a retrograde dye with the CFA to label cells innervating the injection site, but differential spread of the inflammation and dye is liable to preclude any good concordance. Alternatively, a pain model involving more widespread (systemic) inflammation might cause a more homogeneous effect. But our main goal with CFA injections was to show that a Nav1.8→Nav1.7 switch can occur in vivo (and is therefore not unique to culturing); that demonstration holds true even if some neurons do not switch. Subsequent testing (Fig 8) shows that enough neurons switch to impact drug efficacy assessed behaviorally. This is emphasized with new text on lines 225-227. Overall, we think our CFA experiments succeed in showing that Nav channels can switch in vivo and, despite variability, that this occurs in enough neurons to impact drug efficacy.

      (6) Revise the text according to all recommendations raised by the reviewers and listed in the individual reviews.

      Detailed responses are provided below for all feedback and changes to the text were made whenever necessary, as identified in our responses.

      Reviewer #1 (Recommendations For The Authors):

      Minor points/recommendations:

      Protein synthesis inhibition by cercosporamide could be the direct cause of a smaller-than-expected increase in Nav1.7 levels at DIV5. But for Nav1.8, there is a mitigation in the increased levels at DIV5, that could only be explained by several indirect mechanisms, including membrane trafficking and posttranslational modifications (phosphorylation, SUMOylation, etc.) on Nav1.8 or protein regulators of Nav1.8 channels. The authors suggest that "translational regulation is crucial", but also insinuate that other processes (membrane trafficking, etc.) could contribute to the observed outcome. It is difficult to assess the relative importance of these different explanations without knowing the exact mechanisms that are acting here.

      We agree. We relied on electrophysiology (and pharmacology) to measure functional changes, but we wanted to verify those data with another method. We expected mRNA levels to parallel the functional changes but, when that did not pan out, we proceeded to look at protein levels. Perhaps we should have stopped there, but by blocking protein translation, we show that there is not enough Nav1.7 protein already available that can be trafficked to the membrane. That does not explain why Nav1.8 levels drop. Our immunohistochemistry could not tease apart membrane expression from overall expression, which limits interpretation. We have enhanced the text to discuss this (lines 200-204), but further experiments are needed. Though admittedly incomplete, our initial findings help set the stage for future experiments on this matter.

      Page 15, typo: "contamination from genomic RNA" -> "contamination from genomic DNA" (appears twice).

      This has been corrected on lines 420 and 421.

      Page 17: I could not find the computer code at ModelDB (http://modeldb.yale.edu/267560). It seems to be an old web link. It should be available at some web repository.

      We confirmed that the link works. Entry is password-protected (password = excitability; see line 476). Password protection will be removed once the paper is officially published.

      Page 19, reference 36, typo: "Inhibitio of" -> "Inhibition of".

      This has been corrected (line 557).

      Page 33, typo: "are significantly larger than differences at DIV1" -> "are significantly larger than differences at DIV0".

      This has been corrected (line 796).

      Page 35, figure 6 legend. The number of experiments (n) is not indicated for panel C data.

      N = 3 is now reported (line 828).

      Reviewer #2 (Recommendations For The Authors):

      p. 3/4 and Data of Fig. 6: It should be commented on why days 1-3 were not investigated. An investigation of the time course (by higher frequency testing) would certainly have an added value because it would be possible to deduce whether the changes develop slowly and gradually, or whether the excitability induced by different NaVs changes suddenly. At least mRNA and protein levels should be determined at additional time points to examine the time course or whether gene expression (mRNA) or membrane expression (protein) changes slowly and gradually or rapidly and more abruptly. It would also be interesting to clarify whether the changes that occur in culture (DIV0 vs. DIV4-7) are accompanied by (pro-)inflammatory changes in gene and protein expression, such as those known for nociceptors after CFA injection. Or is the latter question clear in the literature?

      We now explain (lines 362-369) that intermediate time points (DIV1-3) were tested in initial current clamp recordings. Those data showed that TTX-sensitivity stabilized by DIV4 and differed from the TTX-insensitivity observed at DIV0. TTX-sensitivity was mixed at DIV1-3 and cross-cell variability complicated interpretation. Subsequent experiments were prioritized to clarify why NaV1.7 is not always critical for nociceptor excitability, contrary to past studies. Our efforts to measure mRNA and protein levels were primarily to validate our electrophysiological findings; we are also interested in deciphering the underlying regulatory processes but this is an entire study on its own. Unfortunately, the existing literature does not help or point to an explanation for the Nav1.7/1.8 shift we observed.

      Our evidence that mRNA levels do not parallel functional changes argues against pursuing transcriptional changes in Nav1.7, though transcriptional changes in other factors might be important. Interpretation of immuno quantification would be complicated by the high variability we observed with the physiology at intermediate time points and, furthermore, we cannot resolve surface expression from overall expression based on available antibodies. Methods conducive to longitudinal measurements would be more appropriate (as now mentioned on lines 367-369). In short, a lot more work is required to understand the mechanisms involved in the switch, but we think the existing demonstration suffices to show that NaV1.7 and NaV1.8 protein levels vary, with crucial implications for which Nav subtype controls nociceptor excitability, and important implications for drug efficacy. Explaining why and how quickly those protein levels change will be no small feat and is best left for a future study.

      p. 4 and following: In order to enable the interpretation of the used concentration of PF-24, PF71, and ICA, the respective IC50 should be indicated.

      A table (now Supplementary Table 3; line 861) has been added to report EC50 values for all drugs for blocking NaV1.7, NaV1.8 and NaV1.3. The concentrations we used are included on that table for easy comparison.

      p. 5, end of the middle paragraph: Here it should be briefly explained -for less familiar readers- why NaV1.1 cannot be causative (ICA inhibits NaV1.1 and 1.3).

      We now explain (lines 117-120) that NaV1.1 is expressed almost exclusively in medium-diameter (A-delta) neurons whereas NaV1.3 is known to be upregulated in small-diameter neurons, and so the effect we observe in small neurons is most likely via blockade of NaV1.3.

      p. 6, lines 4/5: At least once it should read computer model instead of model.

      “Computer” has been added the first time we refer to DIV0 or DIV4-7 computer models (lines 138-139).

      p. 6: the difference between Fig. 4B and Fig. 4 - Figure suppl. 1 should be mentioned briefly.

      We now explain (lines 150-154) that Fig. 4B involves replacing a native channel with a different virtual channel (to demonstrate their interchangeability) whereas Fig. 4 - Figure supplement 1 involves replacing a native channel with the equivalent virtual channel (as a positive control).

      p. 6/7: the text and the conclusions regarding Figure 5 are difficult to follow. Somewhat more detailed explanations of why which data demonstrate or prove something would be helpful.

      The text describing Figure 5 (lines 155-175) has been revised to provide more detail.

      p. 7, last sentence of the first paragraph: How is this supported by the data? Or should this sentence be better moved to the discussion?

      This sentence (now lines 182-184) is designed as a transition. The first half – “a subtype’s contribution shifts rapidly (because of channel inactivation)” – summarizes the immediately preceding data (Figure 5). The second half – “or slowly (because of [changes in conductance density])” – introduces the next section. The text shown in square brackets has been revised. We hope this will be clearer based on revisions to the associated text.

      p. 7, second paragraph, line 3: Please delete one "at both".

      Corrected

      p. 7, second paragraph: Please explain why different time points (DIV4-7, DIV5, or DIV7) were used or studied.

      Initial electrophysiological experiments determined that TTX sensitivity stabilized by DIV 4 (see response to opening point) and we did not maintain neurons longer than 7 days, and so neurons recorded between DIV4 and 7 were pooled. If non-electrophysiological tests were conducted on a specific day within that range, we report the specific day, but any day within the DIV4-7 range is expected to give comparable results. This is now explained on lines 365-367.

      p. 8: the text regarding Fig. 7 should also include the important data (e.g. percentage of neurons showing repetitive spiking) mentioned in the legend.

      This text (lines 216-220) has been revised to include the proportion of neurons converted by PF71 and PF-24 and the associated statistical results.

      Fig. 1: third panel (TTX-sensitive current...) of D & Fig. 2 subpanel of A (Nav1.8 current...). These panels should be explained or mentioned in the text and/or legends.

      We now explain in the figure legends (lines 708-710; 714-715; 736-738) how those currents are found through subtraction.

      Fig. 2 - figure supplement 2. One might consider taking Panel A to Fig. 2 so that the comparison to DIV0 is apparent without switching to Suppl. Figs.

      We left this unchanged so that Figures 2 and 3 are equivalently organized, with negative control data left to the supplemental figures. eLife formatting makes it easy to reach the supplementary figure from the main figure, so we hope this won’t be an impediment to readers.

      Fig. 6 C, middle graph (graph of Nav1.7): Please re-check, whether DIV5 none vs. 24 h and none vs. 120 h are really significantly different with such a low p-value.

      We re-checked the statistics and the difference pointed out by the reviewer is significant at p=0.007. We mistakenly reported p<0.001 for all comparisons, and so this p value has been corrected; all the other p values are indeed <0.001. Notably, the data are summarized as median ± quartile because of their non-Gaussian distribution; this is now explained on line 827 (as a reminder to the statement on lines 461-462). Quartiles are more comparable to SD than to SEM (in that quartiles and SD represent the distribution rather than confidence in estimating the mean, like SEM), and so medians can differ very significantly even if quartiles overlap, as in this case.
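
      To illustrate this last point, here is a minimal sketch using synthetic numbers (not our data). It assumes a rank-sum (Mann-Whitney) test with a normal approximation and no tie correction, which may differ from the test used in the manuscript; the function name and the two groups are hypothetical.

```python
from statistics import NormalDist, quantiles

def rank_sum_p(a, b):
    """Two-sided Mann-Whitney p-value (normal approximation, no tie correction)."""
    # U counts pairs where a > b, with ties counted as half.
    u = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    n1, n2 = len(a), len(b)
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u
    return 2 * NormalDist().cdf(-abs(z))

# Synthetic, evenly spread values in arbitrary units (illustration only).
group_a = list(range(200))             # 0 .. 199
group_b = [v + 60 for v in group_a]    # same spread, shifted by 60

q1_a, _, q3_a = quantiles(group_a, n=4)
q1_b, _, q3_b = quantiles(group_b, n=4)
quartiles_overlap = q3_a > q1_b        # interquartile ranges overlap
p_value = rank_sum_p(group_a, group_b)
```

      In this synthetic case the interquartile ranges overlap, yet the shift in medians is detected at p<0.001 because the test gains power from the number of observations, not from the spread alone.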

      Reviewer #3 (Recommendations For The Authors):

      (1) A critical issue in the manuscript is the use of teleological language. It is likely that this is not the intention, but careful revision of the language should be done to avoid the use of expressions that confer purpose to a biological process. Please, find below a list of statements that I consider require correction.

      • In the Abstract, the first sentence: "Nociceptive sensory neurons convey pain signals to the CNS using action potentials". Neurons do not really "use" action potentials, they have no will or purpose to do so. Action potentials are not tools or means to be "used" by neurons. Other examples of misuse of the verb "use" are found in several other sentences:

      "...nociceptors can achieve equivalent excitability using different combinations of NaV1.3, NaV1.7, and NaV1.8"

      "Flexible use of different NaV subtypes - an example of degeneracy - compromises..."

      "Nociceptors can achieve equivalent excitability using different sodium channel subtypes" "...degeneracy - the ability of a biological system to achieve equivalent function using different components..."

      "...nociceptors can achieve equivalent excitability using different sodium channel subtypes..."

      "Our results show that nociceptors can achieve similar excitability using different NaV channels" "...the spinal dorsal horn circuit can achieve similar output using different synaptic weight combinations..."

      "Contrary to the view that certain ion channels are uniquely responsible for certain aspects of neuronal function, neurons use diverse ion channel combinations to achieve similar function" "In summary, our results show that nociceptors can achieve equivalent excitability using different NaV subtypes"

      “Use” can mean to put into action (without necessarily implying intention). Based on definitions of the word in various dictionaries, we feel we are well within the realm of normal usage of this term. In trying to achieve a clear and succinct writing style, we have stuck with our original word choice.

      • At the end of page 5 and in the legend of Fig. 7, the word "encourage" is not properly used in the sentence "The ability of NaV1.3, NaV1.7 and NaV1.8 to each encourage repetitive spiking is seemingly inconsistent with the common view...". Encouraging is really an action of humans or animals on other humans or animals.

      Like for “use”, we verified our usage in various dictionaries and we do not think that most readers will be confused or disturbed by our word choice. We use “encourage” to explain that increasing NaV1.3, NaV1.7 or NaV1.8 can increase the likelihood of repetitive spiking; we avoided “cause” because the probability of repetitive spiking is not raised to 100%, since other factors must always be considered.

      • In the Abstract and other places in the manuscript, the word "responsibility" seems to be wrongly employed. It is true that one can say, for instance, on page 4 last paragraph "we sought to identify the NaV subtype responsible for repetitive spiking at each time point". However, to confer channels with the human quality of having "responsibility" for something does not seem appropriate. See also page 8 last paragraph, the first paragraph of the Discussion, and the three paragraphs of page 11.

      Again, we must respectfully disagree with the reviewer. We appreciate that this reviewer does not like our writing style but we do not believe that our style violates English norms.

      (2) In the first sentence of the Abstract, nociceptive sensory neurons do not convey "pain signals". Pain is a sensation that is generated in the brain.

      “Pain” is used as an adjective for “signal” and is used to help identify the type of signal. Nonetheless, since the word count allowed for it, we now refer to “pain-related signals” (line 10).

      (3) I do not see the point of plotting the firing rate as a function of relative stimulus amplitude (normalized to the rheobase, e.g., Fig. 1A bottom panels, Fig. 2B, bottom-right, Fig. 2 Supp2A right, Fig. 3 B bottom panels, etc) instead of as a function of the actual stimulus amplitude. I have the impression that this maneuver hides information. This is equivalent to plotting the current amplitudes as a function of the voltage normalized by the voltage threshold for current activation, which is obviously not done.

      This is how the experiments were performed, so it would be impossible to perform the statistical analysis using the absolute amplitudes post-hoc; specifically, stimulus intensities were tested at increments defined relative to rheobase rather than in absolute terms. There are pros and cons to each approach, and both approaches are commonly used. Notably, we report the value of rheobase on the figures so that readers can, with minimal arithmetic, convert to absolute stimulus intensities. No information is hidden by our approach.
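
      For readers who want absolute intensities, the conversion amounts to multiplying each relative step by the cell's rheobase. A minimal sketch with hypothetical numbers (the rheobase value, the step multiples and the function name are placeholders, not values from our figures):

```python
def to_absolute(rheobase_pa, relative_steps):
    """Convert stimulus intensities given as multiples of rheobase into pA."""
    return [rheobase_pa * step for step in relative_steps]

# Hypothetical example: rheobase of 100 pA, stimuli at 1x, 2x, 3x and 4x rheobase.
absolute_pa = to_absolute(100.0, [1, 2, 3, 4])
```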

      (4) On page 4 it is stated that "We show later that similar changes develop in vivo following inflammation with consequences for drug efficacy assessed behaviourally (see Fig. 8), meaning the NaV channel reconfiguration described above is not a trivial epiphenomenon of culturing". However, what happens in culture may have nothing in common with what happens in vivo during inflammation. Thus, the latter data may not serve to answer whether the culture conditions induce artifacts or not. I suggest tuning down this statement by changing "meaning" to "suggesting".

      On line 97, we now write “suggesting”.

      (5) Page 5, first paragraph, I miss a clear description of the mathematical models. Having to skip to the Methods section to look for the details of the models as the artifices introduced to simulate different conditions is rather inconvenient.

      So as not to disrupt the flow of the presentation with methodological details, we only provide a short description of the model in the Results. We have slightly expanded this to point out that the conductance-based model is also single-compartment (line 111). We provide a very thorough description of our model in the Methods, especially considering all the details provided in Supplementary Tables 1, 5 and 6. We also report conductance densities and % changes in figure legends (lines 722, 747-748; now shown in red). This is also true for Figure 3-figure supplement 2 (lines 756-759). We tried very hard to find a good balance that we hope most readers will appreciate.

      (6) Page 6, second paragraph, simulations do not serve to "measure" currents.

      The sentence has been revised to indicate that simulations were used to “infer” currents during different phases of the spike (line 155).

      (7) Page 7, regarding the title of the subsection "Control of changes in NaV subtype expression between DIV0 and DIV4-7", the authors measured the levels of expression, but not really the mechanisms "controlling" them. I suggest writing "changes in NaV subtype expression between DIV0 and DIV4-7"

      We have removed “control of” from the section title (line 185).

      (8) What was the reason for adding a noise contribution in the model?

      We now explain that noise was added to reintroduce the voltage noise that is otherwise missing from simulations (line 474). For instance, in the absence of noise, membrane potential can approach voltage threshold very slowly without triggering a spike, which does not happen under realistically noisy conditions. Of course membrane potential fluctuates noisily because of stochastic channel opening and a multitude of other reasons. This is not a major issue for this study, and so we think our short explanation should suffice.
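
      The effect described here can be reproduced in a toy setting. The sketch below is a leaky integrate-and-fire neuron, not our conductance-based model, and every parameter value is hypothetical. A constant drive pulls the membrane potential toward a steady state 0.5 mV below threshold, so the noiseless simulation approaches threshold without ever spiking, whereas the same simulation with added Gaussian voltage noise does spike.

```python
import random

def count_spikes(noise_sd, seed=0, t_total_ms=1000.0, dt_ms=0.1):
    """Count spikes in a leaky integrate-and-fire toy model.

    noise_sd is the noise amplitude in mV per sqrt(ms); 0 disables noise.
    All parameter values are hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)
    tau_ms, v_rest, v_thresh = 10.0, -70.0, -50.0
    v_steady = -50.5                      # steady state just below threshold
    v, spikes = v_rest, 0
    for _ in range(int(t_total_ms / dt_ms)):
        # Euler-Maruyama step: deterministic relaxation plus Gaussian noise.
        v += dt_ms * (v_steady - v) / tau_ms
        v += noise_sd * (dt_ms ** 0.5) * rng.gauss(0.0, 1.0)
        if v >= v_thresh:
            spikes += 1
            v = v_rest                    # reset after a spike
    return spikes

spikes_without_noise = count_spikes(noise_sd=0.0)
spikes_with_noise = count_spikes(noise_sd=1.0)
```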

      (9) Please, define the concept of degeneracy upon first mention.

      Degeneracy is now succinctly defined in the abstract (line 20).

    1. Author Response

      The following is the authors’ response to the current reviews.

      Our answer to the final point(s) raised is as follows:

      "We thank the reviewer for the comment. We checked our datasets accordingly. Typically, the n of cells showed deviations of maximally 20% from experiment to experiment (e.g. 16-24 cells per experiment). Additionally, experiments were performed using different passages of the cells. Moreover, data were validated at different time-points during the study using newly thawed cell lines."


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Bischoff et al present a carefully prepared study on a very interesting and relevant topic: the role of ion channels (here a Ca2+-activated K+ channel BK) in regulating mitochondrial metabolism in breast cancer cells. The potential impact of these and similar observations made in other tumor entities has only begun to be appreciated. That being said, the authors pursue in my view an innovative approach to understanding breast cancer cell metabolism. Considering the following points would further strengthen the manuscript:

      We thank reviewer #1 for the overall positive feedback on our study.

      Methods:

      (1) The authors use an extracellular Ca2+ concentration (2 mM) in their Ringer's solutions that is almost twice as high as the physiologically free Ca2+ concentration (ln 473). Moreover, the free Ca2+ concentration of their pipette solution is not indicated (ln 487).

      Indeed, we utilized 2 mM of Ca2+ in the physiologic live-cell imaging buffer. This concentration could actually be a little lower than the total Ca2+ concentration (ranging usually from 2.2 to 2.6 mM) in the body, while the free Ca2+ concentration is typically half as high. Nevertheless, multiple studies other than ours have utilized 2 mM Ca2+ for live-cell-based experiments. Please check the following studies, which represent only a small selection:

      https://doi.org/10.1038/s41598-019-49070-8

      https://doi.org/10.1016/j.bpj.2020.08.045

      https://doi.org/10.1016/j.redox.2022.102319

      However, to ensure that the applied conditions are physiologically relevant, we reperformed experiments using MMTV-PyMT WT and MMTV-PyMT BK-KO cells and compared cytosolic Ca2+ concentrations over time in response to cell stimulation with ATP, either in the presence of 1.0 mM (Author response image 1A) or 2.0 mM extracellular Ca2+ (Author response image 1B). The respective graphs are attached in the following for the reviewer’s inspection. As expected, we find that the intracellular Ca2+ concentration in MMTV-PyMT WT and BK-KO cells was dependent on the extracellular Ca2+ concentration. Importantly, however, irrespective of the exact Ca2+ concentration applied, we observed a similar difference in basal cytosolic Ca2+ between MMTV-PyMT WT and BK-KO cells (Author response image 1C).

      Author response image 1.

      Cytosolic Ca2+ concentrations over time in the presence of 1.0 mM or 2.0 mM extracellular Ca2+.

      Concerning the Ca2+ concentration in the patch-pipette – we are very glad that you uncovered an error in our description and apologize for the mistake. Actually, the information the reviewer is referring to was already given in the previous version of the manuscript, but unclear because a comma was shifted (see line 487 in the originally submitted manuscript). The Ca2+ concentration of the patch-pipette was 0.1 mM in the presence of 0.6 mM EGTA, which should (according to Ca-EGTA calculator, https://somapp.ucdmc.ucdavis.edu/pharmacology/bers/maxchelator/CaEGTA-NIST.htm) be equivalent to ~30 nM of free Ca2+ in the patch pipette. We corrected the mistake in the manuscript and thank the reviewer again for spotting this inaccuracy.
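
      The order of magnitude of this estimate can be checked with a single-site binding calculation. The sketch below is not the algorithm of the calculator linked above; it assumes 1:1 Ca-EGTA binding and an apparent dissociation constant of 150 nM, which is an assumed value that depends on pH, temperature and ionic strength. The function name is hypothetical.

```python
def free_calcium(total_ca, total_egta, kd_app):
    """Free Ca2+ (mol/L) for one buffer with 1:1 binding.

    Solves B^2 - (Ca_t + E_t + Kd) * B + Ca_t * E_t = 0 for the bound
    complex B, then returns free Ca2+ = Kd * B / (E_t - B).
    """
    s = total_ca + total_egta + kd_app
    bound = (s - (s * s - 4.0 * total_ca * total_egta) ** 0.5) / 2.0
    return kd_app * bound / (total_egta - bound)

# 0.1 mM total Ca2+ and 0.6 mM EGTA as stated above; the apparent Kd of
# 150 nM is an ASSUMED value, not taken from the manuscript.
free_ca = free_calcium(0.1e-3, 0.6e-3, 150e-9)
free_ca_nm = free_ca * 1e9
```

      Under that assumption the result is close to the ~30 nM quoted above; a different apparent Kd would shift the estimate roughly in proportion.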

      (2) Ca2+I measurements: The authors use ATP to elicit intracellular Ca2+ signals. Is this then a physiological stimulus for Ca2+ signaling in breast cancer? What is the rationale for using ATP? Moreover, it would be nice to see calibrated baseline values of Ca2+i.

      We thank the reviewer for the comment and suggestion. Importantly, it was demonstrated recently that all of the utilized cell lines respond to treatment with extracellular ATP with a prominent increase in [Ca2+]i, most probably indicating the expression of purinergic receptors, which was a prerequisite to observe ATP-induced changes in [Ca2+]i.

      https://doi.org/10.1038/s41419-022-05329-z

      https://doi.org/10.1093/carcin/bgt493

      https://doi.org/10.1038/s41598-018-26459-5

      Furthermore, ATP plays a crucial role in the tumor microenvironment, where high rates of cell death occur. Hence, ATP is of pathophysiologic relevance for the utilized cancer cell lines.

      https://doi.org/10.1038/s41568-018-0037-0

      https://doi.org/10.3390/cells9112496

      https://doi.org/10.1002/jcp.30580

      Following the suggestions by Reviewer #1 (and #2), we included calibrations of Ca2+cyto and Ca2+mito in the manuscript, by depleting the intracellular Ca2+ stores using Ionomycin in the absence of extracellular Ca2+ (EGTA) to validate the basal difference in Ca2+cyto and Ca2+mito. Additionally, Ca2+cyto was calibrated under basal and inhibitor treated conditions, and values in nM are given in the text (p. 5, lines 185-190, 193-195 and 199-200, in the tracked changes version of the MS). The new data can be found in new Figure S2F – Figure S2J and new Figure S2R – Figure S2V. Moreover, we calculated basal [Ca2+]cyto in the different BKCa pro- and deficient cell lines and under inhibitor treated conditions. We additionally added information about the pathophysiologic relevance of ATP in the tumor microenvironment in lines 175-178 in the tracked changes version of the manuscript.

      (3) Membrane potential measurements: It would be nice to see a calibration of the potential measurements; this would allow us to correlate the IV relationship with membrane potential. Without calibration, it is hard to compare unless the identical uptake of the dye is shown. Does paxilline or IbTx also induce depolarization?

      We thank the reviewer for the suggestion. Indeed, membrane potential calibrations/measurements using the membrane potential sensitive dye Dibac4(3) would be interesting but are technically hardly feasible. The reason is that the principle of the dye is based on different uptake in response to differences in membrane potential, and not ratiometric as for most other dyes/sensors used. Considering this limitation, we decided to perform membrane potential measurements by patch-clamp analysis. Additionally, we performed these experiments upon inhibition of PM-located BKCa by IBTX. Current-clamp experiments confirmed the difference in basal membrane potential between MMTV-PyMT WT and BK-KO cells (consult new Figure S1C and lines 127-130 in the tracked changes version of the manuscript). Interestingly, IBTX treatment depolarized the PM potential to the BK-KO cell level, which validates that BK activity and PM potential are connected. In addition to this approach, we utilized our recently developed genetically encoded K+ sensors, revealing basal differences in [K+]cyto between MMTV-PyMT WT and BK-KO cells. This difference between both genotypes was also equalized by IBTX, as the respective treatment increased [K+]cyto only in WT cells, which most likely explains the cause of PM depolarization (consult lines 130-135 in the tracked changes version of the manuscript and new Figure S1D and Figure S1E).

      (4) Mito-potential measurements: Why did the authors use such a long time course and preincubate cells with channel blockers overnight? Why did they not perform paired experiments and record the immediate effect of the BK channel blockers in the mito potential?

      We thank the reviewer for the suggestion. We performed TMRM-based experiments with MMTV-PyMT WT cells in response to short-term exposure to paxilline, which did not significantly affect the mitochondrial membrane potential, at least within 15 minutes of treatment (Author response image 2). This indicates that further downstream processes subsequent to (mito) BKCa inhibition affect the mitochondrial membrane potential (MMP), most probably including remodeling processes of the respiratory chain, mitochondrial ion homeostasis or glycolytic activity, ultimately also delivering reduction equivalents to mitochondria. Our final goal was to validate potential differences between a BKCa pro- and deficient cell model, whereby the latter cells lacked the BKCa channel from the outset. Hence, “long-term” (~12h) BKCa inhibition as performed in our experiments rather reflects the BK-KO cell situation. Taken together with the new experiment (Author response image 2), we can now state that the effect of BK inhibition on the MMP is at least not the consequence of an acute (within minutes) channel blockade.

      Author response image 2.

      Mitochondrial membrane potential, as measured using TMRM, in response to acute short-term administration of 5µM paxilline, followed by mitochondrial depolarization using FCCP.

      (5) MTT assays are also based on mitochondrial function - since modulation of mito function is at the core of this manuscript, an alternative method should be used.

      We thank the reviewer for the important comment. We performed additional, immunofluorescence-based experiments using Ki-67 staining to assess cell proliferation rates. The newly added data can be found in the text, lines 409-412 in the tracked changes version of the manuscript and new Figure S6D-F. The results obtained confirm the MTT-based results (Fig.6H-I).

      Results:

      (1) Fig. 5G: The number of BK "positive" mitoplasts is surprisingly low - how does this affect the interpretation? Did the authors attempt to record mitoBK current in the "whole-mitoplast" mode? How does the mitoBK current density compare with that of the plasma membrane? Is it possible to theoretically predict the number of mitoBK channels per mitochondrion to elicit the observed effects? Can these results be correlated with the immuno-localization of mitoBK channels?

      Indeed, the number of BKCa-positive mitoplasts appears low at first glance. However, as these experiments were performed in a mitoplast-attached mode, it is important to keep in mind that only a very small area of the actual mitoplast is investigated with each patch. If no channel was detected in such a region, the patch was depicted as “empty”, as presented in Fig.5G, which does not, however, mean that the entire mitochondrion was actually BKCa negative. Hence, the density of BKCa in the IMM might be higher than expected from our experiments. Nevertheless, earlier results using glioblastoma cell lines – considered to be among the cell lines most enriched in mitoBKCa – already demonstrated a quite low density of the BKCa β4 regulatory subunit in mitochondria – please see figure 2B in the following paper: 10.1371/journal.pone.0068125 – which (based on 1:1 stoichiometry of α and β subunits) also suggests that the density of the alpha subunit of BKCa might be low in this compartment.

      Author response image 3.

      Schematic representation of mitoplast-attached patch-clamp experiments.

      Theoretically, density predictions of mitoBK compared to PM localized BKCa would be possible if whole-mitoplast experiments were performed, however, we are unsure what added value this information would actually burst, allowing the pharmacologic modulation of structures originally located within the mitochondrial matrix. Please also consult Author response image 3. According to the most recent models, even if there are other views on this, mitoBKCa is oriented in a way, that the C-terminus with its Ca2+ binding bowl is located within the mitochondrial matrix. Hence, to allow Ca2+ sensitivity experiments of the channel, broken up (by swelling) mitoplasts are required to make the Ca2+ binding bowl accessible for Ca2+ manipulations in the bath solution. This approach does not allow us to compare the channel density to that of the PM.

      Finally, to the best of our knowledge, a combination of immunofluorescence with mitoplast patch-clamp experiments is not feasible yet, and would probably be impossible due to the low density of the mitoBKCa as well as the lack of highly sensitive and specific antibodies.

      (2) There are also reports about other mitoK channels (e.g. Kv1.3, KCa3.1, KATP) playing an important role in mitochondrial function. Did the authors observe them, too? Can the authors speculate on the relative importance of the different channels? Is it known whether they are expressed organ-/tumor-specifically?

      Author response image 4.

      Representative single channels different to mitoBKCa detected in MDAMB-453 mitoplasts.

      The reviewer is right, other K+ channels have been found in mitochondria and these also play a role in tumor cells. This is also consistent with our data (Fig.5G), where we observed other channels in the mitoplasts of BCCs as well. This was the case in all four cell lines tested. According to their conductance and our expectations from the literature, these channels may include, e.g., mitoIKCa, mitoSKCa, mitoKATP or others (10.1146/annurev-biophys-092622-094853). However, as we focused on patches containing a mitoBKCa, we did not further pharmacologically characterize these channels. Two examples of channels we found in these mitoplasts besides BKCa are presented for the reviewers’ inspection (Author response image 4). As our manuscript focuses on mitoBKCa, we did not further classify these channels into smaller subgroups according to their conductance, as we feel that a differentiation between BKCa (~210 pS), and channels showing a conductance ≤150 pS, or a conductance ≤100 pS is sufficient. Furthermore, this additional information would dilute our story too much, making it difficult for the (non-specialist) reader to follow the common thread of the study. We nevertheless added the respective information to the manuscript. Please consult lines 365-366 in the tracked changes version of the manuscript.

      Reviewer #1 is right, the different K+ channels observed might of course be organ- or tumor-specific. For example, it has been reported that the expression of K+ channels differs among various cancer cells (and cell lines) (https://doi.org/10.2174/13816128113199990032, 10.1016/j.pharmthera.2021.107874, 10.1038/nrc3635), a fact which, also according to our study, might be exploited for pharmacological manipulation aiming to affect proliferation/apoptosis of cancer cells. Further, a recently published single-cell and spatially resolved atlas of human breast cancer implies that the expression of different K+ channels (such as mitoIKCa, mitoSKCa, mitoKATP) might even differ between cancer and non-cancer cells within a single tumour (https://doi.org/10.1038/s41588-021-00911-1).

      Reviewer #2 (Public Review):

      Summary:

      The large-conductance Ca2+ activated K+ channel (BK) has been reported to promote breast cancer progression, but it is not clear how. The present study carried out in breast cancer cell lines, concludes that BK located in mitochondria reprograms cells towards the Warburg phenotype, one of the metabolic hallmarks of cancer.

      Strengths:

      The use of a wide array of modern complementary techniques, including metabolic imaging, respirometry, metabolomics, and electrophysiology. On the whole, experiments are astute and well-designed and appear carefully done. The use of BK knock-out cells to control for the specificity of the pharmacological tools is a major strength. The manuscript is clearly written.

      There are many interesting original observations that may give birth to new studies.

      Weaknesses:

      The main conclusion regarding the role of a BK channel located in mitochondria does not appear to be sufficiently supported. Other perfectible aspects are the interpretation of co-localization experiments and the calibration of Ca2+ dyes. These points are discussed in more detail in the following paragraphs:

      We thank reviewer #2 for the thorough assessment of our study.

      (1) May the metabolic effects be ascribed to a BK located in mitochondria? Unfortunately not, at least with the available evidence. While it is clear these cells have a BK in mitochondria (characteristic K+ currents detected in mitoplasts) and it is also well substantiated that the metabolic effects in intact cells are explained by an intracellular BK (paxilline effects absent in the BK KO), it does not follow that both observations are linked. Given that ectopic BKDEC appeared at the surface, a confounding factor is the likely expression of BK in other intracellular locations such as ER, Golgi, endosomes, etc. To their credit, authors acknowledge this limitation several times throughout the text ("...presumably mitoBK...") but not in other important places, particularly in the title and abstract.

      We thank the reviewer for this important comment and amended the title and abstract, respectively. The title of the manuscript was changed to “mitoBKCa is functionally expressed in murine and human breast cancer cells and potentially contributes to metabolic reprogramming.” Additionally, we changed appropriate passages in the text, to emphasize that mitoBKCa potentially mediates the metabolic reprogramming, but other intracellular channels could also contribute to these processes.

      (2) MitoBK subcellular location. Pearson correlations of 0.6 and about zero were obtained between the locations of mitoGREEN on one side, and mRFP or RFP-GPI on the other (Figs. 1G and S1E). These are nice positive and negative controls. For BK-DECRFP however, the Pearson correlation was about 0.2. What is the Z resolution of apotome imaging? Assuming an optimum optical section of 600 nm, as obtained by a 1.4 NA objective with a confocal, that mitochondria are typically 100 nm in diameter and that BK-DECRFP appears to stain more structures than mitoGREEN, the positive correlation of 0.2 may not reflect colocalization. For instance, it could be that BK-DECRFP is not just in mitochondria but in a close underlying organelle e.g. the ER. Along the same line, why did BK-RFP also give a positive Pearson? Isn't that unexpected? Considering that BK-DEC was found by patch clamping at the plasma membrane, the subcellular targeting of the channel is suspect. Could it be that the endogenous BK-DEC does actually reside exclusively in mitochondria (a true mitoBK), but overflows to other membranes upon overexpression? Regarding immunodetection of BK in the mitochondrial Percoll preparation (Fig. S5), the absence of NKA demonstrates the absence of plasma membrane contamination but does not inform about contamination by other intracellular membranes.

      Indeed, it seems that BKCa-DEC is not an exclusive mitoBKCa, at least not upon (over-/)expression in MCF-7 cells. It is known from the literature that mitochondrial K+ channels are encoded by the nuclear genome, as no obvious gene for a K+ channel is found in the mitochondrial genome. Channel proteins are synthesized by cytosolic ribosomes and likely translocated into mitochondria via the TOM/TIM system. Although some K+ channels possess a mitochondrial targeting sequence at the N-terminus, their import is mostly far from a general mechanism, and this also seems to be true for BK channels. In the case of the K+ channel Kv1.3, an even more complex scenario is hypothesized, as the channel located in the PM could be transferred to mitochondria via mitochondria-associated membrane (MAM) structures of the ER (https://doi.org/10.3390/ijms20030734). Yet, the detailed mechanism for BK shuttling to mitochondria is not fully understood. Possibly, overflow is exactly what is happening, due to very high levels of BK-DEC expression upon transfection. However, that the channel translocates to the IMM upon transfection is not surprising and was also demonstrated for other cell models including HEK293 – see e.g. 10.1038/s41598-021-90465-3. Unfortunately, the transfection efficiency of MCF-7 cells is quite low compared to HEK293 – hence, quantitative statements from mito-patches upon transfection are difficult.

      In order to ensure that the mitochondrial colocalization is not a matter of poor microscope resolution, we reperformed these experiments using confocal imaging on a Zeiss LSM980 with an Airyscan 2 detector, yielding z resolutions of ~ 450 nm. These experiments confirmed the increased colocalization of BKCa-DEC with mitochondria compared to BKCa lacking the DEC exon. Furthermore, this imaging at higher resolution demonstrated, that, unfortunately, colocalization might not be the best analysis, as especially fragmented mitochondria showed a clear MitoGREEN stained matrix, surrounded by red fluorescence derived from BKCaDECRFP present in the IMM (revised Fig. 1G).
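
      For readers less familiar with the colocalization score discussed here, the following minimal sketch computes a Pearson correlation coefficient between two fluorescence channels. The pixel values and the function name `pearson_colocalization` are illustrative assumptions only; this is not the Fiji workflow used in the study.

```python
from math import sqrt


def pearson_colocalization(green, red):
    """Pearson correlation between two equally sized lists of pixel intensities."""
    if len(green) != len(red) or len(green) < 2:
        raise ValueError("channels must have the same length (>= 2 pixels)")
    n = len(green)
    mean_g = sum(green) / n
    mean_r = sum(red) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel in the ratio).
    cov = sum((g - mean_g) * (r - mean_r) for g, r in zip(green, red))
    var_g = sum((g - mean_g) ** 2 for g in green)
    var_r = sum((r - mean_r) ** 2 for r in red)
    if var_g == 0 or var_r == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return cov / sqrt(var_g * var_r)


# Hypothetical flattened pixel intensities (arbitrary units), for illustration only.
mito_green = [10, 50, 200, 180, 30, 15]
matrix_marker = [12, 55, 190, 170, 35, 20]       # signal in the same pixels
membrane_marker = [200, 150, 20, 10, 180, 190]   # signal in complementary pixels

print(round(pearson_colocalization(mito_green, matrix_marker), 2))
print(round(pearson_colocalization(mito_green, membrane_marker), 2))
```

      Because the coefficient is computed pixel by pixel, a label that surrounds the matrix stain rather than overlapping it occupies neighbouring pixels and therefore yields a lower score, which is the limitation described above.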

      To validate the results derived from immunoblotting, we additionally stained the membranes for TMX1, a marker for the ER membrane. This analysis confirmed the high purity of the mitochondrial isolation without ER-membrane contamination after Percoll purification, and hence validated the presence of BKCa in the mitochondrial membrane (revised Fig. S5D). The additional information can be found in lines 156-159 in the tracked changes version of the manuscript.

      (3) Calibration of fluorescent probes. The conclusion that BK blockers or BK expression affects resting Ca2+ levels should be better supported. Fluorescent sensors and dyes provide signals or ratios that need to be calibrated if comparisons between different cell types or experimental conditions are to be made. This is implicitly acknowledged here when monitoring ER Ca2+, with an elaborate protocol to deplete the organelle in order to achieve a reading at zero Ca2+.

      We thank the reviewer for the important comment. Please note that at no point in the manuscript do we aim to compare different cell lines concerning their intracellular Ca2+ concentration; we only compare the same cell lines after the different treatments, as we are aware of this limitation of fluorescent probes. However, to validate the differences in intracellular Ca2+ concentrations, we calibrated the signals derived from Fura-2 and 4mtD3cpV using ionomycin in combination with cellular Ca2+ depletion/saturation. The newly added data can be found in the text, lines 185-190, 192-195, 199-200, and 228-230 in the tracked changes version of the manuscript, as well as in new Figure S2F – Figure S2J and new Figure S2R – Figure S2V.
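
      To illustrate what such a calibration provides, the sketch below converts a ratiometric Fura-2 reading into a Ca2+ estimate with the standard Grynkiewicz equation, [Ca2+] = Kd · β · (R − Rmin) / (Rmax − R). Rmin and Rmax are the ratios at zero and saturating Ca2+, and β is the ratio of the 380 nm signals under those two conditions. All numerical values, including the default Kd, are assumptions for illustration and are not taken from the manuscript.

```python
def fura2_calcium_nM(ratio, r_min, r_max, beta, kd_nM=224.0):
    """Estimate free [Ca2+] in nM from a 340/380 ratio (Grynkiewicz equation).

    ratio : measured 340/380 ratio
    r_min : ratio after Ca2+ depletion (zero Ca2+)
    r_max : ratio after Ca2+ saturation (e.g. ionophore plus high Ca2+)
    beta  : 380 nm signal at zero Ca2+ divided by 380 nm signal at saturation
    kd_nM : assumed dissociation constant; a commonly quoted in vitro value
    """
    if not (r_min < ratio < r_max):
        raise ValueError("ratio must lie strictly between r_min and r_max")
    return kd_nM * beta * (ratio - r_min) / (r_max - ratio)


# Hypothetical calibration values for one cell line.
R_MIN, R_MAX, BETA = 0.3, 6.0, 5.0

basal = fura2_calcium_nM(0.6, R_MIN, R_MAX, BETA)
treated = fura2_calcium_nM(0.9, R_MIN, R_MAX, BETA)
print(round(basal, 1), round(treated, 1))
```

      The mapping from ratio to concentration is non-linear, so equal differences in raw ratio do not correspond to equal differences in Ca2+; this is why calibrated values are preferable when conditions are compared.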

      Line 203. "...solely by the expression of BKCa-DECRFP in MCF-7 cells". Granted, the effect of BKCa-DECRFP on the basal FRET ratio appears stronger than that of BK-RFP, but it appears that the latter had some effect. Please provide the statistics of the latter against the control group (after calibration, see above).

      Author response image 5.

      Dot plot for data shown in Figure 2I.

      The reviewer is right, it seems that BKCaRFP may also affect [Ca2+]mito. However, the effect is not significant (p > 0.999, Kruskal-Wallis test followed by Dunn’s multiple comparison test, chosen because the data are not normally distributed). In contrast, p = 0.0002 for ctrl vs. BKCa-DECRFP and p = 0.0022 for BKCaRFP vs. BKCa-DECRFP. We added a scatter dot plot of the respective data as Author response image 5 for the reviewer’s inspection. Additionally, first, even with a direct pairwise comparison of ctrl vs. BKCaRFP using the Mann-Whitney test, the result is not significant (p = 0.4467), and second, we performed the requested Ca2+ calibration using ionomycin under these conditions, which confirmed the difference between ctrl cells and BKCa-DECRFP expressing cells, but not BKCaRFP expressing ones. Please see Figure S2V.
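
      As a rough illustration of the omnibus step of this analysis, the sketch below computes the tie-corrected Kruskal-Wallis H statistic for exactly three groups. With three groups the statistic has two degrees of freedom, so the chi-square p-value reduces to exp(−H/2). The data are invented, the function names are hypothetical, and Dunn's post hoc comparisons are not implemented here.

```python
from math import exp


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def kruskal_wallis_3groups(a, b, c):
    """Return (H, p) for three groups, using the chi-square approximation (df = 2)."""
    groups = [a, b, c]
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, exp(-h / 2)  # chi-square survival function for df = 2


# Invented basal ratios for three conditions, for illustration only.
ctrl = [1.0, 1.1, 1.2, 0.9, 1.05]
bk_rfp = [1.02, 1.15, 0.95, 1.08, 1.12]
bk_dec_rfp = [1.6, 1.7, 1.8, 1.75, 1.65]

h_stat, p_value = kruskal_wallis_3groups(ctrl, bk_rfp, bk_dec_rfp)
print(round(h_stat, 2), round(p_value, 4))
```

      The chi-square approximation is coarse for small groups, so a statistics package with exact or permutation-based p-values would be the better choice for real data.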

      Reviewer #3 (Public Review):

      The original research article, titled "mitoBKCa is functionally expressed in murine and human breast cancer cells and promotes metabolic reprogramming" by Bischof et al, has demonstrated the underlying molecular mechanisms of alterations in the function of Ca2+ activated K+ channel of large conductance (BKCa) in the development and progression of breast cancer. The authors also proposed that targeting mitoBKCa in combination with established anti-cancer approaches, could be considered as a novel treatment strategy in breast cancer treatment.

      The paper is clearly written, and the reported results are interesting.

      Strengths:

      Rigorous biophysical experimental proof in support of the hypothesis.

      Weaknesses:

      A combinatorial synergistic study is missing.

      We thank reviewer #3 for the positive summary of our study. Indeed, we propose that targeting of mitoBKCa in combination with established anti-cancer drugs may represent a novel anti-cancer treatment strategy. Unfortunately, we feel that the manuscript is very condensed already, and that adding respective required experiments and data to support this hypothesis will make the flow of the manuscript more complex or even incomprehensible. As no attempts linking mitoBKCa activity with anti-cancer therapies have been made so far, we removed the respective information from the abstract and only discuss this aspect.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Statistics: Legends have to contain information about the number of biological replicates (N) and cells analysed (n). Statistics must be calculated with the averages of the replicates.

      Author response image 6.

      Representative single cell responses of Fura-2 loaded MMTV-PyMT WT cells.

      We thank the reviewer for the comment and added the missing details to all figure legends.

      We feel that using each cell represents exactly the power of high-resolution live-cell imaging, as there is no better biological replicate than a single separated cell, which is observed by fluorescence microscopy. This analysis is also able to visualize cell-to-cell differences in the microscopy area, similarly to patch-clamp experiments, where each single cell or mitoplast patched is used as a single replicate. Please find a representative dataset derived from fluorescence microscopy of different responses of neighboring single cells in Author response image 6.

      (2) Fig. 1G: This is a poor resolution figure, mostly because of its far too small size; in its current form it bears very little information.

      We agree with reviewer #1 and reperformed the imaging experiments using high resolution confocal imaging and exchanged the respective images. We feel that this increased the quality of the images significantly. Unfortunately, we were not able to increase the size of the images in the main figure, hence, we added magnifications of the respective images as new Figure S1I.

      (3) Fig. 1H: What do the dotted grey lines and the labels stand for?

      We believe Reviewer #1 is probably referring to Figure 1G. As indicated in the figure panel and in the text, the grey dotted lines and labels indicate the colocalization scores of mtRFP and RFP-GPI with MitoGREEN, respectively. These data are also shown in Figure S1H, including error bars and statistics. We added additional information in the text to make the meaning of the lines clearer to the reader. Please consult lines 149 – 150 in the tracked changes version of the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) May the metabolic effects be ascribed to a BK located in mitochondria? Short of a way to tackle BK function and metabolism specifically in mitochondria, the conclusion may best be toned down to "intracellular BK". For the time being the term "mitoBK" appears too ambitious.

      We feel that you are right and that our previous overstatement requires adaptation, as a clear (100%) attribution of the observed metabolic effects solely to mitoBKCa is not possible. We have therefore amended all relevant passages in the entire MS accordingly.

      (2) MitoBK subcellular location. Please address the points raised in the Public Review.

      As stated above we addressed the point raised in the public review accordingly (please consult new Figure S1I and revised Figure 1G).

      (3) Calibration of fluorescent probes. Please provide calibrations for cytosolic and mitochondrial Ca2+, for example, the standard high Ca2+/ionophore/metabolic inhibition treatment to reach saturation followed by Ca2+ chelation to obtain zero Ca2+.

      We thank the reviewer for the comment. As you can see from our response to the public review, we performed the respective experiments, and datasets were added in the manuscript.

      (4) Line 203. "...solely by the expression of BKCa-DECRFP in MCF-7 cells". Granted, the effect of BKCa-DECRFP on the basal FRET ratio appears stronger than that of BK-RFP, but it appears that the latter had some effect. Please provide the statistics of the latter against the control group (after calibration, see above).

      Please consult our response to the (same) comment in the public review.

      (5) Line 228. The statement "Similar results were obtained in MDA-MB-453 cells" is confusing. As shown in Fig.3, pax reduced ECAR and OCR in MMTV-PyMT WT cells. As ibtx was without effect, it is suggested that intracellular BK supports metabolism. However, the effect of pax on MDA cells was the opposite. Doesn't this divergence speak against a universal role of intracellular BKs in promoting metabolism in BCCs? A similar point may be made regarding metabolomics, which showed no effects of pax on lactate and pyruvate in MMTV-PyMT WT cells but stimulation in MDA cells. Perhaps the word "promotes" in the title of the figure should be replaced by something more neutral like "affects" or "alters", as used elsewhere.

      We thank the reviewer for pointing out the overstatement regarding intracellular BK functions and changed the title of the figure as suggested.

      With regard to the experiments mentioned, we would like to point out the following aspects:

      First, the cell lines used differ strongly in their metabolic settings under basal conditions. While MMTV-PyMT and MDA-MB-453 cells seem to show similar basal ECAR levels (if BKCa was present), their OCR seems to differ strongly. MMTV-PyMT cells seem to show a basal OCR that is already almost at the maximum, while MDA-MB-453 cells possess a tremendous spare capacity in their OCR, as observed upon mitochondrial uncoupling using FCCP. Of note, both ECAR and OCR are indirect metabolic measures. On the one hand, ECAR measures extracellular acidification, which is accomplished by secretion of H+ along with lactate. However, lactate secretion is not the only process leading to extracellular acidification, and ECAR may hence measure a variety of H+-releasing processes, including vesicle secretion. On the other hand, OCR is not directly linked to ATP production, as O2 is consumed by mitochondrial complex IV, whereas ATP is produced by mitochondrial complex V. This becomes even more evident when looking at OCRs after FCCP treatment – under these conditions, the H+ gradient is destroyed and ATP synthase activity is reduced, yet OCR increases to the maximum due to the increased supply of mitochondrial complex IV with H+.

      Second, please note that the LC-MS-based metabolomics data derive from a static single time point and not from over-time “live” read-outs. Moreover, the underlying dynamics of the parameters measured cannot be assessed. Hence, as an example, increasing levels of pyruvate can indicate either faster generation or slower subsequent degradation/metabolization. A clear in-depth statement about what is happening under basal and BKCa inhibitor-treated conditions is hence not possible. The only conclusion that can be drawn from these experiments is that paxilline treatment differentially affects metabolic pathways in these cells.

      Based on these limitations of both methods, we decided to perform our in-depth fluorescence microscopy-based analysis, which provided strong evidence for an effect of intracellular BKCa channels on mitochondrial ATP production. Despite opposing effects of BKCa inhibition on OCR in MMTV-PyMT WT and MDA-MB-453 cells, mitochondrial ATP production was reduced if BKCa-DECRFP was expressed/intracellular BKCa was functional.

      In line with these findings, mitoBKCa was recently described as an uncoupling protein, which could furthermore explain the differential effects of intracellular BKCa inhibition on OCR (https://doi.org/10.1038/s41598-021-90465-3).

      Minor

      (6) Fig. 1C. Average fluorescence intensity in 6 experiments was about 20% higher in BK-KO cells relative to WT. Such a small difference is significant but should not be evident to the eye. The pictures selected for illustration appear to show a much larger difference and therefore may not be representative. If this is the case, please omit them. The same goes for the other representative pictures.

      Author response image 7.

      Representative images at different brightnesses.

      Please note, that the analysis of the images was done in an unbiased way using a Fiji macro. After analysis, we chose representative images, which were closest to the average.

      Furthermore, we must kindly disagree with the reviewer, as changes of 20% in fluorescence intensity are indeed evident to the eye (consult Author response image 7). These panels show the same image at different brightness levels with intensity differences of 20%. Hence, we feel that all the images the reviewer was referring to are representative of the values given.

      (7) Line 130. The definition of "recent" is of course relative, but 10 years?

      We are very glad that you have discovered this “inconsistency”, and reworded the respective phrase accordingly.

      (8) Line 327. "conductivity" is the property of a medium, "conductance" is the property of a component, such as a channel.

      We thank the reviewer for the important comment. We revised the text accordingly.

      (9) Various figures. FRET sensor data are expressed as Ratio(FRET/CFP). This is unusual, typically it should be FRET ratio (YFP/CFP), FRET ratio(mTFP/Venus), etc. Please note that the FRET partners differ between sensors.

      We acknowledge the comment of the reviewer. It is correct that fluorescent proteins vary widely between the sensors (used). Please note, however, the following: The emission measured from these sensors actually represents FRET, as CFP but not YFP is directly excited. Hence, emission is FRET, not the “intrinsic” fluorescence of the YFP. This is getting more and more important to differentiate, as there are probes existing, which can also be “alternately” excited, i.e. CFP and YFP separately, which will then yield the YFP/CFP ratio (https://doi.org/10.1021/acssensors.8b01599). In case of only CFP excitation, we feel, that the term FRET/CFP is preferable over other labelings such as YFP/CFP.

      (10) BK-DEC makes BCCs cells less oxidative. However, BK-DEC was first described in cardiomyocytes, which are among the most oxidative cell types. It would be useful if authors could address this apparent contradiction in the Discussion Section.

      That is an exciting point that we addressed as follows in the revised MS:

      First, it is important to mention that cardiac myocytes do not show a metabolic Warburg setting and are – under physiologic conditions – maintained in a high O2 environment.

      Second, a recent study from our group addressed the question about the role of mitoBKCa in primary cardiac myocytes. Indeed, mitoBKCa was functionally expressed in these cells. Interestingly, under physiologic conditions, the channel did not alter (multiple) cell behaviours nor overall cardiac physiology in a mouse model. However, upon induction of ischemia/ reperfusion injury, a lack of BK increased cardiac susceptibility to cell death resulting in increased infarction size (https://doi.org/10.1161/CIRCULATIONAHA.117.028723). Hence, also in this cell model, BKCa only played a role under oxygen limited conditions/ conditions where mitochondria were not properly functioning. Thus, the results derived from cardiac myocytes support our recent findings in BCCs, as BKCa mediates BCC resistance to hypoxic stress/ makes BCCs more independent from oxidative metabolism.

      Parts of this discussion were included in the revised MS. Please consult lines 490-500 in the tracked changes version of the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) The study is very well designed and most of the computational analyses were done rigorously.

      We highly appreciate the positive feedback by reviewer #3.

      (2) The authors should discuss the expression of BKCa in different subsets of breast cancer. Authors may also debate on the level of steroid receptors and BKCa expressions.

      We thank reviewer #3 for the important suggestion and added the requested information in the discussion, lines 445-447 and 450-454 in the tracked changes version of the manuscript.

      (3) In the discussion section, the authors mentioned that the MCF7 cell is the best model to study this hypothesis. Does it imply that triple-negative breast cancer cell lines express lower levels of BKCa? The authors should discuss this.

      We thank the reviewer for the interesting comment; we would like to point out that the ERα-positive MCF-7 cell line was used to study experimental overexpression of BKCa at an otherwise low baseline level. This does not imply that BKCa is expressed at lower levels in TNBC cell lines; in fact a recent study showed the opposite, i.e. overexpression of BKCa in TNBC patients (10.1186/s12885-020-07071-1). Consistent with our work, the authors conclude that the channel could even be a new strategy for development of a targeted therapy in TNBC. We also added this information in the discussion, lines 450-454 in the tracked changes version of the manuscript.

      (4) The authors propose that combinatorial targeting of mitoBKCa along with known breast cancer chemotherapeutics can open a new horizon in breast cancer treatment. However, the authors did not perform any experiment to show the synergistic effect as mentioned.

      As already stated in the public reviews, we feel that the manuscript is very condensed already, and that adding the respective experiments and data will make the flow of the study even more complex. For the moment, we removed all information and statements linking mitoBKCa with anti-cancer treatment strategies from the abstract and only discuss this aspect. We hope that the reviewer agrees with us that an extensive analysis of the functional mitoBKCa status in the context of established breast cancer therapies must be addressed by (our) future studies.

      Minor Comments:

      There are several typos and grammatical errors that need further attention and rephrasing.

      We thank the reviewer for the comment and revised the text accordingly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This paper represents important findings when identifying untargeted metabolomics and its differences between metabolomes of different biological samples. GromovMatcher is the fantasy name for the soft development. The main idea behind it is built on the assumption of featuring and matching complex datasets. Although the manuscript reflects a solid analysis, it remains incomplete for validation with putative non-curated datasets.

      We are grateful to the eLife editor for taking the time and effort to assess our manuscript.

      We are, however, unsure of what the editor means by “it remains incomplete for validation with putative non-curated datasets”. As noted by Reviewer 2, manually curated datasets that could be used for validation are scarce. Most publicly available datasets do not contain sufficient information to establish a ground truth matching on which GromovMatcher, M2S, or metabCombiner can be tested. Even in the case where such a ground truth matching can be established, it must be performed by hand through a manual matching process, which is extremely time-consuming and requires very specific expertise. This, in our opinion, only highlights the need for automatic alignment methods such as metabCombiner, M2S or GromovMatcher.

      We do agree that the performance of GromovMatcher (and its competitors) needs to be validated further, and we plan to continue validating GromovMatcher as additional data becomes available in EPIC and other cohorts. With that in mind, the lack of publicly available validation data is the reason why we conducted such an extensive simulation study, arguably more comprehensive than previous validations, exploring challenging settings that we believe reflect real-life scenarios (main text “Validation on ground-truth data” and Appendix 3). We would like to stress that this allows us to highlight previously ignored limitations of the previously published methods, metabCombiner and M2S.

      We wish to thank the editor and reviewers for their time and efforts in reviewing our manuscript which led to many significant additions to our paper. Namely we:

      • Performed an additional sensitivity analysis (Appendix 3) exploring how an imbalance in the number of features or samples between two studies being matched (e.g. the dataset split), affects the quality of matchings found by GromovMatcher, metabCombiner, and M2S.

      • Investigated how changing or removing the reference dataset (Appendix 5) in the EPIC study (main text “Application to EPIC data”), affects the results of GromovMatcher.

      • Improved alignment matrix visualizations in Fig. 3a for all four methods tested on the validation data, to highlight more clearly which feature matches were correctly identified or missed.

      The revised paper is uploaded as the file “main_elife_revision.pdf” where all revisions are highlighted in blue as well as a copy “main_elife_revision_nohighlights.pdf” where revisions are not highlighted.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors have implemented the Optimal Transport algorithm in GromovMatcher for comparing LC/MS features from different datasets. This paper gains significance in the proteomics field for performing meta-analysis of LC/MS data.

      Strengths:

      The main strength is that GromovMatcher achieves significant performance metrics compared to other existing methods. The authors have done extensive comparisons to claim that GromovMatcher performs well.

      Weaknesses:

      There are two weaknesses.

      (1) When the number of features is reduced the precision drops to ~0.8.

      We would like to clarify that this drop in precision occurs in the challenging setting where only a small proportion of metabolites are shared between both datasets (e.g., the overlap, or proportion of shared features, was 25% in our simulation study). When two untargeted metabolic datasets share only 25% of their features, this is a challenging setting for any automated matching method, as the vast majority (75%) of the features in both datasets must remain unmatched.

      In such settings, the reviewer correctly observes that the precision of the GromovMatcher algorithms (GM and GMT) drops to within the range of 0.80 - 0.85 (Figure 3b, top left panel). Such a precision of 0.8 or larger is still competitive compared with the alternative methods metabCombiner (mC) and M2S, whose precisions drop below 0.8 (see main text Fig. 3b, top left panel).

      Precision is measured as the number of metabolite pairs correctly matched divided by all matches identified by a method. In other words, even in the challenging setting when the number of shared features (true matches) between both datasets is small (e.g. low 25% overlap), upwards of 80% of the feature matches found by GromovMatcher are correct which is a very encouraging result.
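
      The definition above can be written out as a short sketch. The feature indices and the helper name `precision_recall` are hypothetical; recall (the fraction of true matches that were recovered) is included only as the usual companion metric.

```python
def precision_recall(predicted_pairs, true_pairs):
    """Precision and recall of a set of predicted feature matches.

    Each pair is (feature index in dataset 1, feature index in dataset 2).
    """
    predicted = set(predicted_pairs)
    truth = set(true_pairs)
    correct = predicted & truth
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical example: five true matches, five predictions, one of them wrong.
true_matches = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
predicted_matches = [(0, 0), (1, 1), (2, 2), (3, 3), (5, 6)]

precision, recall = precision_recall(predicted_matches, true_matches)
print(precision, recall)
```

      In a low-overlap setting most features have no true partner, so every spurious match lowers precision directly; this is why the metric is demanding when only 25% of features are shared.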

      (2) How applicable is the method for other non-human datasets?

      We thank the reviewer for raising this question. The crux of the matter concerning the application to animal data revolves around the hypothesis that correlations between metabolites in two different studies are preserved. Theoretically, the metabolome operates under similar principles in humans, governed by an underlying network of biochemical reactions. Consequently, in comparable human populations, the GM hypothesis is likely to hold to some extent.

      However, in practice, application to animal data is more complicated. Animal studies tend to have smaller sample sizes and often stem from intervention-driven scenarios, such as mice subjected to specific diets or chemicals. This results in deliberate alterations in metabolic structures which makes finding two comparable animal studies less likely. To investigate the reviewer’s question, we have searched through the two predominant LC-MS dataset repositories (MetaboLights and NIH Metabolomics Workbench) but did not find any pairs of comparable animal studies due to the reasons mentioned above. One potential strategy to navigate this issue could entail regressing the metabolic intensities against the variables that notably differ between the two animal populations and running GM using the residual intensities. This would be an interesting direction for future research and additional validation would be needed to test the robustness of GM in this setting.

      Reviewer #2 (Public Review):

      Summary:

      The goal of untargeted metabolomics is to identify differences between metabolomes of different biological samples. Untargeted metabolomics identifies features with specific mass-to-charge ratio (m/z) and retention time (RT). Matching those to specific metabolites based on the model compounds from databases is laborious and not always possible, which is why methods for comparing samples on the level of unmatched features are crucial.

      The main purpose of the GromovMatcher method presented here is to merge and compare untargeted metabolomes from different experiments. These larger datasets could then be used to advance biological analyses, for example, for the identification of metabolic disease markers. The main problem that complicates merging different experiments is that m/z and RT vary slightly for the same feature (metabolite).

      The main idea behind the GromovMatcher is built on the assumption that if two features match between two datasets (that feature i from dataset 1 matches feature j from dataset 2, and feature k from dataset 1 matches feature l from dataset 2), then the correlations or distances between the two features within each of the datasets (i and k, and j and l) will be similar. The authors then use the Gromov-Wasserstein method to find the best matching matrix from these data.
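
      The assumption described in this paragraph can be made concrete with a toy sketch. For a candidate one-to-one matching, it sums the squared differences between within-dataset distances, (D1[i][k] − D2[j][l])², over all pairs of matched features, and then finds the matching with the lowest total by brute force. This is only an illustration of the objective under hard matchings; it is not the optimal transport solver used by GromovMatcher, and the matrices and function names are assumptions.

```python
from itertools import permutations


def matching_distortion(d1, d2, matching):
    """Sum of (d1[i][k] - d2[j][l])^2 over all pairs of matches (i, j) and (k, l)."""
    total = 0.0
    for i, j in matching:
        for k, l in matching:
            total += (d1[i][k] - d2[j][l]) ** 2
    return total


def brute_force_match(d1, d2):
    """Try every permutation (feasible only for a handful of features)."""
    n = len(d1)
    best = min(
        permutations(range(n)),
        key=lambda p: matching_distortion(d1, d2, list(enumerate(p))),
    )
    return list(enumerate(best))  # pairs (feature in dataset 1, feature in dataset 2)


# Toy distance matrix between three features in dataset 1.
d1 = [[0, 1, 4],
      [1, 0, 2],
      [4, 2, 0]]

# Dataset 2 holds the same features in a shuffled order: column j of dataset 2
# corresponds to feature perm[j] of dataset 1.
perm = [2, 0, 1]
d2 = [[d1[perm[a]][perm[b]] for b in range(3)] for a in range(3)]

print(matching_distortion(d1, d2, [(0, 0), (1, 1), (2, 2)]))  # naive index matching
print(brute_force_match(d1, d2))                              # recovered matching
```

      The correct matching reproduces every pairwise distance and therefore has zero distortion, whereas matching features by index alone does not. Real datasets contain thousands of features, many without a partner, which is why a relaxed optimal transport formulation is needed in place of enumeration.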

      The variation in m/z between the same features in different experiments is a user-defined value and it is initially set to 0.01 ppm. There is no clear limit for RT deviations, so the method estimates a non-linear deviation (drift) of RT between two studies. GromovMatcher estimates the drift between the two studies and then discards the matching pairs where the drift would deviate significantly from the estimate. It learns the drift from a weighted spline regression.

      The authors validate the performance of their GromovMatcher method in an experiment using a dataset of cord blood. They use 20 different splits and compare the GromovMatcher (both its GM and GMT iterations, whereby the GMT version uses the deviation from estimated RT drift to filter the matching matrix) with two other matching methods: M2S and metabCombiner.

      The second validation was done using a (scaled and centered) dataset of metabolics from cancer datasets from the EPIC cohort that was manually matched by an expert. This dataset was also used to show that using automatic methods can identify more features that are associated with a particular group of samples than what was found by manual matching. Specifically, the authors identify additional features connected to alcohol consumption.

      Strengths:

      I see the main strength of this work in its combination of all levels of information (m/z, RT, and higher-order information on correlations between features) and using each of the types of information in a way that is appropriate for the measure. The most innovative aspect is using the Gromov-Wasserstein method to match the features based on distance matrices.

      We thank the reviewer for acknowledging this strength of our proposed GromovMatcher method.

      The authors of the paper identify two main shortcomings with previously established methods that attempt to match features from different experiments: a) all other methods require fine-tuning of user-defined parameters, and, more importantly, b) do not consider correlations between features. The main strength of the GromovMatcher is that it incorporates the information on distances between the features (in addition to also using m/z and RT).

      Weaknesses:

      The first, minor, weakness I could identify is that there seem not to be plenty of manually curated datasets that could be used for validation.

      We thank the reviewer for raising this issue concerning manually curated validation data.

      Manually curated datasets available for validation purposes are indeed scarce. This stems from the laborious nature of matching features across diverse studies, hence the need for automatic matching methods. Our future strategy involves further validation of the GromovMatcher approach as more data becomes accessible in EPIC and other cohorts.

      The scarcity of real-life publicly available datasets that can be used for validation purposes is the reason why we conducted an extensive simulation study (main text “Validation on ground-truth data” and Appendix 3). It is notably thorough, arguably more comprehensive than previous validations, utilizes real-life untargeted data, and imitates situations where data originates from distinct untargeted metabolomics studies, complete with realistic noise parameters encompassing RT, m/z, and feature intensities. Our validation study comprehensively explores the performance of GromovMatcher, M2S, and metabCombiner, including in challenging realistic settings where there is a nonlinear drift in retention times, varying levels of feature overlaps between studies, normalizations of feature intensities, as well as imbalances in the number of features and samples present in the studies being matched.

      The second is also emphasized by the authors in the discussion. Namely, the method as it is set up now can be directly used only to compare two datasets.

      This is indeed a limitation that is common to all three methods considered in this paper. However, all these methods, GromovMatcher, M2S, and metabCombiner, can still be used to compare and pool multiple datasets using a multi-step procedure. Namely, this can be done by designating a 'reference' dataset and aligning all studies to it one by one. We take this exact approach in our paper when aligning the CS, HCC, and PC studies of the EPIC data in positive mode (main text “Application to EPIC data”). Namely, the HCC and PC studies are both aligned to the CS study by running GromovMatcher twice, and after obtaining these matchings, our analysis is restricted to those features in HCC and PC that are present in the CS study.

      After the reviewer’s comment, we have added an additional sensitivity analysis in Appendix 5, to compare the results produced by GromovMatcher depending on the choice of the reference study. Namely, setting the reference study to either the CS study or the HCC study, GromovMatcher identified 706 and 708 common features respectively, with an overlap of 640 features. This highlights that the choice of reference does matter to some extent. In our original analysis of the EPIC data, choosing CS as the reference was motivated by the fact that CS had the largest sample size (compared to HCC and PC) and a subset of features in HCC and PC were already matched by experts to the CS study which we could use for validation (see Loftfield et al. (2021). J Natl Cancer Inst.).

      As mentioned in the discussion section of our manuscript, the recently proposed multimarginal Gromov-Wasserstein algorithm (Beier, F., Beinert, R., & Steidl, G. (2023). Information and Inference) could potentially allow multiple metabolomic studies to be matched using one optimization routine (e.g. without the designation of a ‘reference study’ for matching). We have not explored this possibility in depth yet as fast numerical methods for multimarginal GW are still in their infancy. Also, such multimarginal methods rely on the computation and storage of coupling or matching matrices that are tensors where the number of dimensions is equal to the number of datasets being matched. Therefore, multimarginal methods have large memory costs, which currently precludes their application for the matching of multiple metabolomics datasets.

      Reviewer #2 (Recommendations For The Authors):

      (1) I was struggling with the representation used in Figure 3a. The gray points overlayed over the green points on a straight line are difficult to visually quantify. I found that my eyes mainly focused on the pattern of the red dots.

      Figure 3a has been modified to improve visual clarity. Namely, we have consistently reordered the rows and columns of the coupling matrices such that the true positive matches (green points) are spatially separated from the false negative matches (red points). The fraction of true positive and false negative matches can now be appreciated much more clearly by eye in Figure 3a.

      (2) I would also like to add the caveat that I cannot judge whether the authors used the other two methods that they compare with GromovMatcher (the M2S and metabCombiner) optimally. But I also do not see any evidence that they did not. Hopefully one of the other reviewers can address that.

      We thank the reviewer for highlighting the comparison of our approach GromovMatcher to the other existing methods M2S and MetabCombiner (mC). Both M2S and mC depend on tens of hyperparameters, each with a discrete or continuous set of values that must be properly optimized to infer accurate matchings between dataset features. We detail in Appendix 2 how the hyperparameters of the M2S and mC methods are optimally tuned to achieve the best possible performance on the validation ground-truth data. Namely, both in the simulation study and on EPIC data, we grid-search over all important hyperparameters in the M2S and mC methods and choose those parameter combinations that result in the highest F1 score, averaged over 20 random trials. We remark that no such hyperparameter optimization was performed for our GromovMatcher method. As shown in Figures 3 and 4 of the main text, we find that GromovMatcher outperforms M2S and mC even in these cases when the hyperparameters of M2S and mC are tuned to predict optimal feature matchings.

      Given the large combinatorial space of hyperparameter choices, we believe we have thoroughly tested the important hyperparameter combinations that users of M2S and mC would be likely to explore in their own research.
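      The selection rule described in this reply (pick the hyperparameter combination with the highest F1 score averaged over random trials) can be sketched as follows. This is an illustration only, not the code from Appendix 2: the function names, the setting labels, and the tiny two-trial example are hypothetical, and matchings are represented simply as sets of (feature in dataset 1, feature in dataset 2) pairs.

```python
def precision_recall_f1(predicted, truth):
    """Precision, recall and F1 of predicted feature pairs against ground truth."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

def select_best_setting(results_by_setting, truth_by_trial):
    """Return the setting whose F1, averaged over trials, is highest.

    results_by_setting maps a setting label to a list of predicted
    matchings, one per trial, in the same order as truth_by_trial.
    """
    def mean_f1(setting):
        runs = results_by_setting[setting]
        scores = [precision_recall_f1(p, t)[2] for p, t in zip(runs, truth_by_trial)]
        return sum(scores) / len(scores)
    return max(results_by_setting, key=mean_f1)

# Hypothetical ground truth for two trials.
truth_by_trial = [
    {(0, 0), (1, 1), (2, 2)},
    {(0, 1), (1, 0)},
]

# Hypothetical predictions from two hyperparameter settings.
results_by_setting = {
    "setting_a": [{(0, 0), (1, 1)}, {(0, 1), (1, 0)}],
    "setting_b": [{(0, 0), (1, 2)}, {(0, 1)}],
}

best = select_best_setting(results_by_setting, truth_by_trial)
print(best)
```

      In a real grid search the dictionary keys would be hyperparameter combinations and the number of trials would be 20, as stated above.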

      (3) Validation

      (3a) The first validation is done on a split cord blood dataset. I could not clearly see from the paper how sensitive the result is to the dataset split.

      We are grateful for the reviewer’s question and have included new experiments in Appendix 3 which show how the results of GromovMatcher, M2S, and MetabCombiner are affected by the dataset split. In our original manuscript, our validation ground-truth experiment began with an untargeted metabolomic dataset consisting of n = 499 samples and p = 4,712 metabolic features which is split equally into two datasets consisting of an equal number of samples n1 = n2 and an equal number of metabolic features p1 = p2. The features of these equal-sized datasets would then be matched by our method.

      Now in Appendix 3 (Figs. 1-3) we show the sensitivity of all three alignment methods (GromovMatcher, M2S, and MetabCombiner) when we vary the fraction of samples in dataset 1 over dataset 2 given by n1/ n2, the overlap in shared features between both datasets, and the fraction of metabolic features in dataset 1 that are not present in dataset 2 which affects the feature sizes of both datasets p1/ p2. We find that all alignment methods are able to maintain a consistent precision and recall score when these three dataset split parameters are varied. GromovMatcher achieves a higher precision and recall than M2S and MetabCombiner for all choices of dataset split, agreeing with the validation experiment results from the main text (see main text Fig. 3). All three methods tested decrease in precision (without dropping in recall) when dataset 1 and dataset 2 contain an equal number of unshared features (e.g. when p1 = p2). Therefore, these sensitivity experiments in Appendix 3 show that our results in the main text are performed in the most challenging setting for the dataset split.

      (3b) The second validation was done using a (scaled and centered) dataset of metabolics from cancer datasets from the EPIC cohort that was manually matched by an expert. Here the authors observe that metabCombiner has good precision, but lags in recall. And M2S has a very similar performance to GromovMatcher. The authors explain this by the fact that the drift in RT between the two experiments is mostly linear and thus does not affect the M2S performance. Can the authors find a different validation dataset where the drift in RT is not linear? If yes, it would be interesting to add it to the paper.

      We thank the reviewer for raising this question. As mentioned above, curated validation datasets such as the EPIC study analyzed in our paper are very rare and we do not currently have a validation study with a nonlinear retention time drift.

      Nevertheless, we performed an additional analysis of simulated data (reported in Appendix 2 – “M2S hyperparameter experiments” and Appendix 2 – Table 1) that demonstrates the decrease in M2S performance when the simulated drift is nonlinear. As presented in Appendix 2 – Table 1, in a low overlap setting with a linear drift which corresponds to the EPIC data, precision and recall were 0.831 and 0.934 respectively, instead of 0.769 and 0.905 in the main analysis where the drift was nonlinear.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This important study reports a novel mechanism linking DHODH inhibition-mediated pyrimidine nucleotide depletion to antigen presentation. Alternative means of inducing antigen presentation provide therapeutic opportunities to augment immune checkpoint blockade for cancer treatment. While the solid mechanistic data in vitro are compelling, in vivo assessments of the functional relevance of this mechanism are still incomplete.

      Public Reviews:

      We thank all Reviewers for their insightful comments and excellent suggestions.

      Reviewer #1 (Public Review):

      The manuscript by Mullen et al. investigated the gene expression changes in cancer cells treated with the DHODH inhibitor brequinar (BQ), to explore the therapeutic vulnerabilities induced by DHODH inhibition. The study found that BQ treatment causes upregulation of antigen presentation pathway (APP) genes and cell surface MHC class I expression, which is mechanistically mediated by the CDK9/PTEFb pathway triggered by pyrimidine nucleotide depletion.

      No comment from authors

      The combination of BQ and immune checkpoint therapy demonstrated a synergistic (or additive) anti-cancer effect against xenografted melanoma, suggesting the potential use of BQ and immune checkpoint blockade as a combination therapy in clinical therapeutics.

      No comment from authors

      The interesting findings in the present study include demonstrating a novel cellular response in cancer cells induced by DHODH inhibition. However, a limitation of the study is that it did not directly examine whether the increased antigen presentation induced by DHODH inhibition actually contributed to the potentiation of the efficacy of immune checkpoint blockade (ICB).

      No comment from authors for preceding text, comment addresses the following text

      Moreover, the proposed mechanism, in which pyrimidine depletion increases antigen presentation pathway expression via CDK9/PTEFb, was not validated by genetic KD or KO targeting the CDK9/PTEFb pathway.

      We appreciate this comment, and we would like to explain why we did not pursue these approaches. According to DepMap, CRISPR/Cas9-mediated knockout of CDK9 in cancer cell lines is almost universally deleterious, scoring as “essential” in 99.8% (1093/1095) of all cell lines tested (see Author response image 1 below). This makes sense, as P-TEFb is required for productive RNA polymerase II elongation of most mammalian genes. As such, it was not feasible to generate cell lines with stable genetic knockout of CDK9 to test our hypothesis.

      While knockdown of CDK9 by RNA interference could support our results, DepMap data seems to indicate that RNAi-mediated knockdown of CDK9 is generally ineffective in silencing its activity, as this perturbation scored as “essential” in only 6.2% (44/710) of tested cell lines. This suggests that incomplete depletion of CDK9 will likely not be sufficient to block APP induction downstream of nucleotide depletion. Furthermore, RNAi-mediated depletion of CDK9 may trigger transcriptional changes in the cell by virtue of its many documented protein-protein interactions, and it would be difficult to establish a consistent “time zero” at which point CDK9 protein depletion is substantial but secondary effects of this have not yet occurred to a significant degree. These factors constitute major limitations of experiments using RNAi-mediated knockdown of CDK9.

      Author response image 1.

      Essentiality score from CRISPR and RNAi perturbation of CDK9 in cancer cell lines https://depmap.org/portal/gene/CDK9?tab=overview&dependency=RNAi_merged

      At any rate, we provide evidence that three different inhibitors of CDK9 (flavopiridol, dinaciclib, and AT7519) all inhibit our effect of interest (Fig 4B). The same results were observed using a previously validated CDK9-directed proteolysis targeting chimera (PROTAC2), and this was reversed by addition of excess pomalidomide (Fig 4C), which correlated with the presence/absence of CDK9 on western blot under the exact same conditions (Fig 4D).

      It is formally possible that all CDK9 inhibitors we tested are blocking BQ-mediated APP induction by some shared off-target mechanism (or perhaps by two or more different off-target mechanisms) AND this CDK9-independent target also happens to be degraded by PROTAC2. However, this would be an extraordinarily non-parsimonious explanation for our results, and so we contend that we have provided compelling evidence for the requirement of CDK9 for BQ-mediated APP induction.

      Finally, high concentrations of BQ have been reported to show off-target effects, sensitizing cancer cells to ferroptosis, and the authors should discuss whether the dose used in the in vivo study reached the ferroptotic sensitizing dose or not.

      We are intrigued by the results shown to us by Reviewer #1 in the linked preprint (Mishima et al 2022, https://doi.org/10.21203/rs.3.rs-2190326/v1). We have also observed in our unpublished data that very high concentrations of BQ (>150µM) cause loss of cell viability that is not rescued by uridine supplementation and that occurs even in DHODH knockout cells. This effect of high-dose BQ must be DHODH-independent. We also agree that Mishima et al provide compelling evidence that the ferroptosis-sensitizing effect of high-dose BQ treatment is due (at least in large part) to inhibition of FSP1.

      Although we showed that DHODH is strongly inhibited in tumor cells in vivo (Fig 5C), we did not directly measure the concentration of BQ in the tumor or plasma. Sykes et al (PMID: 27641501) found that the maximum plasma concentration (Cmax) for [BQ]free following a single IP administration in C57Bl6/J mice (15mg/kg) is approximately 3µM, while the Cmax for [BQ]total was around 215µM. Because polar drug molecules bound to serum proteins (predominantly albumin) are not available to bind other targets, [BQ]free is the relevant parameter.

      Given a Cmax for [BQ]free of 3µM and half-life of 12.0 hours, we estimate that the steady-state [BQ]free with daily IP injections at this dose is around 4µM. Since we used an administration schedule of 10mg/kg every 24 hours, we estimate that the steady-state plasma [BQ]free in our system was 2.67µM (assuming initial Cmax of 2µM and half-life of 12.0 hours).

      To derive an upper-bound estimate for the Cmax of [BQ]free over the 12-day treatment period (Fig 5A-D), we will use the observed data for 15mg/kg dose, and we will assume that 1) there is no clearance of BQ whatsoever and 2) that [BQ]free increases linearly with increasing [BQ]total. This yields a maximum free BQ concentration of 12 x 3 = 36µM.

      Therefore, we consider it very unlikely that plasma concentrations of free BQ in our experiment exceeded the lower limit of the ferroptosis-sensitizing dose range reported by Mishima et al. However, without direct pharmacokinetic analysis, we cannot say for sure what the maximal [BQ]free was under our experimental conditions.
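      The figures quoted in this estimate can be reproduced with a short sketch. It assumes a simple model that the response does not state explicitly: each dose instantly adds the single-dose Cmax, elimination is first-order with the given half-life, and doses superimpose. The function names are hypothetical, and treating the 12-day period as 12 daily doses is also an assumption.

```python
def steady_state_peak(cmax_single, half_life_h, interval_h):
    """Peak concentration after many repeated doses.

    Assumes first-order elimination and superposition, so the peaks form
    a geometric series: Cmax * (1 + f + f^2 + ...) = Cmax / (1 - f),
    where f is the fraction remaining after one dosing interval.
    """
    fraction_remaining = 0.5 ** (interval_h / half_life_h)
    return cmax_single / (1.0 - fraction_remaining)

def no_clearance_upper_bound(cmax_single, n_doses):
    """Worst case: nothing is cleared, so every dose adds a full Cmax."""
    return cmax_single * n_doses

# 15 mg/kg reference dose: free-BQ Cmax of about 3 uM, half-life 12 h, dosed every 24 h.
peak_15 = steady_state_peak(3.0, 12.0, 24.0)

# 10 mg/kg dose used in the study: assumed free-BQ Cmax of 2 uM.
peak_10 = steady_state_peak(2.0, 12.0, 24.0)

# Upper bound over 12 daily doses with no clearance, using the 15 mg/kg Cmax.
upper_bound = no_clearance_upper_bound(3.0, 12)

print(round(peak_15, 2), round(peak_10, 2), upper_bound)
```

      Under these assumptions, a 24-hour interval with a 12-hour half-life leaves one quarter of the drug at the next dose, so the steady-state peak is 4/3 of the single-dose Cmax.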

      Reviewer #2 (Public Review):

      In their manuscript entitled "DHODH inhibition enhances the efficacy of immune checkpoint blockade by increasing cancer cell antigen presentation", Mullen et al. describe an interesting mechanism of inducing antigen presentation. The manuscript includes a series of experiments that demonstrate that blockade of pyrimidine synthesis with DHODH inhibitors (i.e. brequinar (BQ)) stimulates the expression of genes involved in antigen presentation. The authors provide evidence that BQ mediated induction of MHC is independent of interferon signaling. A subsequent targeted chemical screen yielded evidence that CDK9 is the critical downstream mediator that induces RNA Pol II pause release on antigen presentation genes to increase expression. Finally, the authors demonstrate that BQ elicits strong anti-tumor activity in vivo in syngeneic models, and that combination of BQ with immune checkpoint blockade (ICB) results in significant lifespan extension in the B16-F10 melanoma model. Overall, the manuscript uncovers an interesting and unexpected mechanism that influences antigen presentation and provides an avenue for pharmacological manipulation of MHC genes, which is therapeutically relevant in many cancers. However, a few key experiments are needed to ensure that the proposed mechanism is indeed functional in vivo.

      The combination of DHODH inhibition with ICB reflects more of an additive response instead of a synergistic combination. Moreover, the temporal separation of BQ and ICB raises the question of whether the induction of antigen presentation with BQ is persistent during the course of delayed ICB treatment. To confidently conclude that induction of antigen presentation is a fundamental component of the in vivo response to DHODH inhibition, the authors should examine whether depletion of immune cells can reduce the therapeutic efficacy of BQ in vivo.

      We concur with this assessment.

      Moreover, they should examine whether BQ treatment induces antigen presentation in non-malignant cells and APCs to determine the cancer specificity.

      Although we showed that this occurs in HEK-293T cells, we appreciate that this cell line is not representative of human cells of any organ system in vivo. So, we agree it is important to determine if DHODH inhibition induces antigen presentation in human tissues and professional antigen presenting cells, and this is an excellent focus for future studies.

      However, it should also be noted that increased antigen presentation in non-malignant host tissues would not be expected to generate an autoimmune response, because host tissues likely lack strong neoantigens, and whatever immunogenic peptides they may have would likely be presented via MHC-I at baseline (i.e. even in the absence of DHODH inhibitor treatment), since all nucleated cells express MHC-I.

      This argument is strongly supported by clinical experience/data, as DHODH inhibitors (leflunomide and teriflunomide) are commonly used to treat rheumatoid arthritis and multiple sclerosis. While the pathophysiology of these autoimmune syndromes is complex, it is thought that both diseases are driven by aberrant T-cell attack on host tissues, mediated by incorrect recognition of host antigens presented via MHC-I (as well as MHC-II) as “foreign.”

      If increased antigen presentation in host tissues (downstream of DHODH inhibition) could lead to a de novo autoimmune response, then administration of DHODH inhibitors would be expected to exacerbate T-cell driven autoimmune disease rather than ameliorate it. Randomized controlled trials have consistently found that treatment with DHODH inhibitors leads to improvement of rheumatoid arthritis and multiple sclerosis symptoms, which is the opposite of what one would expect if DHODH inhibitors are causing de novo autoimmune reactions in human patients.

      Finally, although the authors show that DHODH inhibition induces expression of both MHC-I and MHC-II genes at the RNA level, only MHC-I is validated by flow cytometry. Given the importance of MHC-II expression on epithelial cancers, including melanoma, MHC-II should be validated as well.

      We fully agree with this statement. We attempted to quantify cell surface MHC-II expression by FACS using the same method as for MHC-I (Figs 1G-H, 2D, and 3F). We did not detect cell surface MHC-II in any of our cancer cell lines, despite the use of high-dose interferon gamma and other stimulants (which robustly increase MHC-II mRNA in our system) in an attempt to induce expression. However, because we did not use cells known to express MHC-II as a positive control (e.g. B-cell leukemia cell lines or primary splenocytes), we do not know if our results are due to some technical failure (perhaps related to our protocol/reagents) or if they reflect a true absence of cell surface MHC-II in our cell lines.

      If the latter is true, that implies that either 1) MHC-II mRNA is not translated or 2) that it is translated, but our cancer cell lines lack one or more elements of the machinery required for MHC-II antigen presentation.

      In any case, it is important to determine if DHODH inhibition increases MHC-II at the cell surface of cancer cells using appropriate positive and negative controls, as this could have important implications for cancer immunotherapy.

      [As a minor point, melanoma is not an epithelial cancer, as it is derived from neural crest lineage cells (melanocytes)]

      Overall, the paper is clearly written and presented. With the additional experiments described above, especially in vivo, this manuscript would provide a strong contribution to the field of antigen presentation in cancer. The distinct mechanisms by which DHODH inhibition induces antigen presentation will also set the stage for future exploration into alternative methods of antigen induction.

      Reviewer #3 (Public Review):

      Mullen et al present an important study describing how DHODH inhibition enhances efficacy of immune checkpoint blockade by increasing cell surface expression of MHC I in cancer cells. DHODH inhibitors have been used in the clinic for many years to treat patients with rheumatoid arthritis and there has been a growing interest in repurposing these inhibitors as anti-cancer drugs. In this manuscript, the Singh group build on their previous work defining combinatorial strategies with DHODH inhibitors to improve efficacy. The authors identify an increase in expression of genes involved in the antigen presentation pathway and MHC I after BQ treatment and they narrow the mechanism to be strictly pyrimidine and CDK9/P-TEFb dependent. The authors rationalize that increased MHC I expression induced by DHODH inhibition might favor efficacy of dual immune checkpoint blockade. This combinatorial treatment prolonged survival in an immunocompetent B16F10 melanoma model.

      [No comment from authors]

      Previous studies have shown that DHODH inhibitors can increase expression of innate immunity-related genes but the role of DHODH and pyrimidine nucleotides in antigen presentation has not been previously reported. A strength of the manuscript is the use of multiple controls across a panel of cell lines to exclude off-target effects and to confirm that effects are exclusively dependent on pyrimidine depletion. Overall, the authors do a thorough characterization of the mechanism that mediates MHC I upregulation using multiple strategies. Furthermore, the in vivo studies provide solid evidence for combining DHODH inhibitors with immune checkpoint blockade.

      No comment from authors

      However, despite the use of multiple cell lines, most experiments are only performed in one cell line, and it is hard to understand why particular gene sets, cell lines or time points are selected for each experiment. It would be beneficial to standardize experimental conditions and confirm the most relevant findings in multiple cell lines.

      We appreciate this comment, and we understand how the use of various cell lines may seem puzzling. We would like to explain how our cell line panel evolved over the course of the study. Our first indication that BQ caused APP upregulation came from transcriptomics experiments (Figs 1A-D, S1A) performed as part of a previous study investigating BQ resistance (Mullen et al, 2023 Cancer Letters). In that study, we used CFPAC-1 as a model for BQ sensitivity and S2-013 as a model for BQ resistance. We did RNA sequencing +/- BQ in these cell lines to look for gene expression patterns that might underlie resistance/sensitivity to BQ. When analyzing this data, we serendipitously discovered the APP/MHC phenomenon, which gave rise to the present study.

      Our next step was to extend these findings to cancer cell lines of other histologies, and we prioritized cell lines derived from common cancer types for which immunotherapy (specifically ICB) is clinically approved. This is why A549 (lung adenocarcinoma), HCT116 (colorectal adenocarcinoma), A375 (cutaneous melanoma), and MDA-MB-231 (triple-negative breast cancer) cell lines were introduced.

      Because PDAC is considered to have an especially “immune-cold” tumor microenvironment, we reasoned that even dramatically increasing cancer cell antigen presentation may be insufficient to elicit an effective anti-tumor immune response in vivo. So we shifted our focus towards melanoma, because a subset of melanoma patients is very responsive to ICB and loss of antigen presentation (by direct silencing or homozygous loss-of-function mutations in MHC-I components such as B2M, or by functional loss of IFN-JAK1/2-STAT signaling) has been shown to mediate ICB resistance in human melanoma patients. This is why we extended our findings to B16F10 murine melanoma cells, intending to use them for in vivo studies with syngeneic immunocompetent recipient mice.

      The PDAC cell line MiaPaCa2 was introduced because a collaborator at our institution (Amar Natarajan) happened to have IKK2 knockout MiaPaCa2 cells, which allowed us to genetically validate our inhibitor results showing that IKK1 and IKK2 (crucial effectors for NF-kB signaling) are dispensable for our effect of interest.

      Ultimately, realizing that our results spanned various human and murine cell lines, we chose to use HEK-293T cells to validate the general applicability of our findings to proliferating cells in 2D culture, since HEK-293T cells (compared to our cancer cell lines) have relatively few genetic idiosyncrasies and express MHC-I at baseline.

      The differential in vivo survival depending on dosing schedule is interesting. However, this section could be strengthened with a more thorough evaluation of the tumors at endpoint.

      Overall, this is an interesting manuscript proposing a mechanistic link between pyrimidine depletion and MHC I expression and a novel therapeutic strategy combining DHODH inhibitors with dual checkpoint blockade. These results might be relevant for the clinical development of DHODH inhibitors in the treatment of solid tumors, a setting where these inhibitors have not shown optimal efficacy yet.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The main issue is that the study did not directly examine whether the increased antigen presentation by DHODH inhibition contributed to the potentiation of the efficacy of immune checkpoint blockade (ICB). The additional effect of BQ in the xenograft tumor study was not examined to determine if it was due to increased antigen presentation toward the cancer cells or merely to a cell cycle arrest effect of pyrimidine depletion in the tumor cells. The different administration timing of ICB with BQ treatment (Fig 5E) would not be sufficient to resolve this issue.

      We agree with this assessment, and we believe the experiment proposed by Reviewer #2 below (comparing the efficacy of BQ in Rag-null versus immunocompetent recipients) would address this question directly. We also think that using a more immunogenic cell line for this experiment (such as B16F10 transduced with ovalbumin or some other strong neoantigen) would be useful given the poor immunogenicity and lack of any defined strong neoantigen in B16F10 cells. An orthogonal approach would be to engraft cancer cells with or without B2M knockout into immunocompetent recipient mice (+/- BQ treatment) to further implicate MHC-I and antigen presentation. These questions will be addressed in future studies.

      (2) Additionally, in the in vivo study, the increase in surface MHC-I at the protein level induced by BQ treatment was not examined in the tumor samples, and it was not confirmed whether increased antigen presentation by BQ treatment actually promoted an anti-cancer immune response in immune cells. To support the story presented in the study, these data would be necessary.

      We attempted to show this by immunohistochemistry, but unfortunately the anti-H2-Db antibody that we obtained for this purpose did not have satisfactory performance to assess this in our tissue samples harvested at necropsy.

      (3) The mechanism of the increased antigen presentation pathway by pyrimidine depletion mediated by CDK9/PTEFb was not validated by genetic KD or KO targeting by CDK9/PTEFb pathways. In general, results only by the inhibitor assay have a limitation of off-target effects.

      Please see our reply above to Reviewer #1's comment making this same point, where we spell out our rationale for not pursuing these experiments.

      (4) High concentrations of BQ (> 50 uM) have been reported to show off-target effects, sensitizing cancer cells to ferroptosis, an iron-mediated lipid peroxidation-dependent cell death, independent of DHODH inhibition (https://www.researchsquare.com/article/rs-2190326/v1). It would be needed to discuss whether the dose used in the in vivo study reached the ferroptotic sensitizing dose or not.

      Please see our reply above to Reviewer #1's comment making this same point, where we explain why we are very confident that the BQ dose administered in our animal experiments was far below the minimum reported BQ dose required to sensitize cancer cells to ferroptosis in vitro.

      Reviewer #2 (Recommendations For The Authors):

      Major Points

      (1) According to the proposed model, BQ mediated induction of antigen presentation is a contributing factor to the efficacy of this therapeutic strategy. If this is true, then depletion of immune cells should reduce the therapeutic efficacy of BQ in vivo. The authors should perform the B16-F10 transplant experiments in either Rag null mice (if available) or with CD8/CD4 depletion. The expectation would be that T cell depletion (or MHC loss with genetic manipulation) should reduce the efficacy of BQ treatment. Absent this critical experiment, it is difficult to confidently conclude that induction of antigen presentation is a fundamental component of the in vivo response to DHODH inhibition.

      We agree with this assessment and the proposed experiment comparing the response in Rag-null versus immunocompetent recipients. We also think that using a more immunogenic cell line for this experiment (such as B16F10 transduced with ovalbumin or some other strong neoantigen) would be useful given the poor immunogenicity and lack of any defined strong neoantigen in B16F10 cells. An orthogonal approach would be to engraft cancer cells with or without B2M knockout into immunocompetent recipient mice (+/- BQ treatment) to further implicate MHC-I and antigen presentation. These questions will be addressed in future studies.

      (2) Does BQ treatment induce antigen presentation in non-malignant cells? APCs? If the induction of antigen presentation is not cancer specific and related to a pyrimidine depletion stress response, then there is a possibility that healthy tissues will also exhibit a similar phenotype, raising concerns about the specificity of a de novo immune response. The authors should examine antigen presentation genes in healthy tissues treated with BQ.

      We agree it is important to examine if our findings regarding nucleotide depletion and antigen presentation are true of APCs and other non-transformed cells, but we are not so concerned about the possibility of raising an immune response against non-malignant host tissues, as explained above. We have reproduced the relevant section below:

      “However, it should also be noted that increased antigen presentation in non-malignant host tissues would not be expected to generate an autoimmune response, because host tissues likely lack strong neoantigens, and whatever immunogenic peptides they may have would likely be presented via MHC-I at baseline, since all nucleated cells express MHC-I.

      This argument is strongly supported by clinical experience/data, as DHODH inhibitors (leflunomide and teriflunomide) are commonly used to treat rheumatoid arthritis and multiple sclerosis. While the pathophysiology of these autoimmune syndromes is complex, it is thought that both diseases are driven by aberrant T-cell attack on host tissues, mediated by incorrect recognition of host antigens presented via MHC-I (as well as MHC-II) as “foreign.”

      If increased antigen presentation in host tissues (downstream of DHODH inhibition) could lead to a de novo autoimmune response, then administration of DHODH inhibitors would be expected to exacerbate T-cell driven autoimmune disease rather than ameliorate it. Randomized controlled trials have consistently found that treatment with DHODH inhibitors leads to improvement of rheumatoid arthritis and multiple sclerosis symptoms, which is the opposite of what one would expect if DHODH inhibitors are causing de novo autoimmune reactions in human patients.”

      (3) In the title, the authors claim that DHODH enhances the efficacy of ICB. However, the experiment shown in Figure 5D does not demonstrate this. The Kaplan Meier curves reflect more of an additive response versus a synergistic combination. Furthermore, the concurrent treatment of BQ and ICB seems to inhibit the efficacy of ICB due to BQ toxicity in immune cells. This result seems to contradict the title.

      We do not agree with this assessment. Given that the effect of dual ICB alone was very marginal, while the effect of BQ monotherapy was quite marked, we cannot conclude from Fig 5 that BQ treatment inhibited ICB efficacy due to immune suppression.

      (4) Related to Point 3, the temporal separation of BQ and ICB raises the question of whether the induction of antigen presentation with BQ is persistent during the course of delayed ICB treatment. One explanation for the results is that BQ treatment reduces tumor burden, and then a subsequent course of ICB also reduces tumor burden but not that the two therapies are functioning in synergy. To address this, the authors should measure the duration of BQ mediated induction of antigen presentation after stopping treatment.

      We agree that the alternative explanation proposed by Reviewer #2 is possible and we appreciate the suggestion to test the stability of APP induction after stopping BQ treatment.

      (5) In Figure 1, the authors show that DHODH inhibition induces expression of both MHC-I and MHC-II genes at the RNA level. However, they only validate MHC-I by flow cytometry. A simple experiment to evaluate the effect of BQ treatment on MHC-II surface expression would provide important additional mechanistic insight into the immunomodulatory effects of DHODH inhibition, especially given recent literature reinforcing the importance of MHC-II expression on epithelial cancers, including melanoma (Oliveira et al. Nature 2022).

      We fully agree with this statement. We attempted to quantify cell surface MHC-II expression by FACS using the same method as for MHC-I (Figs 1G-H, 2D, and 3F). We did not detect cell surface MHC-II in any of our cancer cell lines, despite the use of high-dose interferon gamma and other stimulants (which robustly increase MHC-II mRNA in our system) in an attempt to induce expression. However, because we did not use cells known to express MHC-II as a positive control (e.g. B-cell leukemia cell lines or primary splenocytes), we do not know if our results are due to some technical failure (perhaps related to our protocol/reagents) or if they reflect a true absence of cell surface MHC-II in our cell lines.

      If the latter is true, that implies that either 1) MHC-II mRNA is not translated or 2) that it is translated, but our cancer cell lines lack one or more elements of the machinery required for MHC-II antigen presentation.

      In any case, it is important to determine if DHODH inhibition increases MHC-II at the cell surface of cancer cells using appropriate positive and negative controls, as this could have important implications for cancer immunotherapy.

      [As a minor point, melanoma is not an epithelial cancer, as it is derived from neural crest lineage cells (melanocytes)]

      Minor Points

      (1) The authors show ChIP-seq tracks from Tan et al. for HLA-B. However, given the pervasive effect of Ter treatment across many HLA genes, the authors should either show tracks at additional loci, or provide a heatmap of read density across more loci. This would substantiate the mechanistic claim that RNA Pol II occupancy and activity across antigen presentation genes is the major driver of response to DHODH inhibition as opposed to mRNA stabilization/increased translation.

      We appreciate this suggestion. We have changed Fig 4 by replacing the HLA-B track (old Fig 4E) with a representation of fold change (Ter/DMSO) in Pol II occupancy versus fold change (Ter/DMSO) in mRNA abundance for 23 relevant genes (new Fig 4G); both of these datasets were obtained from the Tan et al manuscript. This new figure panel (Fig 4G) also shows linear regression analysis demonstrating that Pol II occupancy and mRNA expression are significantly correlated for APP genes. While we recognize that this data in itself is not formal proof of our hypothesis, it does strongly support the notion that increased transcription is responsible for the increased mRNA abundance of APP genes that we have observed.
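
      The type of analysis summarized in a panel such as Fig 4G can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the figure: the function name `fit_line` and all fold-change values are hypothetical placeholders, and no values from Tan et al. are reproduced. The sketch regresses log2 mRNA fold change on log2 Pol II occupancy fold change and reports the slope, intercept, and Pearson correlation.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, pearson_r).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical fold changes (treated / control), one entry per gene.
pol2_fc = [1.2, 1.5, 2.0, 2.6, 3.1]   # Pol II occupancy
mrna_fc = [1.3, 1.9, 2.4, 3.5, 4.2]   # mRNA abundance

# Fold changes are compared on a log2 scale so that up- and
# down-regulation are symmetric around zero.
log_pol2 = [math.log2(v) for v in pol2_fc]
log_mrna = [math.log2(v) for v in mrna_fc]

slope, intercept, r = fit_line(log_pol2, log_mrna)
print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.2f}")
```

      A positive slope with a high correlation is what one would expect if changes in transcription account for most of the change in mRNA abundance, although a correlation of this kind does not by itself exclude contributions from mRNA stability.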

      (2) A compelling way to demonstrate a change in antigen presentation is through mass spectrometry based immunopeptidomics. Performing immunopeptidomic analysis of BQ treated cell lines would provide substantial mechanistic insight into the outcome of BQ treatment. While this approach may be outside the scope of the current work, the authors should speculate on how this treatment may specifically alter the antigenic landscape where future directions would include empirical immunopeptidomics measurements.

      We fully agree with this comment. While the abundance of cancer cell surface MHC-I is an important factor for anticancer immunity, another crucial factor is the identity of peptides that are presented. Treatments that cause presentation of more immunogenic peptides can enhance T-cell recognition even in the absence of a relative change in cell surface MHC-I abundance.

      While we did not perform the immunopeptidomics experiments described, we can offer some speculation regarding this comment. As shown in Fig 1D-E, transcriptomics experiments suggest that immunoproteasome subunits (PSMB8, PSMB9, PSMB10) are upregulated upon DHODH inhibition. If this change in mRNA levels translates into greater immunoproteasome activity (which was not tested in our study), this would be expected to alter the repertoire of peptides available for presentation and could thereby change the immunopeptidome.

      However, this hypothesis requires direct testing, and we hope future studies will delineate the effects of DHODH inhibition and other cancer therapies on the immunopeptidome, as this area of research will have important clinical implications.

      (3) While the signaling through CDK9 seems convincing, it still does not provide a mechanistic link between depleted pyrimidines and CDK9 activity. The authors should speculate on the mechanism that signals to CDK9.

      We agree with the assessment. A mechanistic link between depleted pyrimidines and CDK9 activity will be a subject of future studies.

      (4) Related to minor point 2, the authors should consider a genetic approach to confirm the importance of CDK9. While the pharmacological approach, including multiple mechanistically distinct CDK9 inhibitors provides strong evidence, an additional experiment with genetic depletion of CDK9 (CRISPR KO, shRNA, etc) would provide compelling mechanistic confirmation.

      Reviewer #1 raised this very same point, and we agree. Please see our reply to Reviewer #1, which details why we did not pursue this approach and argues that the evidence we present is compelling even in absence of genetic manipulation.

      Additionally, please see the new Fig 4E and 4F, which is a repeat of Fig 4B using HCT116 cells. Figure 4E shows that, in this cell line, CDK9 inhibitors (flavopiridol, dinaciclib, and AT7519) block BQ-mediated APP induction, while PROTAC2 does not. Figure 4F shows that (for reasons we cannot fully explain) PROTAC2 does not lead to CDK9 degradation in HCT116 cells. This data strongly implicates CDK9, because it excludes a CDK9-degradation-independent effect of PROTAC2.

      (5) Figure 2B needs a legend.

      Thank you for pointing this out. We have added a legend to Fig 2B.

      (6) The authors should comment in the discussion on how this strategy may be particularly useful in patients harboring genetic or epigenetic loss of interferon signaling, a known mechanism of ICB resistance. Perhaps DHODH inhibition could rescue MHC expression in cells that are deficient in interferon sensing.

      Thank you for this suggestion! We have amended the Discussion section to mention this important point. Please see paragraph 2 of the revised Discussion section where we have added the following text:

      “Because BQ-mediated APP induction does not require interferon signaling, this strategy may have particular relevance for clinical scenarios in which tumor antigen presentation is dampened by the loss or silencing of cancer cell interferon signaling, which has been demonstrated to confer both intrinsic and acquired ICB resistance in human melanoma patients.”

      Reviewer #3 (Recommendations For The Authors):

      The authors present convincing evidence of the mechanism by which pyrimidine nucleotides regulate MHC I levels and about the potential of combining DHODH inhibitors with dual immune checkpoint blockade (ICB). This is an interesting paper given the clinical relevance of DHODH inhibitors. The studies raise some questions, and some points might need clarifying as below:

      • In Figure 2C, why do the authors focus on these two genes in the uridine rescue? These are important genes mediating antigen presentation, but it might be more interesting to see how H2-Db and H2-Kb expression correlate with the protein data shown in Fig 2D. Fig. 2C-2D is a relevant control, so it would be important to validate in a different cancer cell line (e.g. one of the PDAC cell lines used for the RNAseq).

      We appreciate this comment. Although Fig 3C shows that BQ-induced expression of H2-Db, H2-Kb, and B2m is reversed by uridine (in B16F10 cells), we recognize that this was not the best placement for this data, as it can easily be overlooked here since uridine reversal is not the main point of Fig 3C. We have left Fig 3C as is, because we think that the uridine reversal demonstrated in that panel serves as a good internal positive control for reversal of BQ-mediated APP induction in that experiment.

      We have repeated the experiments shown in the original Fig 2C and substituted the original Fig 2C with a new Fig 2C and Fig S2B, which show both Tap1 and Nlrc5 as well as H2-Db, H2-Kb, and B2m after treatment with either BQ (new Fig 2C) or teriflunomide (new Fig S2B). The original Fig S2B is now Fig S2C, and it shows that uridine has no effect on the expression of any of the genes assayed in the new Fig 2C or S2B.

      The reversibility of cell surface MHC-I induction was also validated in HCT116 cells (Fig 3F). We included the uridine reversal in Fig 3F to avoid duplicating the control and BQ FACS data in multiple panels.

      We have also added the qPCR data for HCT116 cells showing this same phenotype (at the mRNA level), which is the new Fig S2D.

      We decided to prioritize HCT116 cells for our mechanistic studies (Figures S2D, S4A, and 4E-F) because previous reports indicate that this line is diploid and therefore less genetically deranged than our other cancer cell lines.

      • Figure 2F shows an elegant experiment to discard off-target effects related to cell death and to confirm that the increased MHC I expression is uniquely dependent on pyrimidines. DHODH has recently been involved in ferroptosis, a highly immunogenic type of cell death. What are the authors' thoughts on BQ-induced ferroptosis as a possible contributor to the effects of ICB? Does BQ + ferroptosis inhibitor (ferrostatin) affect cell surface MHC I and/or expression of antigen processing genes?

      The potential role of DHODH in ferroptosis protection (Mao et al 2021) has important implications, so we are glad that multiple reviewers raised questions concerning ferroptosis. We did not directly test the effect of ferroptosis inducing agents (with or without BQ) on MHC-I/APP expression, but that is certainly a worthwhile line of investigation.

      The DHODH/ferroptosis issue is complicated by a study pointed out by Reviewer #1 that challenges the role of DHODH inhibition in BQ-mediated ferroptosis sensitization (Mishima et al, 2022). This study argues that high-dose BQ treatment causes FSP1 inhibition, and this underlies the effect of BQ on the cellular response to ferroptosis-inducing agents.

      Regardless of whether BQ-induced ferroptosis-sensitization is dependent on DHODH, FSP1, or some other factor, the Mao and Mishima studies agree that a relatively high dose of BQ is required to observe these effects (100-200µM for most cell lines and >50µM even in the most ferroptosis-sensitive cell lines). As we explained above, we consider it very unlikely that the in vivo BQ exposure in our experiments (Fig 5) was high enough to cause significant ferroptosis, especially in the absence of any dedicated ferroptosis-inducing agent (which is typically required to cause ferroptosis even in the presence of high-dose BQ).

      • The authors nail down the mechanism to CDK9 (Fig 4). However, all these experiments are performed in 293T cells. I would like to see a repeat of Fig. 4B in a cancer cell line (either PDAC or B16). Also, does BQ have any effect on CDK9 expression/protein levels?

      We have added two figure panels that address this comment (new Fig 4E and 4F). Figure 4E (which is a repeat of Fig 4B with HCT116 cells) shows that CDK9 inhibitors (flavopiridol, AT7519, and dinaciclib) reverse BQ-mediated APP induction in HCT116 cells (this agrees with Fig S4A showing that flavopiridol reverses MHC induction by various nucleotide synthesis inhibitors in this cell line), but PROTAC2 does not. Figure 4F shows that PROTAC2 (for reasons we cannot explain) does not cause CDK9 degradation in HCT116 cells. This adds further support to our thesis that CDK9 is a critical mediator of BQ-mediated APP induction (because how else can this pattern of results be explained?). The text of the Results section has been amended to reflect this.

      We chose to use HCT116 cells for this repeat experiment 1) to align with Fig S4A and 2) because, as previously mentioned, we consider HCT116 to be a good cell line for mechanistic studies because of its relative lack of idiosyncratic genetic features (compared to CFPAC-1, for example, which was derived from a patient with cystic fibrosis).

      • What are the differences in tumor size for the experiment shown in Figure 5E? What about tumor cell death in the ICB vs. BQ+ICB groups?

      Because this was a survival assay, direct comparison of tumor volumes between groups was not possible at later time points, since mice that die or have to be euthanized are removed from their experimental group, which lowers the average group tumor burden at subsequent time points. Although tumor volume was the most common euthanasia criterion reached, a subset of mice were either found dead or had to be euthanized for other reasons attributed to their tumor burden (moribund state, inability to ambulate or stand, persistent bleeding from tumor ulceration, severe loss of body mass, etc.). This confounds any comparison of endpoint measurements (such as immunohistochemical quantification of tumor cell death markers, T-cell markers, etc.).
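
      The attrition effect described above can be illustrated with a small numerical sketch. All numbers here are hypothetical and chosen only for illustration (they are not measurements from our study), and the function name and threshold are assumptions. The sketch compares the mean tumor volume of all mice in a group with the mean over only those mice still on study, i.e. below a euthanasia threshold.

```python
def mean_volume_of_survivors(volumes, euthanasia_threshold):
    """Mean tumor volume over mice still on study (volume below threshold).

    Returns None if no mouse remains in the group.
    """
    remaining = [v for v in volumes if v < euthanasia_threshold]
    if not remaining:
        return None
    return sum(remaining) / len(remaining)

# Hypothetical euthanasia criterion and tumor volumes (mm^3) on one day.
THRESHOLD_MM3 = 2000
control = [2400, 2100, 1900, 1500, 900]
treated = [1800, 1600, 1400, 1200, 800]

# Mean over every mouse, as if none had been removed from the study.
full_control = sum(control) / len(control)
full_treated = sum(treated) / len(treated)

# Mean over the mice that would still be measured at this time point.
obs_control = mean_volume_of_survivors(control, THRESHOLD_MM3)
obs_treated = mean_volume_of_survivors(treated, THRESHOLD_MM3)

print("difference using all mice:      ", full_control - full_treated)
print("difference using survivors only:", obs_control - obs_treated)
```

      In this toy example the control group loses its two largest tumors to euthanasia, so the apparent difference between groups shrinks even though the underlying tumor burden has not changed. This is why a prespecified necropsy time point, before any mouse reaches euthanasia criteria, is needed for endpoint comparisons.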

      • The different response in the concurrent vs delayed treatment is very interesting. The authors suggest two possible mechanisms to explain this: "1) Concurrent BQ dampens the initial anticancer immune response generated by dual ICB, or b) cancer cell MHC-I and related genes are not maximally upregulated at the time of ICB administration with concurrent treatment". However, and despite the caveat of comparing the in vitro to the in vivo setting, Fig 2D shows upregulation of MHC I already at 24h of treatment in B16 cells. Have the authors checked T cell infiltration in the concurrent and delayed treatment setting?

      For the same reasons described in response to the preceding comment, tumors harvested upon mouse death/euthanasia from our survival experiment were not suitable for cross-cohort comparison of tumor endpoint measurements. An additional experiment in which mice are necropsied at a prespecified time point (before any mice have died or reached euthanasia criteria, as in the experiment for Fig 5A-D) would be required to answer this question.

      • Page 5, line 181 -do the authors mean "nucleotide salvage inhibitors" instead of "synthesis"?

      We believe the reviewer is referring to the following sentence:

      “The other drugs screened included nucleotide synthesis inhibitors (5-fluorouracil, methotrexate, gemcitabine, and hydroxyurea), DNA damage inducers (oxaliplatin, irinotecan, and cytarabine), a microtubule targeting drug (paclitaxel), a DNA methylation inhibitor (azacytidine), and other small molecule inhibitors (Fig 2F).”

      In this context, we believe our use of “synthesis” instead of “salvage” is correct, because methotrexate and 5-FU inhibit thymidylate synthase (which mediates de novo dTTP synthesis), while gemcitabine and hydroxyurea inhibit ribonucleotide reductase (which mediates de novo synthesis of all dNTPs).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study, utilizing CITE-Seq to explore CML, is considered a useful contribution to our understanding of treatment response. However, the reviewers express concern about the incomplete evidence due to the small sample size and recommend addressing these limitations. Strengthening the study with additional patient samples and validation measures would enhance its significance.

      We thank the editors for the assessment of our manuscript. In view of the comments of the three reviewers, we have increased the number of CML patient samples analyzed to confirm all the major findings included in the manuscript. In total, more than 80 patient samples across different approaches have now been analyzed and incorporated in the revised manuscript.

      To the best of our knowledge, this is the first single-cell multiomics report in CML, and it differs substantially from recent single-cell omics-based reports in which single modalities were measured one at a time (Krishnan et al., 2023; Patel et al., 2022). Thus, the sc-multiomic investigation of LSCs and HSCs from the same patient addresses a major gap in the field towards managing the efficacy and toxicity of TKI treatment by enumerating the burden of CD26+CD35- LSCs and CD26-CD35+ HSCs and their ratio at diagnosis vs. 3 months of therapy. The findings suggest the design of a simpler and cheaper FACS assay to simultaneously stratify CML patients for TKI efficacy as well as hematologic toxicity.

      Reviewer 1:

      Summary:

      This manuscript by Warfvinge et al. reports the results of CITE-seq to generate single-cell multi-omics maps from BM CD34+ and CD34+CD38- cells from nine CML patients at diagnosis. Patients were retrospectively stratified by molecular response after 12 months of TKI therapy using European Leukemia Net (ELN) recommendations. They demonstrate heterogeneity of stem and progenitor cell composition at diagnosis, and show that compared to optimal responders, patients with treatment failure after 12 months of therapy demonstrate increased frequency of molecularly defined primitive cells at diagnosis. These results were validated by deconvolution of an independent previously published dataset of bulk transcriptomes from 59 CML patients. They further applied a BCR-ABL-associated gene signature to classify primitive Lin-CD34+CD38- stem cells as BCR:ABL+ and BCR:ABL-. They identified variability in the ratio of leukemic to non-leukemic primitive cells between patients, showed differences in the expression of cell surface markers, and determined that a combination of CD26 and CD35 cell surface markers could be used to prospectively isolate the two populations. The relative proportion of CD26-CD35+ (BCR:ABL-) primitive stem cells was higher in optimal responders compared to treatment failures, both at diagnosis and following 3 months of TKI therapy.

      Strengths:

      The studies are carefully conducted and the results are very clearly presented. The data generated will be a valuable resource for further studies. The strengths of this study are the application of single-cell multi-omics using CITE-Seq to study individual variations in stem and progenitor clusters at diagnosis that are associated with good versus poor outcomes in response to TKI treatment. These results were confirmed by deconvolution of a historical bulk RNAseq data set. Moreover, they are also consistent with a recent report from Krishnan et al. and are a useful confirmation of those results. The major new contribution of this study is the use of gene expression profiles to distinguish BCR-ABL+ and BCR-ABL- populations within CML primitive stem cell clusters and then applying antibody-derived tag (ADT) data to define molecularly identified BCR:ABL+ and BCR-ABL- primitive cells by expression of surface markers. This approach allowed them to show an association between the ratio of BCR-ABL+ vs BCR-ABL- primitive cells and TKI response and study dynamic changes in these populations following short-term TKI treatment.

      Weaknesses:

      One of the limitations of the study is the small number of samples employed, which is insufficient to make associations with outcomes with confidence. Although the authors discuss the potential heterogeneity of primitive stem cells, they do not directly address the heterogeneity of hematopoietic potential or response to TKI treatment in the results presented. Another limitation is that the BCR-ABL + versus BCR-ABL- status of cells was not confirmed by direct sequencing for BCR-ABL. The BCR-ABL status of cells sorted based on CD26 and CD35 was evaluated in only two samples. We also note that the surface markers identified were previously reported by the same authors using different single-cell approaches, which limits the novelty of the findings. It will be important to determine whether the GEP and surface markers identified here are able to distinguish BCR-ABL+ and BCR-ABL- primitive stem cells later in the course of TKI treatment. Finally, although the authors do describe differential gene expression between CML and normal, BCR:ABL+ and BCR:ABL-, primitive stem cells they have not as yet taken the opportunity to use these findings to address questions regarding biological mechanisms related to CML LSC that impact on TKI response and outcomes.

      Reviewer #1 (Recommendations For The Authors):

      We thank the reviewer for positive assessment of our work. Here, we highlight the updates in the revised manuscript considering the feedback received.

      Minor comment: Fig 4 legend -E and F should be C and D.

      We have edited the revised manuscript accordingly.

      One of the limitations of the study is the small number of samples employed, which is insufficient to make associations with outcomes with confidence.

      Although we performed CITE-seq for 9 CML patient samples at diagnosis, we extended our investigations to include additional samples (e.g., large-scale deconvolution analysis of samples, Fig. 3C-E; qPCR for BCR::ABL1 status, Fig. 6A; and the ratio between CD35+ and CD26+ populations at diagnosis and during TKI therapy, Fig. 6C-D), as described in the manuscript.

      Compared with scRNA-seq alone, multiomic CITE-seq requires the preparation and sequencing of separate libraries for RNA and ADTs, which makes it even more resource-demanding and limits our capacity to process a large number of patient samples. To confirm our findings in a larger cohort, we therefore adopted a computational deconvolution approach, CIBERSORT, to analyze a larger number of independent samples (n=59). This reflects a growing, sustainable trend of studying larger numbers of patients in the face of still prohibitively expensive but potentially insightful sc-omics approaches (for example, see Zeng et al., A cellular hierarchy framework for understanding heterogeneity and predicting drug response in acute myeloid leukemia, Nature Medicine, 2022).
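
      The idea behind deconvolution can be sketched as follows. This is a simplified illustration and not the CIBERSORT algorithm: CIBERSORT, to our understanding, is based on support vector regression, whereas the sketch below swaps in plain non-negative least squares solved by projected gradient descent. The function name `deconvolve`, the signature matrix, and the bulk profile are all hypothetical. The assumed model is that a bulk expression profile is a weighted sum of per-cluster signature profiles, with the weights being the cluster fractions.

```python
def deconvolve(signature, bulk, n_iter=500):
    """Estimate non-negative cell-type fractions f with signature @ f ~ bulk.

    signature: list of rows (one per gene), each listing the expression
               of that gene in every cell type.
    bulk:      list of bulk expression values, one per gene.
    Minimizes the squared error by projected gradient descent, then
    rescales the fractions to sum to 1.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # 1 / (squared Frobenius norm) is a conservative step size that
    # guarantees the iteration does not diverge.
    step = 1.0 / sum(v * v for row in signature for v in row)
    f = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(row[k] * f[k] for k in range(n_types)) - bulk[g]
            for g, row in enumerate(signature)
        ]
        grad = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto f >= 0.
        f = [max(0.0, f[k] - step * grad[k]) for k in range(n_types)]
    total = sum(f)
    return [v / total for v in f] if total > 0 else f

# Hypothetical signature: 4 marker genes x 2 clusters
# (columns: "primitive", "myeloid").
signature = [
    [10.0, 1.0],
    [8.0, 2.0],
    [1.0, 9.0],
    [2.0, 10.0],
]
# Hypothetical bulk profile built as a 70% / 30% mixture of the two columns.
bulk = [7.3, 6.2, 3.4, 4.4]

fractions = deconvolve(signature, bulk)
print([round(v, 3) for v in fractions])
```

      In practice the signature matrix would be derived from the single-cell clusters, and the estimated fractions for each bulk sample could then be compared between response groups.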

      However, in view of the comment, we have now substantially increased the number of patients analyzed in the revised manuscript. This includes more patient samples to investigate the ratio between CD35 and CD26 marked populations at diagnosis and at 3 months of TKI therapy (from n=8 to n=12 with now 6 optimal responders and 5 treatment failures at diagnosis and after TKI therapy), qPCR for BCR::ABL1 expression status at diagnosis (from n=3 to n=9), and follow-up of BCR::ABL1 expression in three additional samples after TKI therapy. Moreover, we examined the CD26 and CD35 marked populations for expression of GAS2, one of our top candidate LSC signature genes, in three additional samples at diagnosis and at 3-month follow-up. Thus, >80 patient samples across different approaches have been analyzed to strengthen all major conclusions of the study.

      We emphasize that we were cautious in generalizing the observation obtained from any one approach and sought to confirm any major finding using at least one complementary method. As an example, although CITE-seq (n=9) showed altered frequency of all cell clusters between optimal and poor responders (Fig. 3B), we refrained from generalizing because our independent large-scale computational deconvolution analysis (n=59) only substantiated the altered proportion of primitive and myeloid cell clusters (Fig. 3E).

      Although the authors discuss the potential heterogeneity of primitive stem cells, they do not directly address the heterogeneity of hematopoietic potential or response to TKI treatment in the results presented.

      Thank you for noting the discussion on heterogeneity of the primitive stem cells. As described in the original manuscript, Figure 6D-E showed a relationship between heterogeneity and TKI therapy response: the CD35+/CD26+ ratio within the HSC fraction was associated with therapy response. We have now increased the number of patient samples analyzed and present the updated results in the revised manuscript (now Figure 6C-D). These observations set the stage for assessing whether long-term therapy outcome can also be influenced by heterogeneity at diagnosis.

      We have shown the hematopoietic potential of HSCs marked by CD35 expression in an independent parallel study and therefore mention it only briefly in the current manuscript. A combination of scRNA-seq, scATAC-seq, and cell surface proteomics showed CD35+ cells at the apex of healthy human hematopoiesis, containing an HSC-specific epigenetic signature and molecular program, as well as possessing self-renewal capacity and multilineage reconstitution in vivo and in vitro. The preprint is available as Sommarin et al. ‘Single-cell multiomics reveals distinct cell states at the top of the human hematopoietic hierarchy’, bioRxiv; https://www.biorxiv.org/content/10.1101/2021.04.01.437998v2.full

      We also note that the surface markers identified were previously reported by the same authors using different single-cell approaches, which limits the novelty of the findings.

      Our current manuscript is indeed a continuation of, and builds on, our previous paper (Warfvinge R et al. Blood, 2017). In contrast to our previous report, which was limited to examination of only 96 genes per cell, CITE-seq allowed us to examine the molecular program of cells using unbiased global gene expression profiling. Finally, although CD26 appears, once again, as a reliable marker of BCR::ABL1+ primitive cells, CD35 emerges as a novel and previously undescribed marker of BCR::ABL1- residual stem cells. A combination of CD35 and CD26 allowed us to efficiently distinguish between the two populations housed within the Lin-34+38/low stem cell immunophenotype.

      Another limitation is that the BCR-ABL + versus BCR-ABL- status of cells was not confirmed by direct sequencing for BCR-ABL. The BCR-ABL status of cells sorted based on CD26 and CD35 was evaluated in only two samples.

      Single-cell detection of fusion transcripts is challenging, with low detection sensitivity in single-cell RNA-seq, as has been noted previously (Krishnan et al. Blood, 2023; Giustacchini et al. Nature Medicine, 2017; Rodriguez-Meira et al. Molecular Cell, 2019). However, this is likely to change with the inclusion of target-specific probes in scRNA-seq library preparation protocols. Nonetheless, in view of the comment, we have included more patient samples (from the previous n=3 to the current n=10, including TKI-treated samples) for direct assessment of BCR-ABL1 status by qPCR analysis; the updated results are included in the revised manuscript (Figure 6A).

      It will be important to determine whether the GEP and surface markers identified here are able to distinguish BCR-ABL+ and BCR-ABL- primitive stem cells later in the course of TKI treatment.

      We performed qPCR to assess BCR::ABL1 status and the level of GAS2, one of the top genes expressed in CML cells, within CD26+ and CD35+ cells at diagnosis and following 3 months of TKI therapy. The results showed that while CD26+ cells are BCR::ABL1+, the CD35+ cells are BCR::ABL1- at both time points. Moreover, expression of the LSC-specific gene GAS2 was restricted to BCR::ABL1+ CD26+ cells both at diagnosis and following 3 months of TKI therapy. The new results are presented in figure 6B in the revised manuscript.

      Finally, although the authors do describe differential gene expression between CML and normal, BCR:ABL+ and BCR:ABL-, primitive stem cells they have not as yet taken the opportunity to use these findings to address questions regarding biological mechanisms related to CML LSC that impact on TKI response and outcomes.

      We agree with the reviewer that our major focus here was to characterize the cellular heterogeneity coupled to treatment outcome, and we therefore did not delve deep into the molecular mechanisms underlying TKI response. However, in response to this comment, as mentioned above, we noted that one of the top genes in BCR::ABL1 cells (Fig. 4 C; right; in red), GAS2 (Growth Arrest Specific 2), was expressed both at diagnosis and during TKI therapy within CD26+ cells relative to CD35+ cells (updated figure 6B). Interestingly, GAS2 was also detected in CML LSCs in a recent scRNA-seq study (Krishnan et al. Blood, 2023), suggesting that GAS2 upregulation could be a consistent molecular feature of CML cells. GAS2 has previously been noted as deregulated in CML (Janssen JJ et al. Leukemia, 2005; Radich J et al. PNAS, 2006) and linked to control of cell cycle, apoptosis, and response to Imatinib (Zhou et al. PLoS One, 2014). Future investigations are warranted to assess whether GAS2 could play a role in the outcome of long-term TKI therapy.

      Reviewer 2:

      Summary:

      The authors use single-cell "multi-omics" to study clonal heterogeneity in chronic myeloid leukemia (CML) and its impact on treatment response and resistance. Their main results suggest 1) cell compartments and gene expression signatures both shared in CML cells (versus normal), yet 2) some heterogeneity of multiomic mapping correlated with ELN treatment response; 3) further definition of a unique combination of CD26 and CD35 surface markers associated with gene expression defined BCR::ABL1+ LSCs and BCR::ABL1- HSCs. The manuscript is well-written, and the method and figures are clear and informative. The results fit the expanding view of cancer and its therapy as a complex Darwinian exercise of clonal heterogeneity and the selective pressures of treatments.

      Strengths:

      Cutting-edge technology by one of the expert groups of single-cell 'omics.

      Weaknesses:

      Very small sample sizes, without a validation set. The obvious main problem with the study is that an enormous amount of results and conjecture arise from a very small data set: only nine cases for the treatment response section (three in each of the ELN categories), only two normal marrows, and only two patient cases for the division kinetic studies. Thus, it is very difficult to know the "noise" in the system - the stability of clusters and gene expression and the normal variation one might expect, versus patterns that may be reproducibly study artifact, effects of gene expression from freezing-thawing, time on the bench, antibody labeling, etc. This is not so much a criticism as a statement of reality: these elegant experiments are difficult, time-consuming, and very expensive. Thus in the Discussion, it would be helpful for the authors to just frankly lay out these limitations for the reader to consider. Also in the Discussion, it would be interesting for the authors to consider what's next: what type of validation would be needed to make these studies translatable to the clinic? Is there a clever way to use these data to design a faster/cheaper assay?

      We thank the reviewer for their appraisal of our manuscript. We take the opportunity to point out the updates in the revised manuscript in view of the comments.

      Very small sample sizes, without a validation set. The obvious main problem with the study is that an enormous amount of results and conjecture arise from a very small data set: only nine cases for the treatment response section (three in each of the ELN categories), only two normal marrows, and only two patient cases for the division kinetic studies.

      As the reviewer has noted, single-cell omics experiments remain resource demanding, which places a limit on the number of patients analyzed. As described above in response to the comments from reviewer 1, multiomic CITE-seq allows extraction of two modalities compared with typical scRNA-seq; however, this also further limits the number of samples that can be processed in a sustainable way. This was one of the motivations to analyze a larger number of independent samples (n=59) while benefiting from the insights gained from CITE-seq (n=9). Furthermore, by analyzing CD34+ cells from bone marrow and peripheral blood of CML patients, including both responders and non-responders after one year of Imatinib therapy, we were able to significantly diversify the patient pool, a diversity that was lacking in our CITE-seq patient pool. As mentioned above, this reflects a growing trend to analyze larger numbers of patients while anchoring the analysis on prohibitively expensive but potentially insightful sc-omics approaches (for example, please see Zeng et al., A cellular hierarchy framework for understanding heterogeneity and predicting drug response in acute myeloid leukemia, Nature Medicine, 2022).

      As emphasized above, we frequently sought to confirm the findings from one approach using a complementary method and independent samples. For example, although CITE-seq (n=9) showed altered frequency of all cell clusters between optimal and poor responders (Fig. 3B), we refrained from generalizing because an independent large-scale computational deconvolution analysis (n=59) only substantiated the altered proportion of primitive and myeloid clusters.

      In view of the comment, we have now increased the number of patients analyzed during the revision process. These include increased numbers to investigate the ratio between CD35+ and CD26+ populations at diagnosis, as well as 3 months of TKI therapy, qPCR for BCR::ABL1, and patients examined for GAS2, one of the top genes expressed in CML cells (see response to reviewer 1 for details). Altogether, >80 patient samples across different approaches were analyzed to strengthen the conclusions.

      During the revision, we analyzed cells from 8 CML patients for cell cycle status using gene activity scores. These results, together with the cell division kinetics data reported previously, are now described in supplementary figures 9C-F.

      It is very difficult to know the "noise" in the system - the stability of clusters and gene expression and the normal variation one might expect, versus patterns that may be reproducibly study artifact, effects of gene expression from freezing-thawing, time on the bench, antibody labeling, etc. This is not so much a criticism as a statement of reality: these elegant experiments are difficult, time-consuming, and very expensive. Thus in the Discussion, it would be helpful for the authors to just frankly lay out these limitations for the reader to consider.

      We agree with the reviewer that sc-omics approaches can be noisy despite continuing efforts to denoise single cell datasets through both experimental and bioinformatic innovations. Therefore, we have updated the discussion as recommended by the reviewer (paragraph 5 in the discussion).

      We also note that CITE-seq, in contrast to scRNA-seq alone, provides dual features for annotating the same cluster: surface marker/protein as well as RNA. In our manuscript, for example, cell clusters in the UMAP for normal BM (Fig. 1B) were described using both surface markers (Fig. 1C) and RNA (Fig. 1D), making the cluster identity robust. To further elaborate on this approach, a new supplementary figure 1C shows annotations of clusters using both RNA and surface markers.

      To potentially address the issue of stability of clusters and gene expression, we compared the marker genes for major clusters from nBM from this study (supplementary table 4, Warfvinge et al.) with those described recently in a scRNA-seq study by Krishnan et al. (supplementary table 8, Blood, 2023) using Cell Radar, a tool that identifies and visualizes which hematopoietic cell types are enriched within a given gene set (description: https://github.com/KarlssonG/cellradar; direct link: https://karlssong.github.io/cellradar/). For the comparison, we used our in-house gene list for the major clusters and mapped the same number of top marker genes, based on log2FC, from the corresponding cluster from Krishnan et al. as inputs to Cell Radar. The Cell Radar plot outputs are shown below.

      Author response image 1.

      This approach showed broad similarities across clusters from this study with their counterparts from the other study suggesting the cluster identities reported here are likely to be robust. Please note these figures are for reviewer response only and not included in the final manuscript.

      Also in the Discussion, it would be interesting for the authors to consider what's next: what type of validation would be needed to make these studies translatable to the clinic? Is there a clever way to use these data to design a faster/cheaper assay?

      Our findings on CD26+ and CD35+ surface markers to enrich BCR::ABL1+ and BCR::ABL1- cells suggest that a simpler, faster and cheaper FACS panel could possibly quantify leukemic and non-leukemic stem cells in CML patients. We anticipate that future clinical studies might examine whether CD26-CD35+ cells could be plausible candidates for restoring normal hematopoiesis once TKI therapy diminishes the leukemic load, and whether patients with low counts of CD35+ cells at diagnosis have a relatively higher chance of developing hematologic toxicity such as cytopenia during therapy.

      We briefly mentioned this possibility in the discussion; however, we have now moved it to another paragraph to highlight the same. Please see paragraph 5 in the revised manuscript.

      Reviewer 3:

      Summary:

      In this study, Warfvinge and colleagues use CITE-seq to interrogate how CML stem cells change between diagnosis and after one year of TKI therapy. This provides important insight into why some CML patients are "optimal responders" to TKI therapy while others experience treatment failure. CITE-seq in CML patients revealed several important findings. First, substantial cellular heterogeneity was observed at diagnosis, suggesting that this is a hallmark of CML. Further, patients who experienced treatment failure demonstrated increased numbers of primitive cells at diagnosis compared to optimal responders. This finding was validated in a bulk gene expression dataset from 59 CML patients, in which it was shown that the proportion of primitive cells versus lineage-primed cells correlates to treatment outcome. Even more importantly, because CITE-seq quantifies cell surface protein in addition to gene expression data, the authors were able to identify that BCR/ABL+ and BCR/ABL- CML stem cells express distinct cell surface markers (CD26+/CD35- and CD26-/CD35+, respectively). In optimal responders, BCR/ABL- CD26-/CD35+ CML stem cells were predominant, while the opposite was true in patients with treatment failure. Together, these findings represent a critical step forward for the CML field and may allow more informed development of CML therapies, as well as the ability to predict patient outcomes prior to treatment.

      Strengths:

      This is an important, beautifully written, well-referenced study that represents a fundamental advance in the CML field. The data are clean and compelling, demonstrating convincingly that optimal responders and patients with treatment failure display significant differences in the proportion of primitive cells at diagnosis, and the ratio of BCR-ABL+ versus negative LSCs. The finding that BCR/ABL+ versus negative LSCs display distinct surface markers is also key and will allow for a more detailed interrogation of these cell populations at a molecular level.

      Weaknesses:

      CITE-seq was performed in only 9 CML patient samples and 2 healthy donors. Additional samples would greatly strengthen the very interesting and notable findings.

      Reviewer #3 (Recommendations For The Authors):

      My only recommendation is to bolster findings with additional CML and healthy donor samples.

      CITE-seq was performed in only 9 CML patient samples and 2 healthy donors. Additional samples would greatly strengthen the very interesting and notable findings.

      We thank the reviewer for the positive assessment of our manuscript. As mentioned in response to comments from reviewers 1 and 2, CITE-seq remains a resource-consuming single-cell method, potentially limiting the number of patients that can be analyzed. However, during the revision process, we have increased the amount of patient material analyzed for other assays; this includes increased numbers to investigate the ratio between CD35+ and CD26+ populations at diagnosis and after 3 months of TKI therapy, qPCR for BCR::ABL1, and patients examined for GAS2, one of the top genes expressed in CML cells. Thus, >80 patient samples across different assays have been analyzed to strengthen the conclusions. (Please see the response to reviewer 1 for more details.)

    1. Author Response

      The following is the authors’ response to the original reviews.

      We want to thank the reviewers for their thoughtful analysis and questions.

      A brief overview of the changes to the manuscript is provided here, with individual responses to the reviewer comments following.

      The methods section has been expanded to better explain the techniques used in our analyses. The CTCF binding data section has likewise been expanded to include more detail on the dataset and our analysis of its contents. All other requested clarifications have been added to the relevant areas of the results.

      Beyond specific requests from the reviewers, we made the following changes.

      We felt that a particular terminology choice on our part resulted in some confusion: the use of “SNPs” to refer to genetic variants within our Diversity Outbred samples. While we used SNPs that lay closest to the center of our haplotype predictions as our representative loci for each linkage disequilibrium block, this was done for computational purposes only. We did not focus most of our analyses on the haplotypes themselves, because of the uncertainty of which variants within an LD block actually participated in the genetic-epigenetic interactions we imputed.

      Thus, we edited the text to remove mention of “SNPs” unless our analysis did directly and deliberately profile SNPs themselves. In all other cases, we now refer to “haplotypes”, “genetic variants”, or “variants”. This should help increase clarity in the manuscript as a whole.

      A small error was discovered within the labelling and processing of regression model outputs in chromosome 14. A consistency check was run on all chromosomes, finding that only Chr 14 was affected. Chr 14 was rerun in its entirety to verify its results, with the previous results now archived within our databases uploaded on Synapse (see Methods for a link). All relevant calculations and figures were regenerated, resulting in an average shift of 1% or less across the manuscript. All analyses remain highly statistically significant.

      Responses to comments from Reviewer #1

      Methods

      • Sequencing depth was retrieved from the original publication on the primary multiomics dataset. (Line 105-106)

      • A line was added regarding initial mouse genome alignment for the original publication: we explain the GigaMUGA genotyping array, used for the DO mESC samples. For our ChIP-seq data, we reword to specify: we used liftovers from imputed strain-specific genomes to B6 mm10. (Lines 108-110; 116-120; 168-170)

      • Aneuploidy removal is expanded upon in a similar fashion: the original QC identified chromosome-level gene expression differences to remove aneuploid samples. (Line 111)

      • Mention of the pre-publication use of an alternative null model has been removed, given its lack of relevance to the rest of the text. While it was interesting to compare to the standard null model, it amounts to a side note that distracts from the focus of the paper. (Line 137-139).

      • Descriptive subheadings have been added.

      Results

      • Line 179 (now Line 191) now points to Methods.

      • Line 189-200 (now Line 188-204): language altered to better explain our intent: We wished to perform an intrachromosomal scan across the whole genome for non-additive genetic-epigenetic interactions. However, there were computational limits to how many possible combinations of gene, haplotype, and ATAC-seq peak we could feasibly test. We thus generated a random subset of possible combinations. This was also performed to identify target regions for focused analyses.

      • Line 195 (now line 206, expanded on in Line 210): Clarification added on the significance of our result: if non-additive genetic-epigenetic interactions were not a significant explanatory factor for gene expression, we would expect to see no enrichment of low p-value results. Instead, we see 0.07% of our models coming in at adj. p < 1x10^-7.

      • Line 199 (now Line 216): The requested calculations were run, and are now included in table S3. We found that within 4 Mb of a given gene, less than 10% of variants and ATAC peaks clustered closer to each other than they did to the gene they affected.

      Please note that this figure has a level of uncertainty due to linkage disequilibrium. Thus, rather than precisely answering the question “[are there haplotype-ATAC pairs] that are in the same locality but further away from the gene?”, we asked "is the ATAC peak closer than the gene to the point where we have the highest confidence of correctly calling the interacting genotype?". The relevant code has been deposited in our Synapse repository (see Methods for link).

      • Line 205 (now restructured in Line 221-228): The text has been edited to specify our intent. We are referring to a set of TAD-focused regression models we generated (see Methods) that comprehensively included all possible interactions between genes, and all haplotypes and ATAC peaks within +/- 1 TAD of the gene.

      • (Line 227): We specified that the previously-published TAD boundary dataset we used was retrieved from the Bing Ren lab’s Hi-C projects, which imputed locations of TAD boundaries in B6 mESCs.

      • We have relabeled Figure 1 and tweaked the surrounding text to clear up some confusing aspects. The Euler plots in Figure 1D-E reflect the fact that each ATAC-seq peak and haplotype can be in multiple relationships with local genes and regulatory factors. Some of these relationships will be simple correlation between their presence and gene expression, while others may co-regulate alongside independent regulatory factors, or engage in non-additive regulatory interactions.

      Because these non-additive regulatory interactions have not been comprehensively studied, we wished to determine whether there were any regulatory factors within our data that would not be detected as significant via more conventional methods, such as correlation analysis, mediation analysis, or regression analysis without an interaction term. Our Euler plots show that there are large subsets of both ATAC-seq peaks and haplotypes that are exclusively found in non-additive interactions. This is our justification for focusing on non-additive interactions for the rest of the paper.

      • Line 256 (now Line 252-255): We further clarified the above in this section: correlation and mediation analyses were previously completed by the team which initially analyzed the DO mESC dataset (Skelly et al. 2020, Cell Stem Cell). They performed a correlation analysis between open chromatin and gene expression (Skelly et al. Fig. 2A), and identified expression quantitative trait loci (eQTL) (Skelly et al. Fig. 2E). We felt that more direct comparisons to the Skelly et al. data would distract readers from our focus on genetic-epigenetic interactions. Thus, we limited our discussion of non-interacting regulatory relationships to Figures 1-2, and a brief mention in Figure 5.

      • Line 290 (now Line 337): We pulled promoter locations from the FANTOM5 database of mouse promoters, and included analysis in both the text and Figure S4A-B.

      • (Line 475-476): we clarified “DO founder SNPs” to “SNPs from the non-reference DO founder strains”.

      • Line 472 (restructured in Lines 531-564): We have expanded on this section, including answers to the reviewer’s questions regarding ChIP-seq peak counts, overlap with the TAD map we used for our other analyses, and further detail on the strain-specific CTCF binding we identified in our ChIP-seq analysis.

      Responses to comments from Reviewer #2:

      (1) Typo corrected.

      (2) Lines 194-195 (now line 206, expanded on in Line 210): We have expanded upon the intent and expectations of our analysis. In summary: if non-additive genetic-epigenetic interactions were not a significant explanatory factor for gene expression, we would expect to see no enrichment of low p-value results. Thus, we would expect 0.00001% of results to reach adj. p < 1x10^-7. Instead, we see 0.07% of our models coming in at adj. p < 1x10^-7, roughly four orders of magnitude greater than expected.

      (3) Lines 226-230 (Expanded on in Lines 252-276): We have relabeled Figure 1 and tweaked the surrounding text to clear up some confusing aspects. The percentages in the text are derived from the data summarized in the Euler plots in Figure 1D-E. These plots reflect the fact that each ATAC-seq peak and haplotype can be in multiple relationships with local genes and regulatory factors. Some of these relationships will be simple correlation between their presence and gene expression, while others may co-regulate alongside independent regulatory factors, or engage in non-additive regulatory interactions.

      (4) Line 261-263 (now lines 299-300): A companion to Figure 2B has been added (Fig. S3), which provides interaction counts for each ATAC-seq peak that contributed to Figure 2B. A horizontal line is included to highlight the locations of the highly-interacting ATAC peaks.

      (5) Analysis regarding Figure 3B had been removed from its original context. It has now been restored to the manuscript (Line 368-371).
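
      The enrichment argument in response (2) can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the manuscript: it assumes that p-values are uniformly distributed under the null, so that the expected fraction of models below a cutoff equals the cutoff itself. The function name `fold_enrichment` is hypothetical.

```python
import math


def fold_enrichment(observed_fraction: float, threshold: float) -> float:
    """Ratio of the observed to the expected fraction of tests below `threshold`.

    Simplifying assumption: p-values are uniform on [0, 1] under the null,
    so the expected fraction below `threshold` is `threshold` itself.
    """
    expected_fraction = threshold
    return observed_fraction / expected_fraction


observed = 0.07 / 100   # 0.07% of models, as quoted in the response
threshold = 1e-7        # adjusted p-value cutoff quoted in the response

fold = fold_enrichment(observed, threshold)
orders_of_magnitude = math.log10(fold)
print(f"fold enrichment: {fold:.0f}; log10: {orders_of_magnitude:.2f}")
```

      Under this assumption the observed fraction exceeds the expectation by a factor in the thousands, which is in line with the "four orders of magnitude" stated in the response.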

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendation for the authors)

      I only have one comment for improvement of this study and it has to do with the comparison of simulators that they conducted. There are many other simulators around now, including scDesign3, spaSim, SPIDER, SRTSIM, etc. Are any of those methods worth including in the comparison?

      Indeed, many of the mentioned simulators did not exist when we initially developed synthspot, and upon closer examination, they are not directly comparable to our tool.

      • scDesign3: The runtime of scDesign3 is quite long as a result of its generative model. The example provided in its tutorial only simulates 183 genes and takes over seven minutes when using four cores on a system with Intel Xeon E5-2640 CPUs running at 2.5GHz. In a small downsampling analysis, we simulated 10, 50, 100, and 150 genes with scDesign3 and observed runtimes of 30, 130, 245, and 360 seconds, respectively. This seems to indicate a linear relationship between the number of genes and the runtime, therefore rendering it unsuitable for simulating whole-transcriptome datasets for deconvolution.

      • spaSim: spaSim focuses on modelling cell locations in different tissue structures but does not provide gene expression data. It is designed for testing cell colocalization capabilities rather than simulating gene expression.

      • SPIDER: Although SPIDER appears to have some overlap with our work, it seems to be in the early stages of development. The GitHub repository contains only two scripts without any documentation, and the preprint does not provide instructions on how to use the tool.

      • SRTSim: SRTSim explicitly states in its publication that it is not suitable for evaluating cell type deconvolution, as its focus is on simulating gene expression data without modelling cell type composition.

      • scMultiSim: scMultiSim, like scDesign3, is limited in its capability to model the entire transcriptome.
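
      The scDesign3 runtimes quoted above can be turned into a rough projection. The following is a minimal sketch, not part of the benchmark: it fits an ordinary least-squares line to the four reported points and extrapolates to a transcriptome size of 20,000 genes, which is a hypothetical figure chosen only for illustration. Extrapolating this far beyond the measured range also assumes that the linear trend continues to hold.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Gene counts and runtimes (seconds) quoted in the response.
genes = [10, 50, 100, 150]
seconds = [30, 130, 245, 360]

slope, intercept = fit_line(genes, seconds)

# Hypothetical whole-transcriptome size, chosen only for illustration.
n_genes_assumed = 20_000
projected_hours = (slope * n_genes_assumed + intercept) / 3600

print(f"{slope:.2f} s per gene; about {projected_hours:.1f} h for {n_genes_assumed} genes")
```

      With these inputs the fitted slope is a little over two seconds per gene, so a single whole-transcriptome simulation would take on the order of half a day under the stated assumptions.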

      Nonetheless, the inherent modularity of our Nextflow framework makes it possible for users to simply run the deconvolution methods on data that has been simulated by other simulators if need be.

      Additionally, we have added the following rationale for why we developed synthspot in “Synthspot allows simulation of artificial tissue patterns”:

      “On the other hand, general-purpose simulators are typically more focused on other inference tasks, such as spatial clustering and cell-cell communication, and are unsuitable for deconvolution. For instance, generative models and kinetic models like those of scDesign3 and scMultiSim are computationally intensive and unable to model entire transcriptomes. SRTSim focuses on modeling gene expression trends and does not explicitly model tissue composition, while spaSim only models tissue composition without gene expression.”

      The other aspect of the simulation comparison that I'm missing is some kind of spatial metric. There are metrics about feature correlation, sample-sample correlation, library size, etc. But, what about spatial correlation (e.g., Moran's I or similar). Perhaps comparing the distribution of Moran's I across genes in a simulated and real dataset would be a good first start.

      We would like to clarify that synthspot does not actually simulate the spatial location of spots, but synthetic regions where spots from the same region share similar compositions. Hence, incorporating a spatial metric in the comparison is not feasible. However, as RCTD is the only method that explicitly uses spot locations in its model (Supplementary Table 2, "Location information"), we believe that generating synthetic datasets with actual coordinates would not significantly impact the conclusions of the study.
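
      For readers unfamiliar with the metric the reviewer mentions, the sketch below computes global Moran's I on a toy layout. It is included only to show why the statistic needs spot coordinates: the spatial weight matrix is built from spot positions, which synthspot does not generate. The helper `chain_weights` and the toy expression vectors are illustrative assumptions, not part of synthspot or the benchmark.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` given a spatial weight matrix `weights`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between every pair of spots.
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * cross / variance_sum


def chain_weights(n):
    """Binary adjacency for n spots placed on a line (toy layout)."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = chain_weights(6)
clustered = [1, 1, 1, 0, 0, 0]     # similar values sit next to each other
alternating = [1, 0, 1, 0, 1, 0]   # neighbours always differ

i_clustered = morans_i(clustered, w)
i_alternating = morans_i(alternating, w)
print(i_clustered, i_alternating)
```

      The clustered vector yields a positive value and the alternating vector a negative one, even though both contain the same set of values; only the weight matrix, and therefore the coordinates, distinguishes them.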

      Reviewer #2 (Public Review)

      On the other hand, the authors state that in silver standard datasets one half of the scRNA-seq data was used for simulation and the other half was used as a reference for the algorithms, but the method of splitting the data, i.e., at random or proportionally by cell type, was not specified.

      The data was split proportionally by cell type. To clarify this, we have included an additional sentence in the main text under the first paragraph of “Cell2location and RCTD perform well in synthetic data”, as well as in Figure S2.
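
      A split that is proportional by cell type can be sketched as follows. This is an illustrative example under stated assumptions, not the code used in the study: the function name `split_by_cell_type`, the seed, and the toy labels are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_cell_type(cell_types, seed=0):
    """Split cell indices into two halves, stratified by cell-type label."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for idx, label in enumerate(cell_types):
        by_type[label].append(idx)

    simulation_half, reference_half = [], []
    for indices in by_type.values():
        rng.shuffle(indices)
        cut = len(indices) // 2
        simulation_half.extend(indices[:cut])   # used to build synthetic spots
        reference_half.extend(indices[cut:])    # used as the deconvolution reference
    return simulation_half, reference_half


# Toy labels standing in for an annotated scRNA-seq dataset.
labels = ["T cell"] * 40 + ["B cell"] * 10 + ["Hepatocyte"] * 6
sim_idx, ref_idx = split_by_cell_type(labels)
```

      Because the shuffle and cut happen within each label, every cell type contributes half of its cells to each side, which a purely random split does not guarantee for rare types.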

      Reviewer #2 (Recommendation for the authors)

      Figure legends in Figures 3, 4 and across most Supplementary material are almost illegible. Please consider increasing font size for better readability.

      Thank you for bringing this to our attention. The font size has been increased for all main and supplementary figures. Additionally, the supplementary figures have also been exported in higher resolution.

      Supplementary Notes Figure 2c reads "... total count per sampled multiplied by..."

      This has been adapted, as well as the captions of Supplementary Notes Figure 3c and 4c which had the same typo.

      Reviewer #3 (Public Review)

      The simulation setup has a significant weakness in the selection of reference single-cell RNAseq datasets used for generating synthetic spots. It is unclear why a mix of mouse and human scRNA-seq datasets were chosen, as this does not reflect a realistic biological scenario. This could call into question the findings of the "detecting rare cell types remains challenging even for top-performing methods" section of the paper, as the true "rare cell types" would not be as distinct as human skin cells in a mouse brain setting as simulated here.

      We appreciate the reviewer’s concern and would like to clarify that within one simulated dataset, we never mix mouse and human scRNA-seq data together. The synthetic spots generated for the silver standards are always sampled from a single scRNA-seq or snRNA-seq dataset. Specifically, for each of the seven public scRNA-seq datasets, we generate synthetic datasets with one of nine abundance patterns, resulting in a total of 63 synthetic datasets. These abundance patterns only affect the sampling priors that are used—the spots are still created with combinations of cells sampled from the same dataset.
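
      The design described above is a simple cross of datasets and abundance patterns. As a minimal sketch, with placeholder names standing in for the actual datasets and patterns, each synthetic dataset is identified by exactly one source dataset and one pattern:

```python
from itertools import product

# Placeholder names; the actual datasets and patterns are listed in the manuscript.
datasets = [f"dataset_{i}" for i in range(1, 8)]              # seven sc/snRNA-seq datasets
patterns = [f"abundance_pattern_{j}" for j in range(1, 10)]   # nine abundance patterns

# One synthetic dataset per (source dataset, abundance pattern) pair.
designs = list(product(datasets, patterns))
print(len(designs))
```

      Since every design carries a single source dataset, cells from different datasets, and hence different species, never appear in the same simulated spot.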

      Furthermore, it is unclear why the authors developed Synthspot when other similar frameworks, such as SRTsim, exist. Have the authors explored other simulation frameworks?

      While there are other simulation frameworks available now, synthspot was designed to specifically address the requirements of our study, offering unique capabilities that make it suitable for deconvolution evaluation. Moreover, many of the simulators did not exist when we initially developed our tool. We have added the following rationale for why we developed synthspot in “Synthspot allows simulation of artificial tissue patterns”:

      “On the other hand, general-purpose simulators are typically more focused on other inference tasks, such as spatial clustering and cell-cell communication, and are unsuitable for deconvolution. For instance, generative models and kinetic models like those of scDesign3 and scMultiSim are computationally intensive and unable to model entire transcriptomes. SRTSim focuses on modeling gene expression trends and does not explicitly model tissue composition, while spaSim only models tissue composition without gene expression.”

      In our response to Reviewer 1 copied below, we also outline specific reasons why other simulators were not suitable for our benchmark:

      • scDesign3: The runtime of scDesign3 is quite long as a result of its generative model. The example provided in its tutorial only simulates 183 genes and takes over seven minutes when using four cores on a system with Intel Xeon E5-2640 CPUs running at 2.5GHz. In a small downsampling analysis, we simulated 10, 50, 100, and 150 genes with scDesign3 and observed runtimes of 30, 130, 245, and 360 seconds, respectively. This seems to indicate a linear relationship between the number of genes and the runtime, therefore rendering it unsuitable for simulating whole-transcriptome datasets for deconvolution.

      • spaSim: spaSim focuses on modelling cell locations in different tissue structures but does not provide gene expression data. It is designed for testing cell colocalization capabilities rather than simulating gene expression.

      • SPIDER: Although SPIDER appears to have some overlap with our work, it seems to be in the early stages of development. The GitHub repository contains only two scripts without any documentation, and the preprint does not provide instructions on how to use the tool.

      • SRTSim: SRTSim explicitly states in its publication that it is not suitable for evaluating cell type deconvolution, as its focus is on simulating gene expression data without modelling cell type composition.

      • scMultiSim: scMultiSim, like scDesign3, is limited in its capability to model the entire transcriptome.
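
The scaling argument for scDesign3 above can be checked with a short least-squares fit to the four reported runtimes. This is a sketch only: the whole-transcriptome size of 20,000 genes is an assumed round number not given in the response, and the extrapolation presumes the linear trend holds far outside the measured range.

```python
# Runtimes reported in the response for the downsampling analysis.
genes = [10, 50, 100, 150]
runtime_s = [30, 130, 245, 360]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(genes, runtime_s)

# ASSUMPTION: 20,000 genes as a stand-in for a whole transcriptome.
n_genes_assumed = 20_000
predicted_hours = (slope * n_genes_assumed + intercept) / 3600

print(f"{slope:.2f} s per gene, about {predicted_hours:.1f} h for {n_genes_assumed} genes")
```

Under these assumptions the fit gives a little over two seconds per gene, which extrapolates to roughly half a day for a single simulated dataset on the same hardware.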

      Finally, we would have appreciated the inclusion of tissue samples with more complex structures, such as those from tumors, where there may be more intricate mixing between cell types and spot types.

      We acknowledge the reviewer's suggestion and have incorporated a melanoma dataset from Karras et al. (2022) in response to this suggestion. This study profiled melanoma tumors by using both scRNA-seq and spatial technologies. The scRNA-seq consists of eight immune cell types and seven melanoma cell states. We have included this study as an additional silver standard and case study, the latter of which is presented in a separate section following the liver analysis (and a corresponding section in Methods).

      We found that method performance on synthetic datasets generated from this melanoma dataset follows previous trends (Figures S3-S5). However, the inclusion of the case study led to the following changes in the overall rankings: cell2location and RCTD are now tied for first place (previously RCTD ranked first), and Seurat and SPOTlight have swapped places. Despite these changes, the core messages and conclusions of our paper remain unchanged. All relevant figures (Figures 1a, 2, 3a, 4a, 6b, 7a, S3-S6, S9) have been updated to incorporate these new analyses and results.

      Reviewer #3 (Recommendations for the authors)

      To maintain consistency in the results, it is recommended to exclude the human scRNAseq set when generating synthetic spots. Furthermore, addressing the other significant weaknesses mentioned earlier would be beneficial.

      Please refer to our response to the public review where we address the same remark.

      It is essential to differentiate this work from previous benchmarking and simulation frameworks.

      In addition to the rationale on why we developed our own framework (see response to the public review), we have included the following text in the discussion that highlights our versatile approach when using a real spatial dataset for evaluation:

      “In the case studies, we demonstrated two approaches for evaluating deconvolution methods in datasets without an absolute ground truth. These approaches include using proportions derived from another sequencing or spatial technology as a proxy, and leveraging spot annotations, e.g., zonation or blood vessel annotations, that typically have already been generated for a separate analysis.”

      Furthermore, we conducted an extra analysis in the liver case study, generating synthetic datasets with one experimental protocol and using the remaining two as separate references (Figure S13). This further illustrates the usefulness of our simulation framework, which we mentioned by appending this sentence in the discussion:

      “As in our silver standards, users can select the abundance pattern most resembling the real tissue to generate the synthetic spatial dataset, as we have also demonstrated in the liver case study.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      (1) Since you only included patients with early-onset preeclampsia in the study, I suggest revising the title to "Identification of novel syncytiotrophoblast membrane extracellular vesicle derived protein biomarkers in early-onset preeclampsia...."

      We have changed our title to early-onset preeclampsia.

      (2) Under methods, you state that placenta was obtained from women undergoing elective cesarean section. Was this because all the study patients were delivered before the onset of labor? Or were laboring patients specifically excluded from the study?

      Indeed, labor influences the extracellular vesicles (EVs) generated. To ensure consistency in our samples and avoid this variable, we chose placentas obtained from elective cesarean sections (CS) for our study.

      (3) In Table 1 on page 10, the 8th row (Birth weight grams) needs to be reformatted. The mean birthweights for normal pregnancy and preeclampsia should be the same.

      We have reformatted the table and now use ranges instead of brackets.

      (4) In the legend for Table 1, the sentence beginning on page 10, line 227, and continuing onto page 11, line 228, does not make sense. Part of the sentence was omitted inadvertently.

      We have modified this sentence to:

      Detergent treatment, which could break down EVs, with NP-40 confirmed that the majority (99%) of our samples were largely vesicular since only 0.1 ± 0.12% of BODIPY FL N-(2-aminoethyl)-maleimide and PLAP double-positive events were detected (a reduction of 99%) (Figure 1E and 1H).

      (5) As you acknowledge, the sample size (12 patients) was small. This is understandable because early-onset preeclampsia occurs in <1% of parturients. You could collaborate with other centers in future studies to increase the sample size.

      Thank you very much for your comment. We are willing to cooperate on future research and will try to expand our sample size in subsequent studies.

      Reviewer #2 (Recommendations For The Authors):

      (1) This is one of the many "catalogue" papers where placental exosome proteins in preeclampsia are profiled. Thus, the manuscript lacks novelty. The only novelty factor is the authors have isolated exosomes by a different method and even separated the small and large exosomes. However, there is no mention of how these exosomes differ from each other in terms of their functionality. Thus it is hard to judge the biological significance of this work.

      We appreciate your insights regarding the novelty of our study. While numerous papers have profiled placental exosome proteins in preeclampsia, our methodology for enriching sSTB-EVs (exosomes) offers a distinct perspective. We believe that the separation of sSTB-EVs (exosomes) and medium/large STB-EVs (microvesicles) introduces a differentiation that extends beyond mere profiling, with implications for their functionality. Previous studies have shown that placental EVs of different sizes have distinct characteristics (Zabel RR, et al. Enrichment and characterization of extracellular vesicles from ex vivo one-sided human placenta perfusion. Am J Reprod Immunol. 2021 Aug;86(2)). Furthermore, the way cells internalize and respond to EVs may depend on EV size (Zhuang X, et al. Treatment of brain inflammatory diseases by delivering exosome encapsulated anti-inflammatory drugs from the nasal region to the brain. Mol Ther. 2011 Oct;19(10)). Therefore, it will be important for future studies to distinguish between EVs of different sizes.

      (2) The authors must demonstrate that these two types of EVs are also produced in vivo by detecting them in the serum of women.

      Thank you for the comment. Many previous studies have shown the two types of placental EVs in women's blood. Nakahara et al.'s (PMCID: PMC7755551) extensive review compiles studies that have specifically isolated various subtypes of placenta-derived EVs from maternal circulation. We have also addressed this point in the introduction.

      (3) The authors must compare the proteomes of serum-derived placental exosomes and the proteome of the STBs isolated from the perfusion experiments to judge how overlapping the outcomes are from those produced naturally and those produced under ex vivo conditions.

      We appreciate the reviewer's suggestion to compare the proteomes of serum-derived placental sSTB-EVs (exosomes) with those from STBs isolated through perfusion experiments. Indeed, such a comparison would provide valuable insights into the similarities and differences between naturally produced and ex vivo-generated sSTB-EVs (exosomes). However, isolating placental EVs from maternal circulation for comprehensive proteomic profiling presents challenges. It requires a volume of serum or plasma large enough to allow the isolation of placenta-specific EVs from among the numerous other EVs in the circulation. It also requires multiple intricate steps, such as ultracentrifugation followed by immunoprecipitation, each of which can lead to the loss of EVs. Additionally, given the high concentration of lipoproteins in plasma relative to EVs, there is a significant risk of obtaining low-purity isolates from the outset. These challenges might compromise the comparability of results between placenta-specific EVs from maternal circulation and those from ex vivo perfusion. Nevertheless, we acknowledge the value of such an endeavor and will consider incorporating this aspect in future studies as EV and proteomic methodology and technology improve and become more sensitive.

      (4) I have a major issue with the chosen study subjects. While the study title and the manuscript mention preeclampsia, as per the inclusion criteria mentioned in lines 88-90, the patients will be HELLP syndrome. Please clarify what was used and modify the manuscript accordingly.

      Thank you very much for finding this error. Our patients had none of the features that would qualify them for HELLP syndrome. We have edited to:

      PE was defined as new (after 20 weeks) systolic blood pressure of 140 mmHg or diastolic pressure of 90 mmHg, proteinuria (protein/creatinine ratio of 30 mg/mmol or more). None of our patients had maternal acute kidney injury, liver dysfunction, neurological features, hemolysis, or thrombocytopenia.

      (5) It is hard to reconcile how only 15 proteins were identified in the placental extract while 300+ in EVs. There is a methodological issue in the mass spec or extraction. With such widely different denominators in the total proteins identified, it is hard to compare the outcomes in terms of the three sample types.

      We acknowledge the reviewer's concerns regarding the disparity in protein counts between the placental extract and the EVs. Ultimately, more is not necessarily better. Several factors might contribute to this discrepancy. Firstly, it is plausible that certain proteins exhibit selective affinity for EVs of varying sizes, leading to a more diverse range of proteins in EVs than in the placental extract. We were also stringent in our analysis so that we would select proteins whose biological differences are more likely to be reproducible with a different validation method, such as western blotting. Additionally, although the placental extract might contain a higher total protein concentration, this does not necessarily translate to a richer diversity of disease-specific proteins. It is helpful to consider these nuances when comparing protein outcomes across sample types.

      (6) I am unable to understand the terms least differentially expressed and most differentially expressed. Do the authors mean upregulated and downregulated? Please clarify and use the terms appropriately by providing fold change values.

      We appreciate the reviewer's request for clarification. We intended to provide a relative measure of expression for the terms 'least differentially expressed' and 'most differentially expressed'. The terms are roughly equivalent to down- and upregulated. Regarding EVs, we avoid using the terms 'upregulated' and 'downregulated' as EVs act as transporters and do not possess regulatory functions per se. However, for the placenta, we recognize the relevance of these terms.

      (7) The data presented is very superficial and lacks methodological details. The authors should provide the total number of targets achieved after mass spec, the cutoffs used, the FDRs, and other details.

      We apologize for the omission. We have added these details to the Methods section.

      (8) It is not clear how were these differentially abundant proteins identified. What was the cutoff used? Was it identified in all the replicates?

      We apologize for the omission. We have added these details to the Methods section.

      (9) How many samples were subjected to the discovery cohort, and how many were in the validation cohort? Were they the same or different? If the samples were different, how many PE samples had differentially abundant proteins by both methods?

      The study utilized 12 samples for initial discovery and another 12 for western blot validation. The validation samples specifically targeted proteins of interest, rather than undergoing another comprehensive mass spectrometry analysis.

      (10) It is striking that the authors report the expression of prostatic acid phosphatase in the placenta. In my understanding of placental biology, this gene or protein is not known to be expressed by the placenta. Please perform immunofluorescence to demonstrate that this protein is indeed produced in the STBs

      Research has revealed that even though it is called prostate-specific antigen, it is produced in tissues other than the prostate, such as the placenta. Several references support this claim: PMID: 10634405, PMID: 7533063, PMID: 8939403, and PMID: 8945610. Hence, we see little benefit in repeating what many researchers have already demonstrated.

      (11) Please validate the differential abundance of these proteins in the exosomes isolated from the plasma of women with and without preeclampsia. A serial measurement will be of high value to determine how early as compared to hypertension, these biomarkers can predict preeclampsia.

      We are validating each EV-carried marker individually in the circulation (plasma or serum), localizing them in the placenta, and performing downstream functional analysis. This article is already lengthy and would likely be too cumbersome to include the details of all individual proteins in this manuscript. However, we have already published papers on Siglec 6 (PMID: 32998819) and Neprilysin (PMID: 30929513), and others will be published soon. We agree that there will be a lot of value to serial measurement, not just in terms of how early as compared to hypertension, these biomarkers can predict preeclampsia but also as potentially a more sensitive or specific test. This would be the subject of subsequent papers.

      (12) The authors are recommended to carry out immunofluorescence to localize the differentially abundant proteins in the placental sections and show that they are specific to STBs.

      We have already provided a similar response earlier (see response to point 11). In addition, while it is preferable, the biomarkers do not necessarily need to be specific to STB. Not all biomarkers are mechanistic agents/targets, and not all mechanistic agents are biomarkers. However, mechanistic agents should preferably be placenta-specific. For example, total sFLT1, the most studied biomarker, is not exclusively synthesized in the placenta, even though the placenta-specific isoform represents a small fraction of the total sFLT1. Similarly, in the non-placental world, alkaline phosphatase (ALP) is not exclusively produced by the liver but is a ‘biomarker’ of cholestatic disease.

      (13) Table 1 should give the range and SD could be given as + instead of the bracket.

      Thank you for your suggestion. We have edited it accordingly.

      (14) It is necessary to provide the gestational age of the onset of hypertension to get a judgment of how long these women were preeclamptic, culminating in HELLP.

      We want to emphasize that none of our patients experienced HELLP syndrome. In the results section, we have included the gestational age at the time of diagnosis in the table for preeclampsia. It is crucial to understand that the gestational age at diagnosis is distinct from the gestational age when hypertension initially appeared. Detecting the exact gestational age of hypertension onset would be challenging, and it would likely require a prospective or randomized clinical trial with continuous monitoring, possibly on a daily basis. However, our study is retrospective. Thus, we can only comment on the gestational age at diagnosis.

      (15) For newborns the term Sex is used and not gender

      Thank you for your suggestion. We have edited it accordingly.

      (16) Figure 2 is stretched and hard to read

      Thank you for your suggestion. We have edited it accordingly by creating two separate images to promote readability.

      (17) Line 278 change the sentence "there fifteen (15) proteins in the placenta" to "there were fifteen (15) proteins in the placenta"

      Thank you for your suggestion. We have edited it accordingly.

      (18) Line 288 you mean least and not lease

      Thank you for your suggestion. We have edited it accordingly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study advances our knowledge of how parasites evade the host complement immune system. The new cryo-EM structure of the trypanosome receptor ISG65 bound to complement component C3b is highly compelling and well-supported by biochemical experiments. This work will be of broad interest to parasitologists, immunologists, and structural biologists.

      We thank the reviewers and editorial team for this assessment of our work.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors set out to use structural biology (cryo-EM), surface plasmon resonance, and complement convertase assays to understand the mechanism(s) by which ISG65 dampens cytotoxicity to, and cellular clearance of, trypanosomes opsonised with C3b by the innate immune system.

      The cryo-EM structure adds significantly to the authors' previous crystallographic data because the latter was limited to the C3d sub-domain of C3b. Further, the in vitro convertase assay adds an additional functional dimension to this study.

      The authors have achieved their aims and the results support their conclusions.

      The role of complement in immunity to T. brucei (or lack thereof) has been a significant question in molecular parasitology for over 30 years. The identification of ISG65 as the C3 receptor and now this study providing mechanistic insights represents a major advance in the field.

      Reviewer #2 (Public Review):

      This is an excellent paper that uses structural work to determine the precise role of one of the few invariant proteins on the surface of the African trypanosome. This protein, ISG65, was recently determined to be a complement receptor and specifically a receptor of C3, whose binding to ISG65 led to resistance to complement-mediated lysis. But the molecular mechanism that underlies resistance was unknown.

      Here, through cryoEM studies, the authors reveal the interaction interface (two actually) between ISG65 and C3, and based on this, make inferences regarding downstream events in the complement cascade. Specifically, they suggest that ISG65 preferably binds the converted C3b (rather than the soluble C3). Moreover, while conversion to a C3bB complex is not blocked, the ability to bind complement receptors 1 and 3 is likely blocked.

      Of course, all this is work on proteins in isolation and the remaining question is - can this in fact happen on the membrane? The VSG-coated membrane is supposed to be incredibly dense (packed at the limits of physical density) and so it is unclear whether the interactions that are implied by the structural work can actually happen on the membrane of a live trypanosome. This is not necessarily a dig but it should be addressed in the manuscript perhaps as a caveat.

      We thank the reviewer for their positive response to our work. We fully agree with the reviewer about the caveats which come from this work being done in a biochemical context. We have addressed this in lines 223-224 and 327-333.

      Reviewer #3 (Public Review):

      The authors investigate the mechanisms by which ISG65 and C3 recognize and interact with each other. The major strength is the identification of eco-site by determining the cryoEM structure of the complex, which suggests new intervention strategies. This is a solid body of work that has an important impact on parasitology, immunology, and structural biology.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A paper by Sulzen et al was published online on 27th April in Nature Communications that has a similarity (the cryo-EM structure) to this paper. This does not detract from the value of this paper. The authors should, however, include a "compare and contrast" section in this paper to explain similarities and differences in the conclusions. For example, while this paper demonstrates that ISG65 does not prevent C3 convertase activity, the Sulzen paper suggests it does prevent C5 convertase activity. The compatibility of these conclusions should be discussed.

      Two studies of ISG65 were published shortly after submission of this manuscript (Sulzen et al and Lorenzen et al) and we have added a brief comparison of the conclusions of these papers here. These mentions include lines 151, 155-6, 201-2, 274-278, 292-93 and 321-323. For a more in-depth comparison we have published an opinion piece in Trends in Parasitology, which discusses all three of these papers and which we also now reference here.

      Could the authors comment as to whether they think the association of C3b with the unstructured region of ISG65 comes about via S-S shuffling? I.e., is C3b first thioester linked to VSG and then this rearranges to ISG65 through C3b-ISG65 proximity?

      We thank the reviewer for the interesting suggestion. However, we are not aware of evidence showing that C3b, which has been conjugated to a target protein through its covalent ester bond, then becomes transferred to a second target protein. As ISG65 can bind to C3 as well as C3b, we think that the conjugate could form when ISG65-bound C3 converts to C3b, becomes reactive and, through proximity, is most likely to conjugate to ISG65. Whether this occurs to a substantial degree in trypanosomes, or whether it is more likely that ISG65 interacts with C3b which is already VSG-conjugated, requires further experiments. We have edited lines 217-222 to make this point more clearly.

      Reviewer #3 (Recommendations For The Authors):

      The authors previously reported that ISG65 C-terminus is so flexible and is not resolved in their 2022 ISG65-C3d (TED of C3b) crystal structure, which is the same case here in the cryo-EM structure of ISG65-C3b. Thus, I am wondering how C3b might find the flexible C-terminus and form a covalent bond.

      We think that the answer to the reviewer’s question relates to local concentration. When two reactive compounds are not attached together, they diffuse freely in three dimensions and their likelihood of colliding and reacting is subject to the randomness of Brownian motion. However, if they bind together through an interaction distinct from the reactive residues, then this increases their relative local concentration and the likelihood of collision and reaction taking place. In the case of ISG65, this is coupled with the ability of ISG65 to bind to C3 before it converts to C3b and becomes reactive. The interaction of ISG65 with C3/C3b will therefore bring together the reactive residues and increase the probability that they will collide and form a conjugate. Our control with BSA, which does not bind to C3/C3b and does not form these conjugates, supports this conclusion. We have edited lines 217-222 to clarify.

      I also find it puzzling that deleting L2 or L3 in ISG65, which they found forming additional contacts with CUB domain of C3b (12 times binding tighter), does not affect the ISG65-C3b conjugate formation in the in vitro C3 convertase formation assay.

      When we consider the affinities that the L2 and L3 loop deletion variants have for C3b, and the concentration of ISG65 in the C3 convertase assay, we would predict that the conjugates still form with the L2 and L3 variants. This binding would therefore increase the relative local concentration of the reactive residues and ensure preferential conjugate formation, as we observe.

      (1) Page 2 bottom line, "In particular, loop 2 forms a direct contact with the CUB domain of ISG65, centered around an electrostatic", ISG65 should be C3b.

      We thank the reviewer for spotting this. It has been corrected.

      (2) Page 4, "We found that ISG65 does not complete with either factor B or Factor D and does not block the binding of factor Bb (Figure 3b). This suggests that the C3 convertase can form in the presence of ISG65", "complete" should be "compete".

      It has been corrected.

      (3) Page 4, "revealed that in the presence of ISG65 a high molecular weight band appeared, which we identified through mass spectrometry to be a conjugate of ISG65 with C3b". There is no mass spectrometry data in the manuscript to support this.

      We agree with the reviewer that this data should be included in the paper and have now added it as Supplementary Table 3.

      (4) Page 5, "By inhibiting binding of CR2 to C3d, ISG65 will reduce the likelihood that B-cell receptor binding to trypanosome antigens will result in B-cell activation and antibody production." - this sentence is a bit confusing.

      We have clarified this point in lines 243-245.

      (5) Related to Figure 2a. "This structure reveals the two distinct interfaces formed between ISG65 and C3b (Figure 2a)." It would be clearer to label where interface 1 and interface 2 are in Figure 2a.

      We have now labelled interfaces 1 and 2 above the insets in Figure 2a.

      (6) Related to Figure 2C. I suggest mutagenesis to validate ISG65 L2/L3 - C3b CUB domain interaction, i.e. mutate ISG65 (N188, R187, Y190) and perform SPR with C3b.

      We agree with the reviewer that this experiment was a valuable validation of our structural data. To achieve this aim, we changed our SPR assay, coupling C3 variants to the chip surface in an orientation that matches their conjugation to a pathogen, which allowed us to reliably compare the affinities of ISG65 variants. We then assessed the binding of ISG65, ISG65∆L2, and the ISG65 L2 triple mutant (N188A, H189A, Y190A) proposed by the reviewer. As predicted from the structure, both loop 2 deletion and mutation reduced the affinity for C3b but did not affect the affinity for C3d, suggesting that the difference in affinity of ISG65 for C3b and C3d is due to the observed interface 2. This new data is described in lines 150-168 and is presented in Figure 2c.

      (7) Related to Figure 3a. Is the C3b only structure in the presence of ISG65 the real C3b only? Discussion can be added.

      Our cryoEM analysis of the ISG65-C3b mixture yielded three-dimensional classes that contained clear density for ISG65 and classes in which there was no density for ISG65. While the reviewer is technically correct, and we cannot be 100% sure that there is not an entirely disordered ISG65 attached to these ‘unbound’ C3b, we think that this is extremely unlikely. In either case, these ‘unbound’ C3b are indistinguishable from other structures of C3b and the argument in the paper stands. We have added a clause in lines 178-179 to make this point.

      (8) Related to Figure 3e. There is no label for WT and deletion mutants. Also, L1 and L3 deletion does not seem to show on the gel.

      We have added these labels.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the Reviewing Editor and two additional reviewers for the insightful input they gave us on the first version of our manuscript on allosteric activity regulation of the anaerobic ribonucleotide reductase from Prevotella copri. We have revised the manuscript in the light of the reviewers' comments. In particular, we have added additional experiments using hydrogen-deuterium exchange mass spectrometry (HDX-MS) to probe the accessibility and mobility of different parts of the protein structure in the apo-state and in the presence of dATP/CTP and ATP/CTP. The results strongly confirm the binding of nucleotides to the activity and specificity sites, as seen biochemically and structurally. On the question of mobility of the glycyl radical domain, the HDX-MS experiments suggest an increased mobility in the presence of dATP, though the results are not as clear-cut as for the nucleotide binding. The HDX-MS analyses are complicated by the fact that they reflect all species in solution, which are evidently multiple for all states of PcNrdD. Finally, we have rephrased key parts of the results and discussion, and modified the title, to avoid any implication that we believe the glycyl radical domain becomes extensively disordered; rather, we propose that it becomes more mobile to the extent that it cannot be seen in the cryo-EM structures.

      eLife assessment

      This study advances our understanding of the allosteric regulation of anaerobic ribonucleotide reductases (RNRs) by nucleotides, providing valuable new structural insight into class III RNRs containing ATP cones. The cryo-EM structural characterization of the system is solid, but other aspects of the manuscript, which are incomplete, could be improved by including additional functional characterization and more evidence for the proposed mechanism of inhibition by dATP. The work will be of interest to biochemists and structural biologists working on ribonucleotide reductases and other allosterically regulated enzymes.

      Public Reviews:

      Reviewer #1 (Public Review):

      The goal of this study is to understand the allosteric mechanism of overall activity regulation in an anaerobic ribonucleotide reductase (RNR) that contains an ATP-cone domain. Through cryo-EM structural analysis of various nucleotide-bound states of the RNR, the mechanism of dATP inhibition is found to involve order-disorder transitions in the active site. These effects appear to prevent substrate binding and a radical transfer needed to initiate the reaction.

      Strengths of the manuscript include the comprehensive nature of the work - including numerous structures of different forms of the RNR and detailed characterization of enzyme activity to establish the parameters of dATP inhibition. The manuscript could be improved, however, by performing additional experiments to establish that the mechanism of inhibition can be observed in other contexts and it is not an artifact of the structural approach. Additionally, some of the presentations of biochemical data could be improved to comply with standard best practices.

      The work is impactful because it reports initial observations about a potentially new mode of allosteric inhibition in this enzyme class. It also sets the stage for future work to understand the molecular basis for this phenomenon in more detail.

      We thank the editor and reviewers for their positive evaluation of the potential impact of our work. We completely agree that hypotheses based on structural data require orthogonal experimental verification. However, the number and consistency of the cryo-EM structures speak in favour of the data being representative of conditions in solution. We feel that in particular cryo-EM data should be relatively free of artefacts, e.g. biased or incorrect relative domain orientations, compared to crystallography, where crystal packing effects can affect these parameters. As we write in response to Reviewer #2, it has been difficult to propose a direct structural mechanism for transmission of the allosteric signal from the a-site in the ATP-cone to the active site and GRD given that the ATP-cones and linker are disordered in the dATP-bound dimers and only partly ordered in the dATP-bound tetramers. Further verification experiments will be performed in future but are outside the scope of the present article.

      We will improve the presentation of the biochemical data in a revised version.

      General comments:

      (1) It would be ideal to perform an additional experiment of some type to confirm the order-disorder phenomena observed in the cryo-EM structures to rule out the possibility that it is an artifact of the structure determination approach. Circular dichroism might be a possibility?

      Circular dichroism reports only on the approximate relative proportions of helix, sheet and loop structure in a protein, thus we believe that it would not be a sensitive enough tool to distinguish between ordered and disordered states. We are considering what alternative methods might be appropriate.

      (2) Does the disordering phenomenon of one subunit in the ATP-bound structures have any significance - could it be related to half-of-sites activity? Does this RNR exhibit half-of-sites activity?

      Half-of-sites activity has not been biochemically proven in any ribonucleotide reductase in spite of the fact that it was first suggested in 1987 (PMID: 3298261). However, strong structural indication was recently published in the form of the holo-complex of the class Ia ribonucleotide reductase from Escherichia coli, which is highly asymmetrical and in which productive contacts forming an intact proton-coupled electron transfer pathway are only formed between one of two pairs of monomers (PMID: 32217749). We have not been able to prove half-of-sites activity for PcNrdD due to low overall radical content, but the structural results are indeed consistent with such an activity.

      (3) Does the disordering of the GRD with dATP bound have any long-term impact on the stability of the Gly radical? I realize that the authors tested the ability to form the Gly radical in the presence of dATP in Fig. 4 of the manuscript. But it looks like they only analyzed the samples after 20 min of incubation. Were longer time points analyzed?

      Radical content was measured after 5 min and 20 min incubation; 5 min incubations (not included in the manuscript) consistently gave higher radical content compared to 20 min incubation. Longer time points were not analysed, as we assumed that the radical content would decline further beyond 20 min.

      (4) Did the authors establish whether the effect of dATP inhibition on substrate binding is reversible? If dATP is removed, can substrates rebind?

      This is an interesting question. We measured KDs for dATP in the micromolar range and are hence confident that dATP binding is reversible. Our measurements do not, however, directly prove that inhibition of the enzyme is reversible. Nevertheless, it is worth noting that the protein as purified was precipitated and analysed by UV-visible spectroscopy. The as-purified PcNrdD contained 30% nucleotide contamination. The as-purified sample was then analysed by HPLC and we identified a major peak, corresponding to dATP/dADP. Therefore, purification conditions had to be optimised to remove the nucleotides. This is evidence that PcNrdD that has “seen” dATP can subsequently bind substrates in the presence of ATP. We will describe the purification more clearly in a revision.

      (5) In some figures (Fig. 6e, for example), the cryo-EM density map for the nucleotide component of the model is not continuous over the entire molecule. Can the authors comment on the significance of this phenomenon? Were the ligands validated in any way to ensure that the assignments were made correctly?

      Indeed we sometimes saw discontinuous density for the nucleotides, both in the active site and in the specificity site. However, the break was almost always near the C5’ carbon atom, which is common to all nucleotides. While we cannot readily explain this phenomenon, the nucleotides refined well with full occupancy, giving B-factors similar to those of the surrounding protein atoms. The identity of the nucleotide could always be inferred from a) the size of the base (purine or pyrimidine); b) the known nucleotide combinations added to the protein before grid preparation; c) prior knowledge on the combinations of effector and substrate that have been found valid for all RNRs since the first studies of allosteric specificity regulation.

      Reviewer #2 (Public Review):

      This manuscript describes the functional and structural characterization of an anaerobic (Class III) ribonucleotide reductase (RNR) with an ATP cone domain from Prevotella copri (PcNrdD). Most significantly, the cryo-EM structural characterization revealed the presence of a flap domain that connects the ATP cone domain and the active site and provides structural insights about how nucleotides and deoxynucleotides bind to this enzyme. The authors also demonstrated the catalytic functions and the oligomeric states. However, many of the biochemical characterizations are incomplete, and it is difficult to make mechanistic conclusions from the reported structures. The reported nucleotide-binding constants may not be accurate because of the design of the assays, which complicates the interpretation of the effects of ATP and dATP on PcNrdD oligomeric states. Importantly, statistical information was missing in most of the biochemical data. Also, while the authors concluded that the dATP binding makes the GRD flexible based on the absence of cryo-EM density for GRD in the dATP-bound PcNrdD, no other support was provided. There was also a concern about the relevance of the proposed GRD flexibility and the stability of Gly radical. Overall, the manuscript provides structural insights about Class III RNR with ATP cone domain and how it binds ATP and dATP allosteric effectors. However, ambiguity remains about the molecular mechanism by which the dATP binding to the ATP cone domain inhibits the Class III RNR activity.

      Strengths:

      (1) The manuscript reports the first near-atomic resolution of the structures of Class III RNR with ATP domain in complex with ATP and dATP. These structures revealed the NxN flap domain proposed to form an interaction network between the substrate, the linker to the ATP cone domain, the GRD, and loop 2 important for substrate specificity. The structures also provided insights into how ATP and dATP bind to the ATP cone domain of Class III RNR. Also, the structures suggested that the ATP cone domain is directly involved in the tetramer formation by forming an interaction with the core domain in the presence of dATP. These observations serve as an important basis for future study on the mechanism of allosteric regulation of Class III RNR.

      (2) The authors used a wide range of methodologies including activity assays, nucleotide binding assays, oligomeric state determination, and cryo-EM structural characterization, which were impressive and necessary to understand the complex allosteric regulation of RNR.

      (3) The activity assays demonstrated the catalytic function of PcNrdD and its ability to be activated by ATP and low-concentration dATP and inhibited by high-concentration dATP.

      (4) ITC and MST were used to show the ability of PcNrdD to bind NTP and dATP.

      (5) GEMMA was used successfully to determine the oligomeric state of PcNrdD, which suggested that PcNrdD exists in dimeric and tetrameric forms, whose ratio is affected by ATP and/or dATP.

      Weaknesses:

      (1) Activity assays.

      The activity assays were performed under conditions that may not represent the nucleotide reduction activity. The authors initiated the Gly radical formation and nucleotide reduction simultaneously. The authors also showed that the amount of Gly radical formation was different in the presence of ATP vs dATP. Therefore, it is possible that the observed Vmax is affected by the amount of Gly radical. In fact, some of the data fit poorly into the kinetic model. Also, the number of biological and technical replicates was not described, and no statistical information was provided for the curve fitting.

      The highest turnover activity of PcNrdD measured in the presence of ATP was 1.3 s⁻¹ (470 nmol/min/mg), a kcat comparable to recently reported values for anaerobic and aerobic RNRs from Neisseria bacilliformis, Leeuwenhoekiella blandensis, Facklamia ignava, Thermus virus P74-23, and Aquifex aeolicus (PMID: 25157154, PMID: 29388911, PMID: 30166338, PMID: 34314684, PMID: 34941255). The general trend illustrated in Figure 1 is that ATP has an activating effect on enzyme activity, whereas high concentrations of dATP have an inactivating effect on activity, which cannot be explained by suboptimal assay conditions since our EPR results consistently show that more radical is formed in incubations with dATP compared to incubations with ATP. Curve fitting methods used are listed in Materials and Methods (as specified in the Figure 1 legend), and standard errors for all specified curve fitting results (from triplicate experiments) are shown in Figure 1.

      (2) Binding assays.

      The interpretation of the binding assays is complicated by the fact that dATP binds both a- and s-sites and ATP binds a- and active sites. dATP may also bind the active site as the product. It is unknown if ATP binds s-site in PcNrdD. Despite this complexity, the binding assays were performed under the condition that all the binding sites were available. Therefore, it is not clear which event these assays are reporting.

      Both ITC and MST experiments involving ATP and dATP binding to the a-site were performed in the presence of at least 1 mM GTP substrate (5 mM in MST) to fill the active site, and 1 mM dTTP effector to fill the s-site (specified in the legend to Figure 2). These conditions enable binding of ATP or dATP only to the a-site in the ATP-cone.

      (3) Oligomeric states.

      Due to the ambiguity in the kinetic parameters and the binding constants determined above, the effects of ATP and dATP on the oligomeric states are difficult to interpret. The concentrations of ATP used in these experiments (50 and 100 uM) were significantly lower than KL determined by the activity assays (780 uM), while it is close to the Kd values determined by ITC or MST (~25 uM). Since it is unclear what binding events ITC and MST are reporting, the data in Figure 3 does not provide support for the claimed effects of ATP binding. For the effects of dATP, the authors did not observe a significant difference in oligomeric states between 50 or 100 uM dATP alone vs 50 uM dATP and 100 uM CTP. The former condition has dATP ~ 2x higher than the Kd and KL (Figure 1b) and therefore could be considered as "inhibited". On the other hand, NrdD should be fully active under the latter condition. Therefore, these observations show no correlation between the oligomeric state and the catalytic activity.

      The results in Figure 3 show that in the presence of 100 µM ATP plus 100 µM CTP the oligomeric equilibrium is 64% dimers plus 36% tetramers, and in the presence of 50-100 µM dATP the oligomeric equilibrium is 32% dimers and 68% tetramers. We agree that there is no clear and strong correlation between oligomeric state and inhibition, and we will try to make this clearer in a revised version. Meanwhile, to add clarity and strengthen our observations, SEC experiments will be done at higher nucleotide concentrations.

      (4) Effects of dATP binding on GRD structure

      One of the key conclusions of this manuscript is that dATP binding induces the dissociation of GRD from the active site. However, the structures did not provide an explanation for how the dATP binding affects the conformation of GRD or whether the dissociation of GRD is a direct consequence of dATP binding or it is due to the absence of nucleotide substrate. Also, Gly radical is unlikely to be stable when it is not protected from the bulk solvent. Therefore, it is unlikely that the GRD dissociates from the active site unless the inhibition by dATP is irreversible. Further evidence is needed to support the proposed mechanism of inhibition by dATP.

      We admit that it has been difficult to propose a direct structural mechanism for transmission of the allosteric signal from the a-site in the ATP-cone to the active site and GRD given that the ATP-cones and linker are disordered in the dATP-bound dimers and that the linker can only be partly modelled in the dATP-bound tetramers. Most likely dATP binding causes a change in the dynamics of the linker region and NxN flap that directly affects substrate binding and simultaneously causes disorder of the GRD, given that all are part of a connected system (described as “nexus” in the manuscript). The structures determined in the presence of dATP and CTP show that CTP cannot bind in the absence of an ordered NxN flap.

      In any case a major conclusion of the work is that dATP does not inhibit the anaerobic RNR by prevention of glycyl radical formation but by prevention of its subsequent transfer. We agree that further evidence is required to support the proposed mechanism, but given the extent of the data already presented in the manuscript, we feel that such studies should be the subject of a future publication.

      (5) Functional support for the observed structures.

      Evidence for connecting structural observations and mechanistic conclusions is largely missing. For example, the authors proposed that the interactions between the ATP cone domain and the core domain are responsible for tetramer formation. However, no biochemical evidence was provided to support this proposal. Similarly, the functional significance of the interaction through the NxN flap domain was not proved by mutagenesis experiments.

      We did actually make mutants to verify the observed interactions, but several of them did not behave well in our hands, e.g. with regard to protein stability. Since we have no evidence that oligomerisation is coupled to inhibition, and since we did not observe any conservation between protein sequences in the interaction area, we chose not to pursue this point further. The main merit of the tetramer structures is that they allowed a high-resolution view of dATP binding to the ATP-cone and a comparison to previously-observed ATP-cones. Nevertheless, mutation experiments, also including the NxN flap, could be the subject of future work.

      Reviewer #3 (Public Review):

      The manuscript by Bimai et al describes a structural and functional characterization of an anaerobic ribonucleotide reductase (RNR) enzyme from the human microbe, P. copri. More specifically, the authors aimed to characterize the mechanism by which (d)ATP modulates nucleotide reduction in this anaerobic RNR, using a combination of enzyme kinetics, binding thermodynamics, and cryo-EM structural determination. One of the principal findings of this paper is the ordering of a NxN 'flap' in the presence of ATP that promotes RNR catalysis and the disordering of both this flap and the glycyl radical domain (GRD) when the inhibitory effector, dATP, binds. The latter is correlated with a loss of substrate binding, which is the likely mechanism for dATP inhibition. It is important to note that the GRD is remote (>30 Ang) from the binding site of the dATP molecule, suggesting long-range communication of the structural (dis)ordering. The authors also present evidence for a shift in oligomerization in the presence of dATP. The work does provide evidence for new insights/views into the subtle differences of nucleotide modulation (allostery) of RNR through long-range interactions.

      The strengths of the work are the impressive, in-depth structural analysis of the various regulated forms of PcRNR by (d)ATP using cryo-EM. The authors present seven different models in total, with striking differences in oligomerization and (dis)ordering of select structural features, including the GRD that is integral to catalysis. The authors present several, complementary biochemical experiments (ITC, MST, EPR, kinetics) aimed at resolving the binding and regulatory mechanism of the enzyme by various nucleotides. The authors present a good breadth of the literature in which the focus of allosteric regulation of RNRs has been on the aerobic orthologues.

      Given the resolution of some of the structures in the remote regions that appear to be of importance, the rigor of the work could have been improved by complementing these experimental studies with molecular dynamics (MD) simulations to reveal the dynamics of the GRD and loops/flaps at the active site.

      We have discussed with expert colleagues the possibility of carrying out MD simulations on the different states in order to study the differential effects of ATP and dATP binding on the dynamics of the GRD. However, they felt that the chance of obtaining meaningful results was low, particularly since some structural elements are missing from the models for both forms, in particular the linker between the ATP-cone and the core.

      The biochemical data supporting the loss of substrate binding with dATP association is compelling, but the binding studies of the (d)ATP regulatory molecules are not; the authors noted less-than-unity binding stoichiometries for the effectors.

      Most of the methods used measure only binding strength, not the number of binding sites (N), whereas ITC also measures number of sites. N is dependent on the integrity of the protein, i.e. the number of protein molecules in a preparation that are involved in binding, and quite often gives lower values than the theoretical number of binding sites.

      Also, the work would benefit from additional support for oligomerization changes using an additional biochemical/biophysical approach.

      SEC (chromatography), GEMMA (mass spectrometry) and cryo-EM were used to study oligomerization. Since each method has restrictions on nucleotide concentrations as well as protein concentrations that can be used, the results are not directly comparable, but all three methods indicate nucleotide dependent oligomerization changes. The SEC results will be included in a revised version.

      Overall, the authors have mostly achieved their overall aims of the manuscript. With focused modifications, including additional control experiments, the manuscript should be a welcome addition to the RNR field.

      Recommendations for the authors: Reviewer #1 (Recommendations For The Authors):

      (1) The last sentence of the abstract is not complete. The structures implicate a complex network of interactions in ... ? What do they implicate?

      A couple of words seem to have been missed from the abstract. We have rewritten the end of the abstract to emphasise better that the dynamical transitions involve a linked network of interactions and not just the GRD.

      (2) A reference is needed in the second sentence of the introduction.

      We have added a reference as requested.

      (3) Page 2, paragraph 2. The authors state "two beta subunits (NrdB) harboring a stable radical." This is not accurate. First of all, each beta subunit harbors its own cysteine oxidant. And in several subclasses, that oxidant is not a stable radical but an oxidized metal cluster. Please revise to improve accuracy and also provide appropriate references.

      We have revised the description and added a recent reference.

      (4) Page 4, Fig. 1, panels C and D. The fit of the curve to the data is pretty poor. Is there an explanation? Could the data be improved in some way? In general, it is also best practice nowadays to show the individual data points in addition to the error bars in plots like the ones shown in Figure 1. Please modify the plots to include the individual data points in this figure - and probably also the subsequent figures showing binding data.

      We have modified relevant panels in Figures 1, 2 and 5 as requested.

      (5) Page 12, first paragraph. The authors state that one of the monomers in the ATP-CTP structure is well ordered and the other is less ordered. It would be ideal to show in a figure the basis for this conclusion using the cryo-EM maps. The "less ordered" monomer appears to be fully modeled.

      Since the 2-fold axis of the dimer is vertical, the GRD of the left-hand monomer is hidden from view at the back of the molecule in Figure 6. For this monomer there was a small amount of density that allowed modelling of part of the glycyl radical loop (though not the tip containing the radical Gly itself) and the NxN flap, albeit with significantly higher mobility. We have illustrated this through an additional supplement for Figure 6 (figure supplement 2) in which the B-factors of the residues are shown both as a ribbon with radius proportional to the B-factor and through colouring. We hope that the four views in Figure 6 (figure supplement 2) together illustrate the relative mobility of different parts of the dimer.

      It would also be ideal to show the basis for the conclusion that the entire GRD is disordered in the dATP-bound dimer structure.

      Thank you for this suggestion. We have added a fifth supplement to Figure 8 in which we show the cryo-EM reconstruction for the dATP-bound dimer in two orientations, with the ATP-CTP-bound structure superimposed, which clearly shows that the entire GRD, the ATP-cones, linker and NxN flap are all disordered in both monomers.

      Reviewer #2 (Recommendations For The Authors):

      (1) Units to describe enzyme activity.

      • The unit for the specific activity in the main text (nmol/min•mg) is unusual. It is most likely a typo of nmol/min/mg or nmol/(min•mg).

      We have changed this to nmol/min/mg in the text.

      • The unit for the Vmax is unusual and should not be confused with the specific activity. By definition, Vmax is the velocity of a reaction at a defined enzyme concentration/amount. For example, if an assay of 10 mg enzyme yielded 470 nmol of product in 1 min, Vmax is 470 nmol/min, whereas the specific activity is 47 nmol/min/mg.

      The velocity as calculated above is ca. 1.3 s⁻¹. We have added kcat values to accompany the specific activities given.
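
      The unit relationships discussed in this point can be illustrated with a minimal sketch. It is not taken from the manuscript: the function names are hypothetical, and the 166 kDa mass is an assumption back-calculated from the two values quoted in this response (470 nmol/min/mg and 1.3 s⁻¹), not a value stated by the authors.

```python
# Illustrative unit conversions for enzyme activity (not from the manuscript).

def specific_activity(rate_nmol_per_min: float, enzyme_mg: float) -> float:
    """Specific activity (nmol/min/mg) from an observed rate and enzyme mass."""
    return rate_nmol_per_min / enzyme_mg


def kcat_per_s(sa_nmol_per_min_mg: float, mass_kda: float) -> float:
    """Turnover number (s^-1) from specific activity and molecular mass.

    1 mg of a protein of mass M kDa contains 1000/M nmol, so the turnover
    per minute is SA * M / 1000; dividing by 60 gives turnover per second.
    """
    return sa_nmol_per_min_mg * mass_kda / 1000.0 / 60.0


# Reviewer's example: 10 mg of enzyme producing 470 nmol of product per minute.
print(specific_activity(470.0, 10.0))

# ASSUMPTION: 166 kDa is back-calculated from the quoted 470 nmol/min/mg and
# 1.3 s^-1; the response does not state which molecular mass was used.
print(round(kcat_per_s(470.0, 166.0), 2))
```

      The sketch makes explicit that a specific activity can only be turned into a kcat once a molecular mass per active unit is chosen, which is why the two quantities should be reported separately.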

      (2) Steady-state kinetic analysis.

      • The steady-state kinetic analysis in Figure 1 needs to be repeated. While the nonlinear curve fitting for Figure 1a is reasonable, those in Figures 1b, 1c, and 1d were outside the error range. Consequently, the reported kinetic parameters are unlikely accurate. The authors should repeat the assays with different enzyme preparation to account for all the errors. If the fit curve is still outside the error range, the kinetic model is likely incorrect, and the authors need to investigate different kinetic models.

      The replotted Figure 1 now includes two different experiments for 1b (four replicates in total).

      • The authors should report the number of replicates and the statistical data for the curve fitting.

      The figure legend has been updated with statistical data for all curve fits, and the number of replicates has been added.

      • The authors should report Vmax, Ki, and KL for Figure 1d.

      Results in Figures 1c and 1d are less straightforward than those in Figures 1a and 1b where the s-site is filled with dTTP, favouring binding of GTP to the active site. The curve fit in Figure 1c is disturbed at high concentrations of ATP, which plausibly competes with the CTP substrate and results in inhibition by formed dATP. The curve fit in Figure 1d is less certain since reduction of substrate is low due to intrinsic CTP reduction in absence of effector and partially overlapping activation and inhibition effects of dATP.

      • The authors should consider presenting the data in a log scale because of the complex nature of the activation/inhibition at the lower concentrations of dATP.

      Log scale plots are included as insets in Figures 1b and 1d.

      • The basal level of CTP reduction in the absence of an effector nucleotide should be reported with an error.

      The error value has been added in the figure legend for the basal level of CTP reduction in the absence of effector.

      (3) Equations for the kinetic analysis.

      • The equations should be numbered and referred to in the Figure 1 legend.

      All equations are specified and numbered in Materials and Methods. The equation used for each curve fit in the panels in Figure 1 is specified in the figure legend.

      • KL must be defined in the main text. I suppose this is Kd for ATP or dATP. The equation for KL determination is missing brackets for dNTP.

      KL (the concentration of an allosteric effector that gives half maximal enzyme activity) is defined in Materials and Methods where the equation is described. KL is not the same as KD (the dissociation constant for a ligand and its receptor). Brackets have been added to equation 1.

      • I believe dNTP in the first equation is incorrect because ATP was the ligand for Figures 1A and 1C.

      [dNTP] in the first equation has been changed to [NTP/dNTP] to indicate that both ribonucleotides and deoxyribonucleotides can bind.

      • The second equation can be expressed as dATP as I believe this is the only ligand that inhibits the enzyme.

      We prefer to keep the more general [dNTP] in the equation.

      • The equation used for the fitting in Figure 1d must be defined more clearly than "a combination of the two equations".

      The equation used for the curve fit in Figure 1d has been specified as equation 3 in Materials and Methods.
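
      For readers without the Materials and Methods at hand, the kind of model discussed in this point can be sketched as follows. The exact forms of equations 1-3 are not reproduced in this response, so the simple hyperbolic activation and inhibition terms below, and their product for a combined curve, are assumptions made for illustration only. The function names and the numerical values in the usage lines are likewise hypothetical.

```python
# Hedged sketch of simple activation/inhibition models (ASSUMED forms;
# the manuscript's equations 1-3 may differ).

def activation(v_max: float, k_l: float, effector: float) -> float:
    """Hyperbolic activation: activity is v_max / 2 when effector == k_l."""
    return v_max * effector / (k_l + effector)


def inhibition(v_max: float, k_i: float, inhibitor: float) -> float:
    """Hyperbolic inhibition: activity is v_max / 2 when inhibitor == k_i."""
    return v_max / (1.0 + inhibitor / k_i)


def activation_then_inhibition(v_max: float, k_l: float, k_i: float,
                               datp: float) -> float:
    """ASSUMED combined form: activation and inhibition terms multiplied."""
    return v_max * (datp / (k_l + datp)) / (1.0 + datp / k_i)


# K_L is the effector concentration giving half-maximal activity.
print(activation(470.0, 780.0, 780.0))

# With activation and inhibition both present, activity rises and then falls
# (illustrative constants in arbitrary concentration units).
for conc in (1.0, 10.0, 100.0, 1000.0):
    print(conc, round(activation_then_inhibition(100.0, 5.0, 200.0, conc), 1))
```

      Under these assumed forms, KL and Ki are both read off as the concentration at which the respective term equals one half, which is the sense in which KL is defined in the reply above.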

      (4) Design of the activity assays

      It is not clear if the activity assays report the rate of glycyl radical formation or nucleotide reduction. The authors mixed NrdD and NrdG and initiated the reaction by adding formate (essential for nucleotide reduction) and dithionite (Gly radical formation). The Gly radical formation is slow (in min time scale). The authors reported that ATP/dATP affected the rate of Gly radical formation and in the presence of ATP, Gly radical formation was incomplete even after 20 min. Therefore, it is possible that within the timescale of the activity assays (5 min), the reactions could be partially limited by the Gly radical formation, which may be the reason for the poor curve fitting.

      Activity assays were performed with 5 min pre-incubation without dithionite and formate (no glycyl radical formation) and 10 min incubation after addition of dithionite and formate (glycyl radical formation plus substrate reduction). During earlier tests, NrdD and NrdG were first preincubated in the presence of dithionite (glycyl radical formation) and after addition of formate the substrate reduction was monitored during 20 min. These experiments resulted in lower enzyme activity, whereas higher activity was achieved only upon formate addition to the preincubation reaction. We suppose that the presence of dithionite, which is a strong reducing agent, affected NrdD stability and the reaction was stabilised by the presence of formate at an earlier stage of the reaction. For the EPR conditions used in the paper, 5 min incubation gave higher radical content compared to 20 min, and the reported activity assay gave the highest activity after 10 min incubation (kcat of 1.3 s⁻¹).

      (5) Methods section for the activity assays.

      • The concentration of dTTP, ATP, and dATP used in the assays must be described.

      We thank the reviewer for pointing out this omission and we have now specified the concentrations used.

      • Although the authors mentioned that they changed the concentration of dTTP, such data were not presented. Is this correct? Did the authors fix the dTTP concentration for the GTP reduction?

      We apologise for the ambiguity and have specified that the dTTP concentration was fixed at 1 mM in the GTP experiments and that only the ATP or dATP concentrations were varied.

      (6) Discrepancy between Ki/KL and Kd.

      • There is a significant ambiguity remaining about the binding event that the ITC and MST results are reporting. Although dATP binds to both a- and s-sites and ATP binds to both active site and a-site, only a single binding event was observed in both cases. To distinguish the dATP binding to a- and s-sites and the active site, the authors should perform binding assays using mutant enzymes with only one of the binding sites available for dATP/ATP binding.

      MST and ITC were performed in presence of substrate (1 mM GTP) and s-site effector (1 mM dTTP in ITC experiments, and 5 mM dTTP in MST experiments), thus dATP is blocked from binding to the s-site and ATP from binding to the active site.

      • There are significant differences between Kd determined by MST or ITC and Ki/KL determined by the activity assays. Kd measurements were performed in the absence of the substrate nucleotides, while the assays required substrates. There may be complications from the presence of NrdG and the Gly radical formation. The authors must clearly describe all these complications and the discrepancy between Kd and Ki/KL.

      MST, ITC and enzyme assays were all performed in the presence of substrate, and enzyme assays also contained NrdG, which was not present in the MST and ITC analyses. While KD is a thermodynamic constant representing the affinity of ligand to its binding site - in our case an effector nucleotide to the ATP-cone, KL is a kinetic constant (the allosteric effector concentration that gives half maximal activity) representing the relationship between the effector concentration and the reaction speed and is affected by the enzyme turnover number (kcat). The relationship between KD, KL and Ki is further complicated by conformational and possibly oligomeric state changes of NrdD upon binding of allosteric effectors, which occurs on a slower time scale than the rapid exchange of nucleotides in allosteric sites.

      • The results of ATP/dATP copurification experiments shown in Figure 2 - figure supplement 1 show the preference of dATP binding over ATP. However, the results do not necessarily support the competition between ATP and dATP for binding to the ATP cone domain. It is still possible that dATP binding to the s-site diminishes the binding of ATP to the a-site.

      Our aim was to exclude the possibility that ATP and dATP can bind to the ATP-cone at the same time and not to study competition between the two. Nevertheless, to eliminate the possibility that dATP binding to the s-site could affect nucleotide binding to the a-site, in two out of three conditions described in the supplementary figure, the experiments were performed in the presence of dTTP to prevent binding of dATP to the s-site.
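
      To illustrate why a KD and a KL of different magnitude lead to different expectations at the same effector concentration, the following minimal sketch evaluates a simple hyperbolic saturation term at the ATP concentrations used for GEMMA (50 and 100 µM), using the approximate values quoted in the reviewer's comments (KD ~25 µM, KL 780 µM). The one-site hyperbolic form, the assumption that free ligand equals total ligand, and the function name are all assumptions for illustration, not the authors' analysis.

```python
# Hedged sketch: fractional saturation for a simple one-site hyperbolic model.
# ASSUMPTIONS: single class of sites, free ligand ~ total ligand.

def fraction(conc_um: float, k_half_um: float) -> float:
    """Fraction of the maximum (occupancy or activity) at a given concentration."""
    return conc_um / (k_half_um + conc_um)


KD_UM = 25.0   # approximate ITC/MST value quoted by the reviewer
KL_UM = 780.0  # activity-assay value quoted by the reviewer

for atp_um in (50.0, 100.0):
    occupancy = fraction(atp_um, KD_UM)   # expected a-site occupancy
    activity = fraction(atp_um, KL_UM)    # expected fraction of maximal activity
    print(atp_um, round(occupancy, 2), round(activity, 2))
```

      Under these assumptions the a-site would be largely occupied at 50-100 µM ATP while the activity term remains far from saturation, which is the numerical content of the discrepancy raised by the reviewer and of the thermodynamic-versus-kinetic distinction drawn in the reply.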

      (7) Oligomeric states.

      • The authors must present the GEMMA results without ATP or dATP. Otherwise, the effects of ATP and dATP on the oligomeric state are not clear.

      We cannot report GEMMA results without ATP or dATP because apo-PcNrdD was unstable in the GEMMA buffer and clogged the capillaries. Instead, SEC analysis was performed on apo-PcNrdD in a more suitable buffer and showed a homogeneous peak corresponding to a dimer (included as Figure 3 - figure supplement 1).

      • Figure 3 does not support the induction of a2 upon ATP binding. The concentrations of ATP used in these experiments (50 and 100 uM) were significantly lower than KL determined by the activity assays (780 uM), while it is close to the Kd values determined by ITC or MST (~25 uM). Since it is unclear what binding events ITC and MST are reporting, the data in Figure 3 does not provide support for the claimed effects of ATP binding.

      MST and ITC were performed in the presence of substrate (1 mM GTP) and s-site effector (1 mM dTTP in ITC experiments, and 5 mM dTTP in MST experiments), and they thus measure binding of ATP or dATP to the ATP cone. SEC analysis with 2 µM apo-PcNrdD and higher nucleotide concentrations (1 mM) was performed, confirming the presence of both dimers and tetramers in solution at different ratios depending on the addition of ATP or dATP. The SEC analysis, included as Figure 3 - figure supplement 1, confirms the existence of an equilibrium in solution.

      • The effects of dATP must be presented more clearly. The authors did not observe a significant difference in oligomeric states between 50 or 100 uM dATP vs 50 uM dATP and 100 uM CTP. The former condition has dATP ~ 2x higher than the Kd and KL (Figure 1b) and therefore could be considered as "inhibited". On the other hand, NrdD should be fully active under the latter condition. The absence of difference in the oligomeric states between these two different conditions suggested to me that the oligomeric state does not regulate the NrdD activity. The authors seemed to indicate the same conclusion, but did not describe it clearly.

      We agree that the oligomeric state most likely does not regulate the NrdD activity and hope to have explained this better in the revised version.

      • Figure 3 legend mentioned a and b, but the figure was not labeled.

      We have corrected this.

      • The authors should triplicate the analysis and report the errors.

      Five scans were added for each trace to increase the signal-to-noise level (included in figure legend).

      (8) EPR characterization of Gly radical

      • The amount of Gly radical must be quantified by EPR. The authors must report how much NrdD has Gly radical.

      The concentration of NrdD (1 µM) in the activity assays is too low to be quantified by EPR. In the EPR experiment the glycyl radical content is given in the figure legend.

      • The authors claim that the Gly radical environment was similar based on the doublet feature. However, the doublet feature comes from the hyperfine splitting with α proton whose orientation relative to the radical p-orbital would not be affected by the conformation or the environment. Thus, this conclusion is incorrect and must be removed.

      We thank the reviewer for the clarifying comment and have removed our suggestion in the text.

      (9) Gly711 should be shown in Fig. 6e to help readers understand the last paragraph on page 12.

      The figure reference has been changed to Fig. 7, where this is shown more clearly. In Fig. 6e, inclusion of Gly711 would obscure other important information.

      (10) GRD structure with dATP

      The disorder of GRD in the presence of dATP does not agree with the formation of Gly radical under the same conditions. Gly radical is unlikely stable if it is extensively exposed to solvent. Most likely, the observed cryo-EM structures represent the conformation irrelevant to Gly radical formation.

      We agree that the glycyl radical is unlikely to be stable if exposed to solvent. We believe that the GRD is not completely disordered but most likely made more mobile through rigid body movements of the domain to an extent that makes it invisible in the cryo-EM maps. It is most likely still in the vicinity of the active site, shielding the glycyl radical. Our new HDX-MS results show a small but tangible increase in mobility of the GRD in the presence of dATP compared to ATP. Of course the differences in dynamics remain to be confirmed. It is worth noting that the group of Catherine Drennan at MIT published a conference abstract more than a year ago that suggested a similar pattern of ordered/dynamic GRDs, based on crystal structures, though the details have not yet been published (https://doi.org/10.1096/fasebj.2022.36.S1.R3407).

      We also agree that the cryo-EM structures do not show the GRD conformation relevant to Gly radical formation, as this has been shown spectroscopically for the GRE pyruvate formate lyase to require large conformational changes in the GRD and also the presence of the activase. However, revealing this conformation would be a completely different project. We postulate that inactivation proceeds by prevention of radical transfer to the substrate, not by prevention of its formation.

      We have altered the wording in several places in the revised manuscript, including the title, to avoid using the term “disorder”, as this may imply (partial) unfolding, and we certainly do not wish to imply that.

      (11) The difference between dATP and ATP binding

      From the presented structures, it was not clear how the absence of 2'-OH affects the oligomeric state and the structure of the GRD. The low resolution of the ATP-bound structure precluded the comparison between the ATP and dATP-bound structures.

      We agree that a detailed analysis of the differences between ATP- and dATP-bound structures requires higher resolution structures, particularly of the ATP-bound form. This will be the subject of future studies.

      (12) Conclusion about the disordered GRD.

      • The authors should describe the reason why the dATP binding affected the structure of GRD. The authors did not discuss why dATP binding affected the folding or mobility of GRD. Since this is the key conclusion of this manuscript and the authors are making this conclusion based on the absence of the ordered GRD structure (hence the negative results), the authors should carefully describe why the dATP binding does not allow the binding/folding of GRD in the position observed in the ATP-bound structure.

      As mentioned in our response to point 4 in this reviewer’s Public Review, it is difficult to propose a direct structural mechanism for transmission of the allosteric signal from the a-site in the ATP-cone to the active site and GRD given that the ATP-cones and linker are disordered in the dATP-bound dimers and that the linker cannot be completely modelled even in the dATP-bound tetramers. Our first hypothesis was that the ATP-cone might work by a steric occlusion mechanism, but the reality appears more complex. Most likely dATP binding causes a change in the dynamics of the linker region and NxN flap that directly affects substrate binding and simultaneously causes higher mobility of the GRD, given that all are part of a connected system. The structures determined in the presence of dATP and CTP show that CTP cannot bind in the absence of an ordered NxN flap. We hope that future structural studies of NrdDs from other organisms may shed further light on this mechanism.

      • The authors should test if the dATP inhibition is reversible for PcNrdD. If dATP binding induces dissociation of GRD from the active site and makes GRD flexible, Gly radical would most likely be quenched by formate or other components in the assay solution. If dATP inhibition is reversible, it is hard to believe that Gly radical dissociates completely from the active site.

      As-purified PcNrdD contains dATP and, after removal of bound nucleotides, can bind substrate in the presence of ATP. The as-purified PcNrdD protein contained 30% nucleotide contamination. After precipitation, HPLC analysis identified a major peak corresponding to dATP/dADP. Purification conditions were optimised to remove the nucleotides, and we have added this information to the purification description.

      (13) Functional support for the observed structures.

      Similar to X-ray crystallography, cryo-EM is a highly selective method that requires the selection of particles that can be analyzed with sufficient resolution. This means that the analysis could be biased towards the protein conformations stable on the cryo-EM grid. Consequently, testing the structural observations by functional characterization of mutant enzymes is critical. However, the authors did not perform such functional characterizations and made conclusions purely based on the structural observations.

      We acknowledge this limitation. We constructed several mutations located at the tetrameric interface between the ATP-cone and the core protein based on the cryo-EM structure of dATP loaded NrdD. Unfortunately, these mutant proteins were unstable and led to protein cleavage.

      (14) Other minor points:

      • In the introduction, the authors stated "The presence and function of the ATP-cone domain distinguish anaerobic RNRs from the other members of the large glycyl radical enzyme (GRE) family that are otherwise structurally and mechanistically related (Backman et al., 2017)." This statement is misleading because GREs are functionally diverse.

      We have removed the words “and mechanistically” to reduce ambiguity.

      • p. 12, e.g. should be removed.

      We are not sure what is meant here. Does the reviewer mean p. 21 “The interactions are mostly hydrophobic but are reinforced by several H-bonds, e.g. between Gln3D-Gln458A, Ser53D–Gln458A, Arg11D-Asp468A, the main chain amide of Ile12D and Tyr557A.”?

      Reviewer #3 (Recommendations For The Authors):

      Overall, the work presents an impressive and in-depth structural view of the conformational changes stemming from the interactions of (d)ATP allosteric effector molecules that are interrelated to RNR function. The manuscript is written clearly and provides a solid overview of RNR chemistry. The cryo-EM data show striking differences between ATP and dATP bound forms, though in select regions, the resolution is not good enough for strong interpretations of the finer details.

      (1) In cryo-EM structures, dATP appears to shift the oligomerization equilibrium from nearly all dimeric forms (absence of dATP) to a mixture of both dimeric and tetrameric species (presence of dATP). The examination of the oligomeric composition in solution using the GEMMA - a mass spectral technique - showed somewhat similar trends, though given the magnitude of the differences, it was less compelling. Have the authors considered a complementary solution technique, such as analytical SEC or dynamic light scattering that could provide further support for the change in oligomerization as observed in the cryo-EM?

      SEC analysis with 2 µM apo-PcNrdD and higher nucleotide concentrations (1 mM) was performed, confirming the presence of both dimer and tetramer in solution at different ratios depending on the addition of ATP or dATP. The SEC analysis, included as Figure 3 - figure supplement 1, confirms the existence of an equilibrium in solution.

      (2) The protein as isolated from the final SEC shows a predominant peak corresponding to aggregate protein. It would be helpful if the authors ran an analytical SEC on the protein sample that is more refined to see how much soluble dimer/tetramer vs. aggregate protein there is. This could impact the kinetic and thermodynamic analysis of effector interactions. Further, the second major peak is labeled as 'monomer'. Is the protein isolated as a monomer and then forms dimer upon effector binding? It is unclear. The authors should consider presenting the SEC standards for the given column and buffer condition so that a reasonable estimate of the oligomerization status of the isolated protein can be assigned.

      Can the reviewer possibly have believed that Figure 1 - supplementary figure 2a shows PcNrdD rather than PcNrdG? The figure supplement corresponds to the as-isolated SEC analysis of the activase (PcNrdG), which shows the presence of two main peaks of aggregates and monomer. The monomeric peak was reinjected and showed no presence of further aggregation states. Currently it is not known which oligomeric state the activase harbours upon binding to PcNrdD and glycyl radical formation. None of the other SEC figures in the MS has any predominant peak corresponding to aggregated protein.

      (3) More details are needed for the ITC section. The ITC methods are not clear. What is the exact composition of the ligand solution being titrated into the protein solution? It is unclear how the less-than-unity binding stoichiometry was determined and what it means. Is the n value for the monomer, dimer, or tetramer forms? It is concerning that n < 1 is observed for dATP binding in the ITC whereas there are 3 dATP bound/subunit in the cryo-EM. For completeness, titration of a buffer into protein solution (no ligand) should be conducted and presented to demonstrate that the heats produced in Figure 2 correspond to the ligand only (and not a buffer mismatch).

      ITC experiments were performed in the presence of 1 mM GTP (c-site) and 1 mM dTTP (s-site). Unlike other parameters in ITC analyses, the N value is usually the least accurate of all fitted parameters and strongly depends on the concentration of the active protein in the sample. N values described in the current study are in the same range as values reported for ATP-cones in other RNRs and NrdR (Rozman Grinberg et al. 2018a, 2018b, 2022; McKethan and Spiro 2013). The results most likely reflect two high-affinity binding sites for dATP and one high-affinity binding site for ATP. Different nucleotide concentrations were used in the cryo-EM and ITC experiments.

      (4) It is intriguing that the binding of dATP doesn't quell the glycyl radical. In fact, it appears that, as the authors suggest, the amount of glycyl radical might be increased in these samples. However, the cryo-EM data indicates that the GRD is disordered. It is unclear how these would be correlated, as one would not expect a disordered structural element to maintain such a potent oxidant.

      As already written above, we do not wish to imply that the GRD is completely or even highly disordered, just that its dynamics increase in the presence of dATP. Otherwise we completely agree that a very exposed Gly radical is incompatible with its stability. It could be that the amount of disorder is exaggerated somewhat by the vitrification process in cryo-EM. We have tried to reword some of the text to emphasise higher mobility rather than disorder.

      It has been difficult to propose a direct structural mechanism for transmission of the allosteric signal from the a-site in the ATP-cone to the active site and GRD given that the ATP-cones and linker are disordered in the dATP-bound dimers and that the linker cannot be completely modelled even in the dATP-bound tetramers. We initially thought that a steric occlusion mechanism might be at play, but the reality appears more complex. Most likely dATP binding causes a change in the dynamics of the linker region and NxN flap that directly affects substrate binding and simultaneously causes higher mobility of the GRD, given that all are part of a connected system. The structures determined in the presence of dATP and CTP show that CTP cannot bind in the absence of an ordered NxN flap. We hope that future structural studies of NrdDs from other organisms may shed further light on this mechanism.

      (5) It is a bit difficult to keep track of the myriad of structural information and differences amongst the various nucleotide-dependent conditions. It would be useful for the authors to add a summary figure that depicts the various oligomers, orientations, and (dis)ordered structural elements with cartoon representations.

      Thank you for this suggestion. It has been added as Figure 11.

      (6) The mechanism by which (d)ATP binding changes the (dis)ordering of select loops based on the current cryo-EM data is unclear (even the authors agree). The addition of molecular dynamics (MD) simulations on two different structures to reveal the network or structural communication would be a great addition to the work and validate the structural data.

      We have discussed this with a colleague who is an expert in MD. Their advice was that such simulations would be very difficult given that some amino-acids are missing in both of the relevant starting structures (ATP-CTP and dATP-CTP dimer) and could give very variable results. Thus we chose to do complementary experiments with hydrogen-deuterium exchange mass spectrometry (HDX-MS) instead. The results are included in the revised manuscript.

      Minor points

      (1) There are some conflicting reports as to whether P. copri is considered a human 'pathogen'. According to Yeoh, et al Scientific Reports 2022, P. copri is one of the predominant microbes in the human gut and is linked to a positive impact on metabolism. Perhaps the addition of a citation that provides support for it as a pathogen would clarify the statement on p. 3.

      We have added a recent reference (Nii T, Maeda Y, Motooka D, et al. (2023) Genomic repertoires linked with pathogenic potency of arthritogenic Prevotella copri isolated from the gut of patients with rheumatoid arthritis. Ann Rheum Dis 82: 621-629. doi: 10.1136/annrheumdis-2022-222881).

      (2) In Figure 3, the number of dimers/tetramers for dATP (100 uM) does not add up to 100. What is the other 2%?

      Thank you for pointing this out - it has been corrected.

      (3) The data in Figures 5C and D do show slight changes that could be fit and interpreted as a 'weak' interaction. Thus, the statement on p 9 "where dATP-loaded PcNrdD could bind neither GTP nor CTP" should be changed to indicate that the interactions are weak (or that the nucleotides weakly associate).

      The text and the figure have been changed according to the reviewer’s suggestion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Firstly, the authors place a great deal of emphasis on the impact of the Hif1-a inhibitor PX-478. The literature surrounding this inhibitor and its mode of action indicates that it is not a direct inhibitor of activity but that its greatest impact is on the production of Hif1-a. The authors do include another inhibitor as a control, Echinomycin, but it does not appear to be as biologically active and the panel of experiments conducted with this is extremely limited. I would be more comfortable with a full Seahorse experimental panel for Echinomycin, similar to SFig 2.G as performed with PX-478.

      We thank the reviewer for their comment highlighting the different mechanisms of action of the HIF-1α inhibitors used in this article. While echinomycin inhibits the binding of HIF-1α to the hypoxia response element (HRE), thereby blocking HIF-1α DNA binding capability, PX-478 inhibits HIF-1α deubiquitination, decreases HIF-1α mRNA expression, and reduces HIF-1α translation. We have included a paragraph explaining this phenomenon in the new version of the manuscript (page 9). In addition, we extended the panel of experiments performed with echinomycin, which confirmed a marked inhibition of the glycolytic pathway when DCs were stimulated with irradiated Mtb in the presence of echinomycin as assessed by SCENITH (new Figure S3H).

      Similarly, it would be of value to have Seahorse profiling that directly excludes FAO from the metabolic profile through the use of Etomoxir as an inhibitor of fatty acid oxidation, which one would assume would have no impact on the metabolic response.

      To estimate the contribution of FAO towards fueling protein synthesis in DCs stimulated with iMtb, the FAO inhibitor etomoxir was incorporated into the SCENITH method as previously described (Adamik et al., 2022). Overall, FAO dependence was found to be less than 10% in DCs, regardless of their activation state. While mitochondrial dependence is reduced after iMtb stimulation, there is no difference in FAO dependence, suggesting that OXPHOS is primarily driven by glucose in iMtb-stimulated cells. This is consistent with the HIF-1α-induced increase of glucose metabolism-related genes. We have adjusted the results section to include this new result (new Figure S1).

      Aside from these minor points, I believe this to be a rigorous study.

      Reviewer #2 (Recommendations For The Authors):

      In Fig. 1 and Fig. 2, the authors conclude that Mtb rewires the metabolism of Mo-DCs and induces both glycolysis and OXPHOS. The data shows that infection with iMtb or Mtb increases glucose uptake and lactate release, suggesting an increase in glycolysis. However, an increase in lactate is not a measure of glycolysis. Lactate is a byproduct of glycolysis; the end product of glycolysis is pyruvate.

      We are grateful for the reviewer's comment, as it gives us the opportunity to explain the conceptual framework on which we based our study. Traditionally, pyruvate has been considered to be the end product of glycolysis when oxygen is present and lactate the end product under hypoxic conditions. Numerous studies have shown that lactate is produced even under aerobic conditions (Brooks, 2018). Therefore, we frame this work in accordance with the view that glycolysis begins with glucose as its substrate and terminates with the production of lactate as its main end product (Rogatzki, Ferguson, Goodwin, & Gladden, 2015; Schurr, 2023; Schurr & Schurr, 2017).

      Secondly, since the authors have access to the Agilent Extracellular Flux Analyzer, they should have performed detailed ECAR/OCR measurements to conclusively demonstrate that both glycolysis and OXPHOS are increased in Mo. This is especially important for OXPHOS because the only readout shown for OXPHOS is an increase in mitochondrial mass (figure 1 G, H), which is not acceptable. Overall, the data does not indicate that Mtb triggers OXPHOS in the dendritic cells. It only indicates dead iMtb increases the mass of mitochondria in DCs.

      The reviewer’s advice is well appreciated. However, we would like to clarify what may be a misunderstanding; that is, the assays alluded to by the reviewer were not performed on monocytes but on DCs. As advised by the reviewer, we now include the OCR measurements by Seahorse and describe the figures according to their order of appearance in the new version of the manuscript.

      What happens to the mitochondrial mass when infected with live Mtb?

      In response to the reviewer’s question, we determined the mitochondrial mass in infected DCs with live Mtb. In contrast to DCs treated with irradiated Mtb, those infected with live bacteria showed a clear reduction of their mitochondrial mass (modified Figure 1G). This result indicates that, although both Mtb-infected and irradiated Mtb-exposed DCs show a clear increase in their glycolytic activity, divergent responses are observed in terms of mitochondrial mass.

      It will be best if the authors indicate in the figure headings that dead Mtb was used.

      We agree with the reviewer. For figures 1-3, we applied the term “Mtb” in the figure headings since both irradiated and viable bacteria were used for the corresponding experiments. In figures 4-5, the term “iMtb” (alluding to irradiated Mtb) was used in the figure headings as suggested by the reviewer. For the remaining figures, the term “iMtb” was indicated in their legends when dead bacteria were used to stimulate DCs.

      E.g., Figure 1F; what does live Mtb do to GLUT1 levels etc etc?

      In response to the reviewer’s question, we included new data about Glut1 expression in DCs infected with live Mtb in the latest version of the manuscript. In line with the increase in glucose uptake shown in figure 1B, we observed an increase in the percentage of Glut1 positive DCs upon Mtb infection (new Figure 1F, lower panels). The increase in Glut1 expression strengthens the notion that DCs activate their glycolytic activity in response to the infection, as demonstrated by the elevated release of lactate, glucose consumption, HIF-1α expression, LDHA expression (Figure 1) and glycolytic activity (Figure 2, SCENITH results with viable Mtb). Therefore, these data strongly support the induction of glycolysis by Mtb (either viable or irradiated) in DCs.

      Also, we found that they were still able to activate CD4+ T cells from PPD+ donors in response to iMtb. This activation of CD4 T cells with iMtb in the presence of a HIF-1alpha inhibitor is expected, as iMtb is dead and not virulent. What happens when the cells are infected with live virulent Mtb?

      We would like to clarify the main purpose of the DC-T cell co-culture assays in the presence of the HIF-1α inhibitors. To characterize the impact of HIF-1α on DC functionality, we assessed the capacity of DCs to activate autologous CD4+ T cells when stimulated with iMtb in the presence of HIF-1α inhibitors. To this end, we used iMtb merely as a source of antigens to load DCs and evaluate the effect of HIF-1α inhibition on the activation of antigen-specific T cells. The use of viable Mtb may introduce confounding factors, such as pathogen-triggered inhibitory mechanisms (e.g., EsxH secretion by Mtb (Portal-Celhay et al., 2016)), which would prevent us from reaching conclusions about the role of HIF-1α. Thus, we consider that the use of live bacteria for this experiment is out of the scope of this manuscript.

      The authors demonstrated that CD16+ monocytes from TB patients have higher glycolytic capacity than healthy controls Fig 7. The authors should differentiate TB patient monocytes into DCs and measure their bioenergetics to test if infection alters their glycolysis and OXPHOS.

      In agreement with the reviewer, the determination of metabolic pathways in DCs differentiated from monocytes of TB patients is a key aspect of this work. Accordingly, the bioenergetic determinations of DCs generated from monocytes from TB patients versus healthy subjects are now illustrated in Figures 6F (lactate release) and 6G (SCENITH profile).

      In the discussion, the authors state that "pathologically active glycolysis in monocytes from TB patients leads to poor glycolytic induction and migratory capacities of monocyte-derived DCs." However, the data from Fig. 1 and 2 show that treatment with iMtb or Mtb induces glycolysis in MoDCs. How do the authors explain these contrasting results?

      We thank the reviewer for pointing out this issue. Figures 1 and 2 show DCs differentiated from monocytes of healthy donors (HS). In this case, DCs from HS respond to Mtb by inducing a glycolytic and migratory profile. Yet, in the case of monocytes isolated from TB patients, these cells exhibit an early glycolytic profile from the beginning of differentiation, ultimately yielding DCs with low glycolytic capacity and low migratory activity in response to Mtb. We included this explanation in the discussion (page 18) to better clarify this issue.

      Also, the term "pathological" active glycolysis (Introduction and Discussion) is an inappropriate term.

      As requested by the reviewer, we excluded the term “pathological” to describe the phenomenon reported in this study.

      Lastly, it should be shown whether the DCs generated from CD16+ monocyte from TB patients generate tolerogenic and/or aberrant DCs, which have lower glycolytic and migration capacity compared to the CD16- monocyte population. In Figure 7B, the authors should discuss why the CD16+ monocyte population has lower glycolytic capacity compared to CD16- monocytes in healthy donors. Furthermore, in contrast to the TB patients, do DCs generated from CD16+ monocyte in healthy donors have increased glycolytic and migration capacity compared to CD16- monocyte (because these monocytes showed lower glycolytic capacity)? Furthermore, if there is no difference in glycolytic capacity among the three monocyte populations in TB patients, on what basis was it concluded that DCs generated only from the CD16+ monocyte population may be the cause of lower migration capacity? The authors state in Figure 7F that the DMOG pretreatment matches the situation where the Mo-DCs from TB patients showed reduced migration. Did the authors check the Hif-1alpha levels in monocytes obtained from TB patients?

      We appreciate this in-depth analysis by the reviewer because it allows us to clarify some interpretations of the SCENITH results in Figure 7B. It is important to keep in mind that with the SCENITH technique we can only infer the relative contributions of the metabolic pathways, without alluding to the absolute magnitudes of such contributions. In this regard, it is key to note that the amount of lactate released during the first hours of the TB monocyte culture is much higher than that released by monocytes from healthy subjects (HS, Figure 7A), even when most monocytes, which are CD14+ CD16-, have comparable glycolytic capacities between HS and TB. Another example to illustrate how to interpret SCENITH results can be found in Figure 2, where a lower mitochondrial dependence is observed in iMtb-stimulated DCs (Figure 2A), while the absolute ATP production associated with OXPHOS is indeed higher as measured by Seahorse (Figure 2D). Therefore, the glycolytic capacity is not a direct readout of the magnitude of glycolysis, but of its contribution to total metabolism. The low levels of lactate released from HS monocytes likely reflect their low activation state and low metabolic activity compared to TB monocytes. In this regard, we have previously demonstrated that monocytes from pulmonary TB patients display an activated phenotype (Balboa et al., 2011). The fact that there is no difference between the glycolytic capacities of TB and HS CD16- monocytes indicates that their proportional contributions to protein synthesis are comparable (again, without inferring their absolute values, which may be very different).
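
      The point that SCENITH parameters are relative rather than absolute can be made concrete with a small calculation. This is a minimal sketch under stated assumptions: the formulas follow the commonly used SCENITH definitions (puromycin signal in control, 2-DG, oligomycin, and 2-DG plus oligomycin conditions), and the function name `scenith_profile` and all input values are hypothetical, not data from the manuscript.

```python
def scenith_profile(co: float, dg: float, o: float, dgo: float) -> dict:
    """Relative metabolic parameters from puromycin signal (e.g. MFI).

    co  : control (no inhibitor)
    dg  : 2-deoxyglucose
    o   : oligomycin
    dgo : 2-deoxyglucose + oligomycin
    Formulas are the commonly used SCENITH definitions (an assumption here).
    """
    span = co - dgo  # total inhibitable protein synthesis signal
    if span <= 0:
        raise ValueError("control signal must exceed the fully inhibited signal")
    glucose_dependence = 100.0 * (co - dg) / span
    mitochondrial_dependence = 100.0 * (co - o) / span
    return {
        "glucose_dependence": glucose_dependence,
        "mitochondrial_dependence": mitochondrial_dependence,
        "glycolytic_capacity": 100.0 - mitochondrial_dependence,
        "faao_capacity": 100.0 - glucose_dependence,
    }


# Hypothetical cells: the second has threefold higher signal in every condition.
quiet = scenith_profile(co=1000, dg=400, o=700, dgo=200)
active = scenith_profile(co=3000, dg=1200, o=2100, dgo=600)

# The percentages are identical although absolute activity differs threefold.
print(quiet["glycolytic_capacity"], active["glycolytic_capacity"])
```

      Because every parameter is normalised to the inhibitable signal, two populations with very different absolute metabolic activity can return the same glycolytic capacity, which is why lactate release or Seahorse measurements are needed to compare magnitudes.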

      Beyond the previous clarification, the reviewer's proposal to isolate subsets of monocytes is a very interesting idea. However, the experimental approach is very difficult based on the amount of blood we can obtain from patients. The cohort of patients included in this work comprises very severe patients and we are given up to 15-20 ml of peripheral blood from each. This volume of blood yields up to 10 million PBMC with approximately 1 million monocytes. If we separate the monocyte subsets, the recovered cells per condition will be insufficient to perform the intended assays.

      Nevertheless, we incorporate new evidence that TB disease is associated with an increased activation and glycolytic profile of circulating CD16+ monocytes.

      i) First, we show that the baseline glycolytic capacity of CD16+ monocytes correlates with time since the onset of TB-related symptoms (new Figure 7C).

      ii) Second, we performed high-throughput Gene Set Enrichment Analysis (GSEA) on transcriptomic data (GEO accession number: GSE185372) of CD14+CD16-, CD14+CD16+ and CD14dimCD16+ monocytes isolated from individuals with active TB, latent TB (IGRA+), as well as from TB negative healthy controls (IGRA-). We found that, unlike oxidative phosphorylation, glycolysis tends to be enriched in active TB in both CD14+CD16+ and CD14dimCD16+ monocytes (new Figure 7D).

      iii) Third, we measured the expression of HIF-1α in monocyte subsets by FACS and found that this transcription factor is expressed at higher levels in CD16+ monocyte subsets from TB patients compared to their counterparts from healthy donors (new Figure 8A). We consider that this result justifies the assays shown in Figure 8B-C, in which we prematurely activated HIF-1α in healthy donor monocytes during early differentiation to DCs and measured its impact on the migration of the generated DCs.

      In the Discussion, the authors mention that circulating monocytes from TB patients differentiate into DCs with low immunogenic potential. However, the authors have not shown any immunological defect in any of their data with monocytes from TB patients. In the proxy model mentioned in Figure 7, they have in fact shown that these preconditioned DCs have higher CD86 expression. Can the authors explain/show data to justify the statement in the first paragraph of the Discussion?

      We agree with the reviewer on this observation. Our findings are limited to the generation of DCs with low migratory potential (low chemotactic activity towards CCL21 of DCs differentiated from TB patient monocytes shown in figure 6H and of DCs generated from pre-conditioned monocytes shown in figure 8C). We have modified that part of the discussion to better clarify this point, replacing “immunogenic” with “migratory”.

      The authors should note that oxamate is a competitive inhibitor of the enzyme lactate dehydrogenase and not glycolysis. Also, LDHA catalyzes the conversion from pyruvate to lactate and not the other way around (Results, page 6).

      This comment relates to the first one by the reviewer, in which the dogma of glycolysis was discussed. According to the new conception of glycolysis, it begins with glucose as its substrate and terminates with the production of lactate as its main end product.

      The following statements by the authors on page 6 are incorrect: "Because irradiated and viable Mtb induced comparable activation of glycolysis, we subsequently performed all our assays with irradiated Mtb only in the rest of the study due to biosafety reasons." and: "To our knowledge, this is the first study addressing the metabolic status and migratory activity of Mo-DCs from TB patients."

      We deleted the first sentence and reworded the second sentence as "To our knowledge, this is the first study to address how the metabolic status of monocytes from TB patients influences the migratory activity of further differentiated DCs".

      The Discussion reads as if live Mtb was used in the experiments, which is not the case. This should be corrected.

      We replaced "Mtb" with "iMtb" in the Discussion wherever irradiated Mtb was used. In some cases, we used "Mtb stimulation" instead of "Mtb infection".

      Minor Comments:

      (1) In Figure 1F legend "Quantification of Glut1+ cells plotted to the right". The underlined part should be "plotted below".

      It was corrected.

      (2) In Figure 1H. Please describe the quantitation method and describe how many cells or the number/size of fields were used to quantitate mitochondria.

      For mitochondrial morphometric analysis, TEM images were quantified with the ImageJ “analyze particles” plugin in thresholded images, with size (μm²) settings from 0.001 to infinity. For quantification, 8–10 cells from random fields (1000x magnification) per condition were analyzed. We included this information in the Methods section of the new version of the manuscript.

      (3) Please mention the number of independent experimental repeats for each experimental data set and figure.

      In each figure, the number of independent experiments is indicated by individual dots.

      (4) In Figure 2A legend, "PER; left panel" should be PER; lower panel and "OCR; right panel" should be OCR; upper panel.

      It was corrected.

      References for reviewers

      Adamik, J., Munson, P. V., Hartmann, F. J., Combes, A. J., Pierre, P., Krummel, M. F., … Butterfield, L. H. (2022). Distinct metabolic states guide maturation of inflammatory and tolerogenic dendritic cells. Nature Communications, 13(1), 1–19. https://doi.org/10.1038/s41467-022-32849-1

      Balboa, L., Romero, M. M., Basile, J. I., Sabio y Garcia, C. A., Schierloh, P., Yokobori, N., … Aleman, M. (2011). Paradoxical role of CD16+CCR2+CCR5+ monocytes in tuberculosis: efficient APC in pleural effusion but also mark disease severity in blood. Journal of Leukocyte Biology. https://doi.org/10.1189/jlb.1010577

      Brooks, G. A. (2018). The Science and Translation of Lactate Shuttle Theory. Cell Metabolism. https://doi.org/10.1016/j.cmet.2018.03.008

      Portal-Celhay, C., Tufariello, J. M., Srivastava, S., Zahra, A., Klevorn, T., Grace, P. S., … Philips, J. A. (2016). Mycobacterium tuberculosis EsxH inhibits ESCRT-dependent CD4+ T-cell activation. Nature Microbiology, 2, 16232. https://doi.org/10.1038/NMICROBIOL.2016.232

      Rogatzki, M. J., Ferguson, B. S., Goodwin, M. L., & Gladden, L. B. (2015). Lactate is always the end product of glycolysis. Frontiers in Neuroscience, 9, 22. https://doi.org/10.3389/fnins.2015.00022

      Schurr, A. (2023). From rags to riches: Lactate ascension as a pivotal metabolite in neuroenergetics. Frontiers in Neuroscience, 17, 1145358. https://doi.org/10.3389/fnins.2023.1145358

      Schurr, A. (2017). Lactate, Not Pyruvate, Is the End Product of Glucose Metabolism via Glycolysis. Carbohydrate. https://doi.org/10.5772/66699

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Thank you for your continued review and for providing insightful suggestions. Below, I share some unpublished new findings related to the MYRF ChIP, comment on the potential interplay between myrf-1 and myrf-2, and describe the modifications we've implemented to address the reviewers' comments.

      (1) MYRF-1 ChIP

      Our collaboration with the modERN (Model Organism Encyclopedia of Regulatory Networks) project has recently yielded MYRF ChIP data. The results demonstrate clear and consistent MYRF binding across samples, notably on the lin-4 promoter. Given the significant detail and extensive description required to adequately present these findings, we have decided it is impractical to include them in the current paper. These results will be more suitably published in a separate ongoing study focused on MYRF's regulatory targets during larval development.

      (2) Inter-regulation between myrf-1 and myrf-2

      We acknowledge the interpretation that myrf-2 may act as a genetic antagonist to myrf-1, as suggested by the delayed arrest in myrf-1; myrf-2 double mutants and a trend towards increased lin-4 expression in myrf-2 mutants. Additionally, our unpublished data suggest an elevated myrf-2 expression peak in myrf-1 null mutants during the L1-L2 transition, indicating a potential mutual repressive interaction between myrf-1 and myrf-2.

      On the other hand, myrf-1 and myrf-2 exhibit functional redundancy in DD synaptic rewiring and lin-4 expression. A gain of function in myrf-2 promotes early DD synaptic rewiring. Furthermore, three independent co-immunoprecipitation analyses targeting myrf-1::gfp, myrf-2::gfp, and pan-1::gfp confirm a tight association between myrf-1 and myrf-2 in vivo. These findings challenge the notion of myrf-2 primarily antagonizing myrf-1, or vice versa.

      We propose a model where myrf-1 and myrf-2 collaborate and are functionally redundant, with compensatory elevated expression when one paralog is absent. For instance, the loss of myrf-1 triggers upregulation of myrf-2, which, though insufficient on its own, accelerates the transcriptional program and exacerbates system deterioration, leading to accelerated death. How exactly this takes place is currently unclear. We note that the MYRF ChIP data show MYRF binding at both the myrf-1 and myrf-2 genes.

      Given the complexity of these interactions, we have chosen not to delve deeply into this discussion in the paper without more direct evidence, which would require detailed analysis.

      (3) Revisions Addressing Reviewer Suggestions

      (a) We have revised our interpretation of the mScarlet signal changes in myrf-1(ybq6) and myrf-2(ybq42) mutants to reflect a more nuanced understanding of their potential genetic relationship, as highlighted in the main text.

      “The mScarlet signals exhibit a marked reduction in the putative null mutant myrf-1(ybq6) (Figure 1D, E). Intriguingly, in the putative null myrf-2(ybq42) mutants, there is a noticeable trend towards increased mScarlet signals, although this increase does not reach statistical significance (Figure 2C, D).”

      (b) In response to feedback on Figure 2 and the characterization of lin-4(umn84) mutants, we've included a new series of images showing lin-4(umn84)/+ and lin-4(umn84) signals through larval stages, presented as Figure 2 Figure Supplement 2. This addition clarifies the functional status of lin-4 nulls in our study.

      “Our observations revealed that mScarlet signals were not detected in early L1 larvae (Figure 2C-F; Figure 2 Figure Supplement 2).”

      (c) To improve the clarity of Fig 6, we've added indicator arrows in the red, green, and merge channels, enhancing the visualization of the signals.

      We appreciate the opportunity to clarify these points and hope that our revisions and additional data address the concerns raised.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We greatly appreciate the reviewers' and editors' comments and suggestions on our manuscript "Transposable elements regulate thymus development and function." We performed additional analyses to validate our results and rephrased some manuscript sections according to the comments. We believe these changes significantly increase the solidity of our conclusions. Our point-by-point answer to the reviewers' and editors' comments is detailed below. New data and analyses are shown in Figure 1d, Figure 2g and h, Figure 5e and f, Figure 1 – figure supplement 1, Figure 2 – figure supplement 2, Figure 3 – figure supplement 1 and 2, Figure 4 – figure supplement 2, Figure 5 – figure supplement 1, as well as the corresponding text sections.

      Reviewer #1:

      (1) The authors sometimes made overstatements largely due to the lack or shortage of experimental evidence.

      For example in figure 4, the authors concluded that thymic pDCs produced higher copies of TE-derived RNAs to support the constitutive expression of type-I interferons in thymic pDCs, unlike peripheral pDCs. However, the data was showing only the correlation between the distinct TE expression pattern in pDCs and the abundance of dsRNAs. We are compelled to say that the evidence is totally too weak to mention the function of TEs in the production of interferon. Even if pDCs express a distinct type and amount of TE-derived transcripts, it may be a negligible amount compared to the total cellular RNAs. How many TE-derived RNAs potentially form the dsRNAs? Are they over-expressed in pDCs?

      The data interpretation requires more caution to connect the distinct results of transcriptome data to the biological significance.

      We contend that our manuscript combines the attributes of a research article (novel concepts) and a resource article (datasets of TEs implicated in various aspects of thymus function). The critical strength of our work is that it opens entirely novel research perspectives. We are unaware of previous studies on the role of TEs in the human thymus. The drawback is that, as with all novel multi-omic systems biology studies, our work provides a roadmap for a multitude of future mechanistic studies that could not be realized at this stage. Indeed, we performed wet lab experiments to validate some but not all conclusions: i) presentation of TE-derived MAPs by TECs and ii) formation of dsRNAs in thymic pDCs. In response to Reviewer #1, we performed supplementary analyses to increase the robustness of our conclusions. Also, we indicated when conclusions relied strictly on correlative evidence and clarified the hypotheses drawn from our observations.

      Regarding the Reviewer's questions about TE-derived dsRNAs, LINE, LTR, and SINE elements all have the potential to generate dsRNAs, given their highly repetitive nature and bi-directional transcription (1). As ~32% of TE subfamilies are overexpressed in pDCs, we hypothesized that these TE sequences might form dsRNA structures in these cells. To address the Reviewer's concerns regarding the amount of TE-derived RNAs among total cellular RNAs, we also computed the percentage of reads assigned to TEs in the different subsets of thymic APCs (see Reviewer 1 comment #4).

      (2) Lack of generality of specific examples. This manuscript discusses the whole genomic picture of TE expression. In addition, one good way is to focus on the specific example to clearly discuss the biological significance of the acquisition of TEs for the thymic APC functions and the thymic selection.

      In figure 2, the authors focused on ETS-1 and its potential target genes ZNF26 and MTMR3, however, the significance of these genes in NK cell function or development is unclear. The authors should examine and discuss whether the distinct features of TEs can be found among the genomic loci that link to the fundamental function of the thymus, e.g., antigen processing/presentation.

      We thank the Reviewer for this highly relevant comment. We investigated the genomic loci associated with NK cell biology to determine if ETS1 peaks would overlap with TE sequences in protein-coding genes' promoter region. Figure 2h illustrates two examples of ETS1 significant peaks overlapping TE sequences upstream of PRF1 and KLRD1. PRF1 is a protein implicated in NK cell cytotoxicity, whereas KLRD1 (CD94) dimerizes with NKG2 and regulates NK cell activation via interaction with the nonclassical MHC-I molecule HLA-E (2, 3). Thus, we modified the section of the manuscript addressing these results to include these new analyses:

      "Finally, we analyzed publicly available ChIP-seq data of ETS1, an important TF for NK cell development (4), to confirm its ability to bind TE sequences. Indeed, 19% of ETS1 peaks overlap with TE sequences (Figure 2g). Notably, ETS1 peaks overlapped with TE sequences (Figure 2h, in red) in the promoter regions of PRF1 and KLRD1, two genes important for NK cells' effector functions (2, 3)."
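
The overlap statistic quoted above (the share of peaks that intersect at least one TE) can be illustrated with a small interval-overlap sketch. This is not the authors' code, which relied on bedtools intersect; the function name `fraction_overlapping` and the toy coordinates are hypothetical, and intervals are assumed to be half-open, 0-based BED-style tuples.

```python
from bisect import bisect_left


def fraction_overlapping(peaks, tes):
    """Return the fraction of peaks that overlap at least one TE interval.

    Both inputs are lists of (chrom, start, end) tuples with half-open,
    0-based coordinates, as in BED files.
    """
    # Group TE intervals by chromosome.
    by_chrom = {}
    for chrom, start, end in tes:
        by_chrom.setdefault(chrom, []).append((start, end))

    # For each chromosome, keep sorted starts and a running maximum of ends.
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        max_end, running = [], 0
        for _, e in intervals:
            running = max(running, e)
            max_end.append(running)
        index[chrom] = (starts, max_end)

    hits = 0
    for chrom, start, end in peaks:
        if chrom not in index:
            continue
        starts, max_end = index[chrom]
        # k = number of TEs that start before the peak ends.
        k = bisect_left(starts, end)
        # The peak overlaps if any of those TEs ends after the peak starts.
        if k > 0 and max_end[k - 1] > start:
            hits += 1
    return hits / len(peaks) if peaks else 0.0


# Toy data (hypothetical coordinates): only the first peak overlaps a TE.
toy_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 20)]
toy_tes = [("chr1", 150, 160), ("chr1", 600, 700), ("chr2", 0, 5)]
toy_fraction = fraction_overlapping(toy_peaks, toy_tes)
```

With real data, the peak list would come from the peak caller's output and the TE list from a repeat annotation; any minimum-overlap requirement would be an additional assumption not stated in the response.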

      (3) Since the deep analysis of the dataset yielded many intriguing suggestions, why not add a discussion of the biological reasons and significance? For example, in Figure 1, why is TE expression negatively correlated with proliferation? cTEC-TE is mostly postnatal, while mTEC-TE is more embryonic. What does this mean?

      We thank the Reviewer for this comment. To our knowledge, the relationship between cell division and transcriptional activity of TEs has not been extensively studied in the literature. However, a recent study has shown that L1 expression is induced in senescent cells. We therefore added the following sentences to our Discussion:

      "The negative correlation between TE expression and cell cycle scores in the thymus is coherent with recent data showing that transcriptional activity of L1s is increased in senescent cells (5). A potential rationale for this could be to prevent deleterious transposition events during DNA replication and cell division."

      We also added several discussion points regarding the regulation of TEs by KZFPs to answer concerns raised by Reviewer 2 (see Reviewer 2 comment #1).

      (4) To consolidate the experimental evidence about pDCs and TE-derived dsRNAs, one option is to show the amount of TE-derived RNA copies among total RNAs. The immunohistochemistry analysis in figure 4 requires additional data to demonstrate that overlapped staining was not caused by technical biases (e.g. uneven fixation may cause the non-specifically stained regions/cells). To show this, authors should have confirmed not only the positive stainings but also the negative staining (e.g. CD3, etc.). Another possible staining control was showing that non-pDC (CD303- cell fractions in this case) cells were less stained by the ds-RNA probe.

      We thank the Reviewer for this suggestion. We computed the proportion of reads in each cell assigned to two groups of sequences known to generate dsRNAs: TEs and mitochondrial genes (1). These analyses showed that the proportion of reads assigned to TEs is higher in pDCs than other thymic APCs by several orders of magnitude (~20% of all reads). In contrast, reads derived from mitochondrial genes had a lower abundance in pDCs. We included these results in Figure 4 – figure supplement 2 and included the following text in the Results section entitled "TE expression in human pDCs is associated with dsRNA structures":

      "To evaluate if these dsRNAs arise from TE sequences, we analyzed in thymic APC subsets the proportion of the transcriptome assigned to two groups of genomic sequences known as important sources of dsRNAs, TEs and mitochondrial genes (1). Strikingly, whereas the percentage of reads from mitochondrial genes was typically lower in pDCs than in other thymic APCs, the proportion of the transcriptome originating from TEs was higher in pDCs (~22%) by several orders of magnitude (Figure 4 – figure supplement 2)."

      As a negative control for the immunofluorescence experiments, we used CD123- cells. Indeed, flow cytometry analysis showed that the magnetically enriched CD303+ fraction was around 90% pure, as revealed by double staining with CD123 and CD304 (two additional markers of pDCs): CD123- cells were also CD304-/lo, showing that these cells are non-pDCs. Thus, we decided to compare the dsRNA signal between CD123+ cells (pDCs) and CD123- cells (non-pDCs). The difference between CD123+ and CD123- cells was striking (Figure 4d).

      Author response image 1.

      Reviewer #1 (Recommendations For The Authors):

      It was sometimes difficult for me to recognize the dot plots representing low expression against the white background. e.g., figure 1 supplement 1.

      We thank the Reviewer for their comment, and we modified Figure 1 – figure supplement 1 as well as Figure 3 – figure supplement 2 to improve the contrast between dots and background.

      Reviewer #2:

      Reviewer #2 (Recommendations For The Authors):

      (1) In the abstract, results and discussion, the following conclusions are drawn that are not supported by the data: a) TEs interact with multiple transcription factors in thymic cells, b) TE expression leads to dsRNA formation, activation of RIG-I/MDA5 and secretion of IFN-alpha, c) TEs are regulated by cell proliferation and expression of KZFPs in the thymus. All these statements derive from correlations. Only one TF has ChIP-seq data associated with it, dsRNA formation and/or IFN-alpha secretion could be independent of TE expression, and whilst KZFPs most likely regulate TEs in the thymus, the data do not demonstrate it. The authors also seem to suggest that AIRE, FEZF2 and CHD4 regulate TEs directly, but binding is not shown. The manuscript needs a thorough revision to be absolutely clear about the correlative nature of the described associations.

      We agree with Reviewer #2 that some of the conclusions in our initial manuscript were not fully supported by experimental data. In the revised manuscript, we clearly indicated when conclusions relied strictly on correlative evidence and clarified the hypotheses drawn from our observations. Regarding the regulation of TE expression by AIRE, FEZF2, and CHD4, we reanalyzed publicly available ChIP-seq data of AIRE and FEZF2 in murine mTECs. For AIRE, we found that ~30% of statistically significant peaks overlap with TE sequences (see Reviewer 2, comment #6 for more details on read alignment and peak calling), confirming its ability to bind to TE sequences directly. We added these results to the main figures (Figure 5f) and modified the section "AIRE, CHD4, and FEZF2 regulate distinct sets of TE sequences in murine mTECs" as follows:

      "[…] As a proof of concept, we validated that 31.42% of AIRE peaks overlap with TE sequences by reanalyzing ChIP-seq data, confirming AIRE's potential to bind TE sequences (Figure 5f)."

      A reanalysis of FEZF2's ChIP-seq data yielded no significant peaks while using stringent criteria. For this reason, we decided to exclude these data and only use AIRE as a proof of concept.

      Regarding KZFPs, we agree with Reviewer #2 that their impact on TE expression is probably significantly underestimated in our data. A potential reason for this is that KZFP expression is typically low; thus, transcriptomic signals from KZFPs could have been missed by the low depth of scRNA-seq. We mentioned this point in the Discussion:

      "On the other hand, the contribution of KZFPs to TE regulation in the thymus is likely underestimated due to their typically low expression (6) and scRNA-seq's limit of detection."

      (2) On the technical side, there are many dangers about analyzing RNA-seq data at the subfamily level and without stringent quality control checks. Outputs may be greatly confounded by pervasive transcription (see PMID 31425522), DNA contamination, and overlap of TEs with highly expressed genes. Whether TE transcripts are independent units or part of a gene also has important implications for the conclusions drawn. I would say that for most purposes of this work, an analysis restricted to independent TE transcripts, with appropriate controls for DNA contamination, would provide great reassurances that the results from subfamily-level analyses are sound. Showing examples from the genome browser throughout would also help.

      We agree with the Reviewer that contamination could have interfered with TE quantification. We used FastQ Screen (7) to evaluate the contamination of our human scRNA-seq data. As illustrated in the Figure below, most reads aligned with the human genome, and there were no reads uniquely assigned to another species analyzed, confirming the high purity of our dataset.

      Author response image 2.

      As stated by the Reviewer, pervasive expression is another factor that can lead to overestimation of TE expression. To evaluate if pervasive expression impacted the results of our differential expression analysis of TEs between APC subsets, we visualized read alignment to TE sequences using a genome browser. We selected two samples containing the highest numbers of mTEC(II) and pDCs (T07_TH_EPCAM and FCAImmP7277556, respectively) and used STAR to align reads to the human genome (GRCh38). We then visualized read alignment to randomly selected loci of two subfamilies identified as overexpressed by mTEC(II) or pDCs (HERVE-int and Harlequin-int, respectively). The examples below show that the signal detected is specific to the TE sequences located in introns. Even though this visualization cannot guarantee that pervasive expression did not affect TE quantification in any way, it increases the confidence that the signal detected by our analyses genuinely originates from TE expression.

      Author response image 3.

      Author response image 4.

      Author response image 5.

      Author response image 6.

      Author response image 7.

      (3) Related to the above, it would be useful to describe in the main text (and methods) how multi-mapping reads are being handled. It wasn't clear to me how kallisto handles this, and it has implications for the results. In the analysis suggested above, only uniquely mapped reads would have to be used, despite its limitations.

      We agree with the Reviewer that this information regarding assignment of multimapping reads is important. Kallisto uses an expectation-maximization (EM) algorithm to deal with multimapping reads, a strategy used by several algorithms developed to study TE expression (8). Briefly, the EM algorithm reassigns multimapping reads based on the number of uniquely mapped reads assigned to each sequence. Thus, we added the following details to the methods section:

      "Preprocessing of the scRNA-seq data was performed with the kallisto (9) and bustools workflow; kallisto uses an expectation-maximization algorithm to reassign multimapping reads based on the frequency of unique mappers at each sequence."
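
The reassignment idea described above can be sketched with a toy expectation-maximization loop. This is a deliberately simplified illustration rather than kallisto's implementation: it ignores effective sequence lengths and equivalence-class bookkeeping, and the function name `em_abundances` and the example reads are hypothetical.

```python
def em_abundances(compat, n_iter=100):
    """Estimate relative abundances from read-to-target compatibilities.

    compat is a list of sets; each set holds the targets (for example TE
    subfamilies) that one read is compatible with. Uniquely mapped reads
    have a single-element set, multimapping reads have several elements.
    """
    targets = sorted(set().union(*compat))
    # Start from a uniform abundance estimate.
    theta = {t: 1.0 / len(targets) for t in targets}
    for _ in range(n_iter):
        counts = {t: 0.0 for t in targets}
        # E-step: split each read across its compatible targets in
        # proportion to the current abundance estimates.
        for read in compat:
            total = sum(theta[t] for t in read)
            for t in read:
                counts[t] += theta[t] / total
        # M-step: renormalize the fractional counts into abundances.
        n = sum(counts.values())
        theta = {t: c / n for t, c in counts.items()}
    return theta


# Toy example: 3 reads unique to A, 1 unique to B, 4 shared between A and B.
toy_reads = [{"A"}] * 3 + [{"B"}] + [{"A", "B"}] * 4
toy_theta = em_abundances(toy_reads)
```

In this toy case the shared reads end up split 3:1 in favor of A, mirroring the ratio of unique mappers, which is the behavior the Methods sentence describes.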

      (4) Whilst I liked the basic idea, I am not convinced that correlating TE and TF expression is a good strategy for identifying TE-TF associations at enhancers. Enhancers express very low levels of short transcripts, which I doubt would be detected in low-depth scRNA-seq data. The transcripts the authors are using to make such associations may therefore have nothing to do with the enhancer roles of TEs. I would limit these analyses to cell types for which there is histone modification data and correlate TF expression with that instead.

      We agree with the Reviewer that it would have been interesting to correlate the expression of TFs with signals of histone marks at TE sequences. However, we could not perform this analysis because we did not have matched data of histone marks throughout thymic development. Therefore, we adopted an alternative, well-suited strategy.

      Our strategy to identify TE enhancer candidates is depicted in Figure 2a: i) correlation between the expression of the TF and the TE subfamily, ii) presence of the TF binding motif in the sequence of the TE enhancer candidate, and iii) colocalization of the TE enhancer candidate with significant peaks of H3K27ac and H3K4me3 in the same cell type from the ENCODE Consortium ChIP-seq data. We limited our analyses to the eight cell types present both in our dataset and the ENCODE Consortium: B cells, CD4 Single Positive T cells (CD4 SP), CD8 Single Positive T cells (CD8 SP), dendritic cells (DC), monocytes and macrophages (Mono/Macro), NK cells, Th17, and Treg.

      (5) Figure 2G: binding of ETS1 is unconvincing. Were there statistically meaningful peaks called in these regions? It would be good to also show a metaplot/heatmap of ETS1 profile over all elements of relevant subfamilies. Showing histone marks on the genome browser snapshots would also be useful. Is there any transcriptional evidence that the specific Alus shown act as alternative promoters?

      We agree with the Reviewer that the examples provided were not particularly convincing. Thus, we reanalyzed the data to determine if statistically significant ETS1 peaks (see the answer to Reviewer 2's comment #6 for details on the methods) located near gene transcription start sites overlapped with TEs. We thereby provided examples of significant ETS1 peaks overlapping TE sequences in the promoter region of two prototypical NK cell protein-coding genes (Figure 2h).

      (6) Why was -k 10 used with bowtie2? This will map the same read to multiple locations in the genome, increasing read density at more repetitive (younger) TEs. The authors should use either default settings, being clear about the outcome (random assignment of multimapping reads to one location), or use only uniquely aligned reads.

      We thank the Reviewer for their comment and agree that using the -k 10 parameter with bowtie2 was not optimal for TE analysis. To improve the strength of our analyses, we reanalyzed all ChIP-seq data of our manuscript (Figure 2g and h, Figure 5e and f) using the following strategy: alignment with bowtie2 using default parameters except --very-sensitive, removal of multimapping reads with samtools view -q 10, removal of duplicate reads with samtools markdup -r, peak calling with macs2 using the -m 5 50 parameter, and removal of peaks overlapping ENCODE's blacklist regions with bedtools intersect.

      These new analyses strengthen our evidence that TEs interact with multiple genes that regulate thymic development and function. We updated the results sections concerning ChIP-seq data analyses and the Methods section to include this information:

      "ChIP-seq reads were aligned to the reference Homo sapiens genome (GRCh38) using bowtie2 (version 2.3.5) (10) with the --very-sensitive parameter. Multimapping reads were removed using the samtools view function with the -q 10 parameter, and duplicate reads were removed using the samtools markdup function with the -r parameter (11). Peak calling was performed with macs2 with the -m 5 50 parameter (12). Peaks overlapping with the ENCODE blacklist regions (13) were removed with bedtools intersect (14) with default parameters. Overlap of ETS1 peaks with TE sequences was determined using bedtools intersect with default parameters. BigWig files were generated using the bamCoverage function of deeptools2 (15), and genomic tracks were visualized in the UCSC Genome Browser (16)."
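
The processing order described above can be laid out as a sequence of command lines. The sketch below only builds the argument lists and does not run anything. Several details are assumptions that the response does not state: paired-end input (`-1`/`-2`, `-f BAMPE`), the absence of a control sample, the name-sort, fixmate, and coordinate-sort steps that samtools markdup expects, the `-v` flag used to discard blacklisted peaks, and every file name.

```python
import shlex


def chipseq_commands(index, fq1, fq2, blacklist, name="sample"):
    """Return the pipeline as a list of argv lists (nothing is executed)."""
    return [
        # Alignment with bowtie2; --very-sensitive as stated in the Methods.
        ["bowtie2", "--very-sensitive", "-x", index, "-1", fq1, "-2", fq2,
         "-S", f"{name}.sam"],
        # Keep alignments with MAPQ >= 10 (drops most multimapping reads).
        ["samtools", "view", "-b", "-q", "10", "-o", f"{name}.q10.bam",
         f"{name}.sam"],
        # Assumed intermediate steps: markdup expects mate tags added by
        # fixmate -m and coordinate-sorted input.
        ["samtools", "sort", "-n", "-o", f"{name}.nsort.bam",
         f"{name}.q10.bam"],
        ["samtools", "fixmate", "-m", f"{name}.nsort.bam",
         f"{name}.fixmate.bam"],
        ["samtools", "sort", "-o", f"{name}.sorted.bam",
         f"{name}.fixmate.bam"],
        # Remove duplicate reads (-r), as stated in the Methods.
        ["samtools", "markdup", "-r", f"{name}.sorted.bam",
         f"{name}.dedup.bam"],
        # Peak calling with the -m 5 50 (mfold) setting from the Methods.
        ["macs2", "callpeak", "-t", f"{name}.dedup.bam", "-f", "BAMPE",
         "-g", "hs", "-n", name, "-m", "5", "50"],
        # Report only peaks with no overlap with blacklist regions
        # (bedtools writes the result to standard output).
        ["bedtools", "intersect", "-v", "-a", f"{name}_peaks.narrowPeak",
         "-b", blacklist],
    ]


# Hypothetical inputs, used only to show the resulting command lines.
example_commands = chipseq_commands(
    "GRCh38", "reads_1.fastq.gz", "reads_2.fastq.gz", "blacklist.bed"
)
example_lines = [shlex.join(argv) for argv in example_commands]
```

Printing `example_lines` gives one shell command per step; the output of the final command would need to be redirected to a file.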

      (7) Figure 1d needs a y axis scale. Could the authors also provide details of how the random distribution of TE expression was generated?

      We agree with the Reviewer that Figure 1d was incomplete and made the appropriate modifications. Regarding the random distribution, we reproduced our dataset containing the expression of 809 TE subfamilies in 18 cell populations. For each combination of TE subfamily and cell type, we randomly assigned an "expression pattern" as identified by the hierarchical clustering of Figure 1b. Then, we computed the maximal occurrence of an expression pattern across cell types for each TE subfamily to generate the distribution curve in Figure 1d. We added the following details to the Methods section to clarify how the random distribution was generated:

      "As a control, a random distribution of the expression of 809 TE subfamilies in 18 cell populations was generated. A cluster (cluster 1, 2, or 3) was randomly attributed for each combination of TE subfamily and cell type, and the maximal occurrence of a given cluster across cell types was then computed for each TE subfamily. Finally, the distributions of LINE, LTR, and SINE elements were compared to the random distribution with Kolmogorov-Smirnov tests."
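
The control described above can be sketched in a few lines. The sketch assumes that each of the three clusters is drawn with equal probability, which the Methods text does not specify, and it computes only the two-sample Kolmogorov-Smirnov statistic, not its p-value. The function names are hypothetical.

```python
import random
from bisect import bisect_right
from collections import Counter


def random_max_occurrence(n_subfamilies=809, n_cell_types=18,
                          clusters=(1, 2, 3), seed=0):
    """Simulate the control distribution of maximal cluster occurrence.

    For each TE subfamily, a cluster is drawn at random for every cell
    type, and the count of the most frequent cluster is recorded.
    """
    rng = random.Random(seed)
    result = []
    for _ in range(n_subfamilies):
        assigned = [rng.choice(clusters) for _ in range(n_cell_types)]
        result.append(max(Counter(assigned).values()))
    return result


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for v in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# One simulated control distribution with the dimensions given in the text.
control = random_max_occurrence()
```

An observed distribution (for example, the maximal occurrences for LINE subfamilies) would be passed to `ks_statistic` together with `control`; with 18 cell types and 3 clusters, every simulated value lies between 6 and 18.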

      (8) The motif analysis requires a minimum of 1 locus from each TE subfamily containing it in order to be reported, but this seems like a really low threshold that will output a lot of noise. What is the rationale here?

      We agree with the Reviewer that this threshold might appear low. Nonetheless, these analyses ultimately aimed to identify TE promoter and enhancer candidates. Hence, we did not want to put an arbitrary threshold at a higher value (e.g., a certain number or percentage of all loci of a given TE subfamily), as this might create a bias based on the total number of loci of a given TE subfamily. Moreover, our rationale was that a TE locus might act as a promoter/enhancer even if it is the only locus of its subfamily containing a TF binding site.

      Even though this strategy might have created some noise in the analyses of interactions between TFs and TEs of Figure 2 (panels a-e), we are confident that our bootstrap strategy efficiently removed low-quality identifications based on low correlation values or expression of TF and TE in low percentages of cells. Additionally, the subsequent analyses on TE promoter and enhancer candidates were performed exclusively for the TE loci containing TF binding sites to avoid adding noise to these analyses.

      (9) Figure 4e: is this a log2 enrichment? If not, the enrichments for some of the gene sets are not so high.

      The enrichment values represented in Figure 4e are not log-transformed. It is essential to highlight that gene set enrichment values were computed for each possible pair of thymic APCs (e.g., pDC vs. cDC1, pDC vs. mTEC(II), etc.), and the values represented in Figure 4e are an average of each comparison pictured at the bottom of the UpSet plot.

      However, we agree with Reviewer 2 that the average enrichment value is not extremely high. We thus made the following modifications to the Results section ("TE expression in human pDCs is associated with dsRNA structures") to better represent it:

      "Notably, thymic pDCs harbored moderate yet significant enrichment of gene signatures of RIG-I and MDA5-mediated IFN α/β signaling compared to all other thymic APCs (Figure 4e and Supplementary file 1 – Table 8)."

      (10) Please be clear on results subtitles when these refer to mouse.

      We apologize for the confusion and modified the subtitles to clarify if the results refer to mouse or human data.

      (11) Figure 1 - figure supplement 2: "assignation" should be 'assignment'.

      We thank the Reviewer for their keen eye and changed the title of Figure 1 – figure supplement 2.

      (1) Sadeq S, Al-Hashimi S, Cusack CM, Werner A. Endogenous Double-Stranded RNA. Noncoding RNA. 2021;7(1).

      (2) Kim N, Kim M, Yun S, Doh J, Greenberg PD, Kim TD, et al. MicroRNA-150 regulates the cytotoxicity of natural killers by targeting perforin-1. J Allergy Clin Immunol. 2014;134(1):195-203.

      (3) Gunturi A, Berg RE, Forman J. The role of CD94/NKG2 in innate and adaptive immunity. Immunol Res. 2004;30(1):29-34.

      (4) Taveirne S, Wahlen S, Van Loocke W, Kiekens L, Persyn E, Van Ammel E, et al. The transcription factor ETS1 is an important regulator of human NK cell development and terminal differentiation. Blood. 2020;136(3):288-98.

      (5) De Cecco M, Ito T, Petrashen AP, Elias AE, Skvir NJ, Criscione SW, et al. L1 drives IFN in senescent cells and promotes age-associated inflammation. Nature. 2019;566(7742):73-8.

      (6) Huntley S, Baggott DM, Hamilton AT, Tran-Gyamfi M, Yang S, Kim J, et al. A comprehensive catalog of human KRAB-associated zinc finger genes: insights into the evolutionary history of a large family of transcriptional repressors. Genome Res. 2006;16(5):669-77.

      (7) Wingett SW, Andrews S. FastQ Screen: A tool for multi-genome mapping and quality control. F1000Res. 2018;7:1338.

      (8) Lanciano S, Cristofari G. Measuring and interpreting transposable element expression. Nat Rev Genet. 2020;21(12):721-36.

      (9) Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol. 2016;34(5):525-7.

      (10) Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357-9.

      (11) Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10(2).

      (12) Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137.

      (13) Amemiya HM, Kundaje A, Boyle AP. The ENCODE Blacklist: Identification of Problematic Regions of the Genome. Sci Rep. 2019;9(1):9354.

      (14) Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841-2.

      (15) Ramirez F, Ryan DP, Gruning B, Bhardwaj V, Kilpert F, Richter AS, et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44(W1):W160-5.

      (16) Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human genome browser at UCSC. Genome Res. 2002;12(6):996-1006.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      The very detailed insights gained by the authors into allosteric regulation require very specialized techniques in this study. This poses a challenge to communicate the methods, the results, and the meaning of the results to a broader audience. In some places, the authors overcome this challenge better than in others.

      Following this reviewer’s suggestions, we have extensively revised the text, making the text more understandable to a broader audience.

      The manuscript does not show up on BioRxiv.

      The manuscript is now deposited in bioRxiv (doi: 10.1101/2023.09.12.557419).

      Fig3: GS-ES2 transition: the changes appear minimal in the illustration.

      As suggested by this reviewer, we have re-examined the GS-ES2 transition and clearly defined the structural characteristics of conformationally excited state 2 (ES2). As shown in the revised Fig. 3 of the main text, the ground state (GS) features a π-π packing between the aromatic rings of F100 and Y156, as well as a cation-π stacking between R308 and F102. In the ES2 state, these interactions are disrupted, while a new π-π packing interaction is formed between F100 and F102. We added new comments in the main text clarifying the structural interactions that characterize each state.

      GS-ES1 transition: how is the K72-E91 salt bridge disrupted? How do you define the formation/disruption of a salt bridge? The current figure does not make this very clear and the K72-E91 salt bridge appears to be intact in ES1. Maybe the authors could replace the dotted K72-E91 line with a dotted line and distance?

      As stated above, we revised Fig. 3 to highlight the differences between the two states. The K72-E91 salt bridge is considered formed when the distance between Nζ of K72 and Oε of E91 is shorter than 4.0 Å (the typical cutoff for a salt bridge). In the ES1 state, the outward movement of the αC helix increases this distance to more than 4.5 Å, disrupting the salt bridge.
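
      The distance criterion described above can be sketched in a few lines. This is a minimal illustration only, not the authors' analysis code: the function names and the coordinates are hypothetical, and the only value taken from the response is the 4.0 Å cutoff.

```python
import math

SALT_BRIDGE_CUTOFF = 4.0  # Å, the cutoff quoted in the response


def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between two sets of 3D coordinates (Å)."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def salt_bridge_formed(lys_nitrogens, glu_oxygens, cutoff=SALT_BRIDGE_CUTOFF):
    """True when any Lys side-chain N is within `cutoff` of any Glu carboxylate O."""
    return min_distance(lys_nitrogens, glu_oxygens) < cutoff


# Hypothetical coordinates (Å), not taken from any PKA-C structure.
k72_n = [(0.0, 0.0, 0.0)]
e91_o_ground = [(3.0, 0.0, 0.0), (3.5, 1.5, 0.0)]   # within the cutoff
e91_o_excited = [(4.6, 0.0, 0.0), (5.0, 1.5, 0.0)]  # beyond the cutoff

print(salt_bridge_formed(k72_n, e91_o_ground))
print(salt_bridge_formed(k72_n, e91_o_excited))
```

      Applied frame by frame to a trajectory, the same test would give the fraction of time the salt bridge is formed in each state.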

      L251: Could the authors remind the reader why they are only comparing V104 and I150? Could they give a little context as to why they consider the agreement to be good? It appears that they would be statistically different, so a little context for what comprises a good agreement in the literature may be helpful.

      Our mutagenesis studies show that V104 and I150 are key residues for allosteric communication, and if mutated, result in well-folded but inactive kinases (Sci Adv. doi: 10.1126/sciadv.1600663). Importantly, V104 and I150 show two distinct populations in the CEST experiments that can be directly related to the GS and ES states. Regarding the fitting of these residues, we obtained a good agreement with the direction of the chemical shifts, which supports the hypothesized GS -> ES structural transition. The lack of a quantitative agreement between the chemical shifts of the experimental and simulated excited state is not surprising for two reasons: a) all state-of-the-art simulations fall short in sampling slow conformational interconversions, and b) the uncertainty of the SHIFTX algorithm for the prediction of 13C chemical shifts of methyl groups is quite large. Finally, we would like to point out that most NMR relaxation-dispersion experiments (CEST and CPMG) are performed for the backbone 15N, 13Calpha and 1H resonances, which have been used to calculate the structures of the intermediate states (Neudecker, P. et al., Science, 2012, 336, doi: 10.1126/science.1214203) and yield reasonable agreement with the prediction for metastable states derived from Markov models (Olsson, S., J. Am. Chem. Soc., 2017, 139, doi: 10.1021/jacs.6b09460). To the best of our knowledge, there is no literature reporting on calculations of the 13C CEST profiles for methyl groups from MD simulations, and remarkably, we found a reasonably good agreement between experimental and predicted chemical shifts (see Fig. 5C).

      Just to clarify: the calculated CS values are informed by experimental CS values that were used in the calculation?

      We used the backbone chemical shifts as the restraints only in the metadynamics simulations. We used the chemical shifts of the methyl groups and their corresponding excited states to verify the ES2 state.

      Figure 8: in its current form this potentially exciting result is lost on the average reader.

      We modified Fig. 8 of the main text, making the intra- and inter-residue correlations visible to the reader.

      Reviewer #2:

      While the alphaC-beta4 loop is a conserved feature of protein kinases, the residues within this loop vary across various kinase families and groups, enabling group and family-specific control of activity through cis and trans acting elements. F102 in PKA interacts with co-conserved residues in the C-tail, which has been proposed to function as a cis regulatory element. The authors should elaborate on the conformational changes in the C-tail, particularly in the arginine that packs against F102, in the results and discussion. This would further extend the impact and scope of the manuscript, which is currently confined to PKA.

      As suggested by this reviewer, we re-analyzed the time-dependent interactions between F102 and R308 at the C-tail. As this reviewer suspected, these interactions differentiate the ES2 from the GS state. In the GS state, there is a stable cation-π interaction between F102 and R308, which becomes transient in the ES2 state (Fig. 3). For the F100A mutant, the interactions between F102 and R308 have lower occurrence relative to the WT enzyme, i.e., a weaker interaction between the αC-β4 loop and the C-tail (see new Figure 6 - figure supplement 1). The latter supports our conclusion that the structural coupling between the C-tail and the two lobes of the enzyme decreases for the F100A mutant. We added more comments in the main text.

      FAIR standards of making the data accessible and reproducible are not directly addressed.

      We have deposited all our NMR data on the Data Repository Site at the University of Minnesota, DRUM (https://hdl.handle.net/11299/261043).

      The MD data and conformational states would be a valuable resource for the community and should be shared via some open-source repositories.

      Due to the large size of the simulations (>500 GB), we could not deposit them in the Data Repository Site at the University of Minnesota (DRUM). We are actively working with the personnel at DRUM to upload all the trajectories to an alternate site. In the meantime, these data will be available to the public immediately upon request.

      The authors state that ES1 and ES2 states are novel and not observed in previous crystal structures. The authors should quantify this through comparisons with PKA inactive states and with other AGC kinases.

      We apologize for the confusion. We now clarify that ES1 corresponds to a well-known inactivation pathway. As suggested by this reviewer, we now report a few examples of active and inactive conformations of PKA-C and other kinases (see new Figure 3 – figure supplement 2). Briefly, ES1 corresponds to the typical αC-out conformation found for PKA-C bound to inhibitors or in the R194A mutant. A similar conformation is present for Src, Abl, and CDK2. The αC-out conformation features a disrupted β3K-αCE salt bridge, which is key for active kinases. In contrast, the GS-ES2 transition is not present in the inactive conformations deposited in the PDB.

      Based on the results, can the authors speculate on the impact of oncogenic mutations in the alphaCbeta4 loop mutations in PKA?

      We now include additional comments and another citation that further supports our findings. In short, the activation of a kinase is generated by mutation insertions that stabilize the αC-β4 loop as pointed out by Kannan and Zhang (see references 28, 30, and 68). In contrast, mutations that destabilize this allosteric site (e.g., F100A) are inactivating, disrupting the structural couplings of the two lobes (our work).

      Reviewer #3:

      The manuscript is somewhat difficult to read even for kinase experts, and even harder for the layman. The difficulty partially arises from mixing technical description of the simulations with structural interpretation of the results, which is more intuitive, and partially arises from the assumption that readers are familiar with kinase architecture and its key elements (the aC helix, the APE motif, etc).

      We revised the text and modified Fig. 1 in the main text to make the paper more accessible to the general audience.

      The authors haven't done a good job describing the ES2 state intuitively. From my examination of the figures, it appears that in the ES2 state, the kinase domain is more elongated and the N and the C lobes are relatively less engaged than in the ground state. This may or may not be exactly, but a more intuitive description of the ES2 state is needed.

      As suggested by this reviewer, we include a better description of the ES2 state of the kinase and the structural details of the inactivation pathway. Also, we checked the radius of gyration (Rg) of the two lobes for GS and ES2. ES2 is slightly more elongated, with an Rg of 20.3 ± 0.1 Å as compared to the GS state (20.0 ± 0.2 Å). This marginal difference is consistent with our characterization of the local packing around the αC-β4 loop, in which the lack of stable interactions with αE and the C-tail in the ES2 state makes the overall structure less compact.
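
      For readers unfamiliar with the quantity, the radius of gyration is the mass-weighted root-mean-square distance of the atoms from their center of mass. The sketch below is a generic illustration under that standard definition; the function name and the toy coordinates are hypothetical and do not reproduce the values reported above.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted RMS distance of points from their center of mass.

    coords: list of (x, y, z) tuples in Å; masses: optional list of weights
    (equal weights are assumed when omitted).
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    # Center of mass, one component at a time.
    com = tuple(
        sum(m * c[i] for m, c in zip(masses, coords)) / total for i in range(3)
    )
    # Mass-weighted mean squared distance from the center of mass.
    msd = sum(
        m * sum((c[i] - com[i]) ** 2 for i in range(3))
        for m, c in zip(masses, coords)
    ) / total
    return math.sqrt(msd)


# Toy example: stretching a two-point "structure" increases Rg.
compact = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
elongated = [(2.0, 0.0, 0.0), (-2.0, 0.0, 0.0)]
print(radius_of_gyration(compact), radius_of_gyration(elongated))
```

      A larger value for one conformational ensemble than for another, averaged over frames, indicates a less compact structure.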

      The authors need to introduce and give a brief description of technical terms such as CV (collective variable), PC (principal component) etc.

      We now specify both collective variables and principal components and include those definitions in the Methods section. Briefly, to characterize the complex conformational transitions of PKA-C, we utilize collective variables (Figure 2 – figure supplement 1). We chose these variables based on structural motifs described in the literature to define local and global structural transitions (Camilloni, C., Vendruscolo, M., Biochemistry, 2015, 54, 7470; Kukic, P. et al., Structure, 2015, 23, 745). On the other hand, we utilized principal component analysis to compare the conformational changes of the kinase in the same two-dimensional space, revealing the two lowest frequencies that define the global motions of the enzyme (Figures 7C, D, and E).

      The following paper should be discussed as it discussed similar ATP/substrate binding of Src kinase based on an extensive network that largely overlaps with the discussed PKA network. Foda, et al. "A dynamically coupled allosteric network underlies binding cooperativity in Src kinase." Nature communications 6.1 (2015): 5939.

      We apologize for missing this citation. Indeed, it makes our finding more general as allosteric cooperativity is key in other kinases such as Src and ERK2. We included this in the Discussion section.

      The CHESCA analysis appears to be an add-on that doesn't add much value. It is difficult to direct. I'd suggest considering removing it to the SI.

      We understand this concern. We rewrote part of the paper to link the NMR analysis of the correlated chemical shifts, as described by the CHESCA matrices, to the MD calculations.

    1. Author Response

      Reviewer #1 (Public Review):

      Theoretical principles of viscous fluid mechanics are used here to assess likely mechanisms of transport in the ER. A set of candidate mechanisms is evaluated, making good use of imaging to represent ER network geometries. Evidence is provided that the contraction of peripheral sheets provides a much more credible mechanism than the contraction of individual tubules, junctions, or perinuclear sheets.

      The work has been conducted carefully and comprehensively, making good use of underlying physical principles. There is a good discussion of the role of slip; sensible approximations (low volume fraction, small particle size, slender geometries, pragmatic treatment of boundary conditions) allow tractable and transparent calculations; clear physical arguments provide useful bounds; stochastic and deterministic features of the problem are well integrated.

      We thank the reviewer for their positive assessment of our work.

      There are just a couple of areas where more discussion might be warranted, in my view.

      (1) The energetic cost of tubule contraction is estimated, but I did not see an equivalent estimate for the contraction of peripheral sheets. It might be helpful to estimate the energetic cost of viscous dissipation in generated flows at higher frequencies.

      This is a good point. We will also include an energetic cost estimate for the contractions of peripheral sheets in the revised manuscript.

      The mechanism of peripheral sheet contraction is unclear: do ATP-driven mechanisms somehow interact with thermal fluctuations of membranes?

      The new energetic estimates in the revision might help constrain possible hypotheses for the mechanism(s) driving peripheral sheet contraction, and suggest if a dedicated ATP-driven mechanism is required.

      (2) Mutations are mentioned in the abstract but not (as far as I could see) later in the manuscript. It would be helpful if any consequences for pathologies could be developed in the text.

      We are grateful for this suggestion. The need to rationalise pathology associated with the subtle effects of ER morphogens’ mutations is indeed pointed out as one factor motivating the study of the interplay between ER structure and performance. In the revised manuscript, we plan to include a brief discussion potentially linking ER morphogens’ malfunction to luminal transport, integrating additional recently published data.

      Reviewer #2 (Public Review):

      Summary:

      This study explores theoretically the consequences of structural fluctuations of the endoplasmic reticulum (ER) morphology called contractions on molecular transport. Most of the manuscript consists of the construction of an interesting theoretical flow field (physical model) under various hypothetical assumptions. The computational modeling is followed by some simulations.

      Strengths:

      The authors are focusing their attention on testing the hypothesis that a local flow in the tubule could be driven by tubular pinching. We recall that trafficking in the ER is considered to be mostly driven by diffusion at least at a spatial scale that is large enough to account for averaging of any random flow occurring from multiple directions [note that this is not the case for plants].

      We thank the reviewer. We have indeed explored here the possibilities of active transport, focusing especially on transport over the length scale of single tubules, as a result of structural fluctuations, and found tubular pinching to be ineffective compared to, e.g., peripheral sheet fluctuations. In the revised version we plan to add text mentioning what is known about the ER in plants.

      Weaknesses:

      The manuscript extensively details the construction of the theoretical model, occupying a significant portion of the manuscript. While this section contains interesting computations, its relevance and utility could be better emphasized, perhaps warranting a reorganization of the manuscript to foreground this critical aspect.

      Overall, the manuscript appears highly technical with limited conclusive insights, particularly lacking predictions confirmed by experimental validation. There is an absence of substantial conclusions regarding molecular trafficking within the ER.

      We sought to balance the theoretical/computational details of our model with the biophysical conclusions drawn from its predictions. Given the model's complexity and novelty, it was essential to elucidate the theoretical underpinnings comprehensively, in order to allow others to implement it in the future with additional, or different, parameters. To maintain clarity and focus in the main text, we have judiciously relegated extensive technical details to the methods section or supplementary materials, and divided the text into stand-alone section headings allowing the reader to skip through to conclusions.

      The primary focus of our manuscript is to introduce and explore, via our theoretical model, the interplay between ER structure dynamics and molecular transport. Our approach, while in silico, generates concrete predictions about the physical processes underpinning luminal motion within the ER. For instance, our findings challenge the previously postulated role of small tubular contractions in driving luminal flow, instead highlighting the potential significance of local flat ER areas—empirically documented entities—for facilitating such motion.

      Furthermore, by deducing what type of transport may or may not occur within the range of possible ER structural fluctuations, our model offers detailed predictions designed to bridge the gap between theoretical insight and experimental verification. These predictions detail the spatial and temporal parameters essential for effective transport, delineating plausible values for these parameters. We hope that the model’s predictions will invite experimentalists to devise innovative methodologies to test them. We plan to introduce text edits to the revised version to clarify these.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors aim to address a critical challenge in the field of bioinformatics: the accurate and efficient identification of protein binding sites from sequences. Their work seeks to overcome the limitations of current methods, which largely depend on multiple sequence alignments or experimental protein structures, by introducing GPSite, a multi-task network designed to predict binding residues of various molecules on proteins using ESMFold.

      Strengths:

      • Benchmarking. The authors provide a comprehensive benchmark against multiple methods, showcasing the performances of a large number of methods in various scenarios.

      • Accessibility and Ease of Use. GPSite is highlighted as a freely accessible tool with user-friendly features on its website, enhancing its potential for widespread adoption in the research community.

      RE: We thank the reviewer for acknowledging the contributions and strengths of our work!

      Weaknesses:

      • Lack of Novelty. The method primarily combines existing approaches and lacks significant technical innovation. This raises concerns about the original contribution of the work in terms of methodological development. Moreover, the paper reproduces results and analyses already presented in previous literature, without providing novel analysis or interpretation. This further diminishes the contribution of this paper to advancing knowledge in the field.

      RE: The novelty of this work is primarily manifested in four key aspects. Firstly, although we have employed several existing tools such as ProtTrans and ESMFold to extract sequence features and predict protein conformations, these techniques were hardly explored in the field of binding site prediction. We have successfully demonstrated the feasibility of substituting multiple sequence alignments with language model embeddings and training with predicted structures, providing a new solution to overcome the limitations of current methods for genome-wide applications. Secondly, though a few methods tend to capture geometric information based on protein surfaces or atom graphs, surface calculation and property mapping are usually time-consuming, while message passing on full atom graphs is memory-consuming and thus challenging to process long sequences. Besides, these methods are sensitive towards details and errors in the predicted structures. To facilitate large-scale annotations, we have innovatively applied geometric deep learning to protein residue graphs for comprehensively capturing backbone and sidechain geometric contexts in an efficient and effective manner (Figure 1). Thirdly, we have not only exploited multi-task learning to integrate diverse ligands and enhance performance, but also shown its capability to easily extend to the binding site prediction of other unseen ligands (Figure 4 D-E). Last but not least, as a “Tools and Resources” article, we have provided a fast, accurate and user-friendly webserver, as well as constructed a large annotation database for the sequences in Swiss-Prot. Leveraging this database, we have conducted extensive analyses on the associations between binding sites and molecular functions, biological processes, and disease-causing mutations (Figure 5), indicating the potential of our tool to unveil unexplored biology underlying genomic data.

      We have now revised the descriptions in the “The geometry-aware protein binding site predictor (GPSite)” section to highlight the novelty of our work in a clearer manner:

      “In conclusion, GPSite is distinguished from the previous approaches in four key aspects. First, profiting from the effectiveness and low computational cost of ProtTrans and ESMFold, GPSite is liberated from the reliance on MSA and native structures, thus enabling genome-wide binding site prediction. Second, unlike methods that only explore the Cα models of proteins 25,40, GPSite exploits a comprehensive geometric featurizer to fully refine knowledge in the backbone and sidechain atoms. Third, the employed message propagation on residue graphs is global structure-aware and time-efficient compared to the methods based on surface point clouds 21,22, and memory-efficient unlike methods based on full atom graphs 23,24. Residue-based message passing is also less sensitive towards errors in the predicted structures. Last but not least, instead of predicting binding sites for a single molecule type or learning binding patterns separately for different molecules, GPSite applies multi-task learning to better model the latent relationships among different binding partners.”

      • Benchmark Discrepancies. The variation in benchmark results, especially between initial comparisons and those with PeSTo. GPSite achieves a PR AUC of 0.484 on the global benchmark but a PR AUC of 0.61 on the benchmark against PeSTo. For consistency, PeSTo should be included in the benchmark against all other methods. It suggests potential issues with the benchmark set or the stability of the method. This inconsistency needs to be addressed to validate the reliability of the results.

      RE: We thank the reviewer for the constructive comments. Since our performance comparison experiments involved numerous competitive methods whose training sets are disparate, it was difficult to compare or rank all these methods fairly using a single test set. Given the substantial overlap between our protein-binding site test set and the training set of PeSTo, we meticulously re-split our entire protein-protein binding site dataset to generate a new test set that avoids any overlap with the training sets of both GPSite and PeSTo and performed a separate evaluation, where GPSite achieves a higher AUPR than PeSTo (0.610 against 0.433). Evaluating on separate test sets in this way is quite common in this field. For instance, in the study of PeSTo (Nat Commun 2023), the comparisons of PeSTo with MaSIF-site, SPPIDER, and PSIVER were conducted using one test set, while the comparison with ScanNet was performed on a separate test set.

      Based on the reviewer’s suggestion, we have now replaced this experiment with a direct comparison with PeSTo using the datasets from PeSTo, to make our results more complete and convincing. The corresponding descriptions are now added in Appendix 1-note 2, and the results are added in Appendix 2-table 4. For convenience, we also attach the note and table here:

      “Since 340 out of 375 proteins in our protein-protein binding site test set share > 30% identity with the training sequences of PeSTo, we performed a separate comparison between GPSite and PeSTo using the training and test datasets from PeSTo. By re-training with simply the same hyperparameters, GPSite achieves better performance than PeSTo (AUPR of 0.824 against 0.797) as shown in Appendix 2-table 4. Furthermore, when using ESMFold-predicted structures as input, the performance of PeSTo decreases substantially (AUPR of 0.691), and the superiority of our method will be further reflected. As in 24, the performance of ScanNet is also included (AUPR of 0.720), which is also largely outperformed by GPSite.”

      Author response table 1.

      Performance comparison of GPSite with ScanNet and PeSTo on the protein-protein binding site test set from PeSTo 24

      Note: The performance of ScanNet and PeSTo are directly obtained from 24. PeSTo* denotes evaluation using the ESMFold-predicted structures as input. The metrics provided are the median AUPR, median AUC and median MCC. The best/second-best results are indicated by bold/underlined fonts.
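
      As an illustration of how a median per-protein AUPR could be computed, the sketch below uses average precision as a step-wise estimate of the area under the precision-recall curve. This is an assumption made for illustration: the response does not state which AUPR estimator was used, and the function names and toy data are hypothetical.

```python
from statistics import median


def average_precision(labels, scores):
    """Average precision: mean of the precision values at each true positive,
    ranking residues by decreasing score (ties are not specially handled)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


def median_aupr(proteins):
    """Median of per-protein average precision; `proteins` is a list of
    (labels, scores) pairs, one pair per protein."""
    return median(average_precision(labels, scores) for labels, scores in proteins)


# Toy per-residue labels (1 = binding) and predicted scores for three proteins.
toy = [
    ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
    ([1, 1, 0], [0.9, 0.8, 0.1]),
    ([0, 1], [0.9, 0.1]),
]
print(median_aupr(toy))
```

      Reporting the median over proteins, rather than pooling all residues, keeps a few very large proteins from dominating the metric.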

      • Interface Definition Ambiguity. There is a lack of clarity in defining the interface for the binding site predictions. Different methods are trained using varying criteria (surfaces in MaSIF-site, distance thresholds in ScanNet). The authors do not adequately address how GPSite's definition aligns with or differs from these standards and how this issue was addressed. It could indicate that the comparison of those methods is unreliable and unfair.

      RE: We thank the reviewer for the comments. The precise definition of ligand-binding sites is elucidated in the “Benchmark datasets” section. Specifically, the datasets of DNA, RNA, peptide, ATP, HEM and metal ions used to train GPSite were collected from the widely acknowledged BioLiP database [PMID: 23087378]. In BioLiP, a residue is defined as a binding residue if the smallest atomic distance between the target residue and the ligand is < 0.5 Å plus the sum of the van der Waals radii of the two nearest atoms. Meanwhile, most comparative methods regarding these ligands were also trained on data from BioLiP, thereby ensuring fair comparisons.
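
      The distance rule quoted above can be written out as a short check. This is a minimal sketch, not BioLiP's implementation: the radii table holds approximate, illustrative values, and the function name and coordinates are hypothetical.

```python
import math

# Approximate van der Waals radii in Å (illustrative values, an assumption).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80, "P": 1.80}


def is_binding_residue(residue_atoms, ligand_atoms, margin=0.5):
    """Apply the rule: nearest residue-ligand atom pair distance must be
    below `margin` plus the sum of the two atoms' van der Waals radii.

    Each atom is given as (element, (x, y, z)) with coordinates in Å.
    """
    nearest = None
    for elem_r, xyz_r in residue_atoms:
        for elem_l, xyz_l in ligand_atoms:
            d = math.dist(xyz_r, xyz_l)
            if nearest is None or d < nearest[0]:
                nearest = (d, elem_r, elem_l)
    if nearest is None:  # no atoms supplied on one side
        return False
    d, elem_r, elem_l = nearest
    return d < margin + VDW_RADII[elem_r] + VDW_RADII[elem_l]


# Hypothetical N...O pair: threshold is 0.5 + 1.55 + 1.52 = 3.57 Å.
residue = [("N", (0.0, 0.0, 0.0))]
print(is_binding_residue(residue, [("O", (3.5, 0.0, 0.0))]))
print(is_binding_residue(residue, [("O", (3.6, 0.0, 0.0))]))
```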

      However, since BioLiP does not include data on protein-protein binding sites, studies for protein-protein binding site prediction may adopt slightly distinct label definitions, as the reviewer suggested. Here, we employed the protein-protein binding site data from our previous study [PMID: 34498061], where a protein-binding residue was defined as a surface residue (relative solvent accessibility > 5%) that lost more than 1 Å² absolute solvent accessibility after protein-protein complex formation. This definition was initially introduced in PSIVER [PMID: 20529890] and widely applied in various studies (e.g., PMID: 31593229, PMID: 32840562). SPPIDER [PMID: 17152079] and MaSIF-site [PMID: 31819266] have also adopted similar surface-based definitions as PSIVER. On the other hand, ScanNet [PMID: 35637310] employed an atom distance threshold of 4 Å to define contacts while PeSTo [PMID: 37072397] used a threshold of 5 Å. However, it is noteworthy that current methods in this field including ScanNet (Nat Methods 2022) and PeSTo (Nat Commun 2023) directly compared methods using different label definitions without any alignment in their benchmark studies, likely due to the subtle distinctions among these definitions. For instance, the study of PeSTo directly performed comparisons with ScanNet, MaSIF-site, SPPIDER, and PSIVER. Therefore, we followed these previous works, directly comparing GPSite with other protein-protein binding site predictors.
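
      The surface-based label described above amounts to two threshold tests per residue. The sketch below is illustrative only: the function name and inputs are hypothetical, and it assumes that relative solvent accessibility is computed on the unbound chain, which the response does not state explicitly.

```python
def is_protein_binding_residue(asa_free, asa_complex, max_asa,
                               rsa_cutoff=0.05, loss_cutoff=1.0):
    """Label a residue as protein-binding under the surface-based definition.

    asa_free:    absolute solvent accessibility in the isolated chain (Å²)
    asa_complex: absolute solvent accessibility in the complex (Å²)
    max_asa:     reference maximum accessibility for the residue type (Å²)
    """
    rsa = asa_free / max_asa                # relative solvent accessibility
    is_surface = rsa > rsa_cutoff           # surface residue: RSA > 5%
    buried_area = asa_free - asa_complex    # accessibility lost on binding
    return is_surface and buried_area > loss_cutoff


# Hypothetical residues: exposed and buried on binding, nearly buried
# beforehand, and exposed but essentially unchanged on binding.
print(is_protein_binding_residue(50.0, 20.0, 200.0))
print(is_protein_binding_residue(5.0, 0.0, 200.0))
print(is_protein_binding_residue(50.0, 49.5, 200.0))
```

      Distance-threshold definitions such as the 4 Å and 5 Å contact cutoffs mentioned above would label a somewhat different set of residues, which is the source of the ambiguity raised by the reviewer.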

      In the revised “Benchmark datasets” section, we have now provided more details for the binding site definitions in different datasets to avoid any potential ambiguity:

      “The benchmark datasets for evaluating binding site predictions of DNA, RNA, peptide, ATP, and HEM are constructed from BioLiP”; “A binding residue is defined if the smallest atomic distance between the target residue and the ligand is < 0.5 Å plus the sum of the van der Waals radii of the two nearest atoms”; “Besides, the benchmark dataset of protein-protein binding sites is directly from 26, which contains non-redundant transient heterodimeric protein complexes dated up to May 2021. Surface regions that become solvent inaccessible on complex formation are defined as the ground truth protein-binding sites. The benchmark datasets of metal ion (Zn2+, Ca2+, Mg2+ and Mn2+) binding sites are directly from 18, which contain non-redundant proteins dated up to December 2021 from BioLiP.”

      While GPSite demonstrates the potential to surpass state-of-the-art methods in protein binding site prediction, the evidence supporting these claims seems incomplete. The lack of methodological novelty and the unresolved questions in benchmark consistency and interface definition somewhat undermine the confidence in the results. Therefore, it's not entirely clear if the authors have fully achieved their aims as outlined.

      The work is useful for the field, especially in disease mechanism elucidation and novel drug design. The availability of genome-scale binding residue annotations GPSite offers is a significant advancement. However, the utility of this tool could be hampered by the aforementioned weaknesses unless they are adequately addressed.

      RE: We thank the reviewer for acknowledging the advancement and value of our work, as well as pointing out areas where improvements can be made. As discussed above, we have now carried out the corresponding revisions in the revised manuscript to enhance the completeness and clarity of our work.

      Reviewer #2 (Public Review):

      Summary:

      This work provides a new framework, "GPsite" to predict DNA, RNA, peptide, protein, ATP, HEM, and metal ions binding sites on proteins. This framework comes with a webserver and a database of annotations. The core of the model is a Geometric featurizer neural network that predicts the binding sites of a protein. One major contribution of the authors is the fact that they feed this neural network with predicted structure from ESMFold for training and prediction (instead of native structure in similar works) and a high-quality protein Language Model representation. The other major contribution is that it provides the public with a new light framework to predict protein-ligand interactions for a broad range of ligands.

      The authors have demonstrated the interest of their framework with mostly two techniques: ablation and benchmark.

      Strengths:

      • The performance of this framework as well as the provided dataset and web server make it useful to conduct studies.

      • The ablations of some core elements of the method, such as the protein Language Model part, or the input structure are very insightful and can help convince the reader that every part of the framework is necessary. This could also guide further developments in the field. As such, the presentation of this part of the work can hold a more critical place in this work.

      RE: We thank the reviewer for recognizing the contributions of our work and for noting that our experiments are thorough.

      Weaknesses:

      • Overall, we can acknowledge the important effort of the authors to compare their work to other similar frameworks. Yet, the lack of homogeneity of training methods and data from one work to the other makes the comparison slightly unconvincing, as the authors pointed out. Overall, the paper puts significant effort into convincing the reader that the method is beating the state of the art. Maybe, there are other aspects that could be more interesting to insist on (usability, interest in protein engineering, and theoretical works).

      RE: We sincerely appreciate the reviewer for the constructive and insightful comments. As to the concern of training data heterogeneity raised by the reviewer, it is noteworthy that current studies in this field, such as ScanNet (Nat Methods 2022) and PeSTo (Nat Commun 2023), directly compare methods trained on different datasets in their benchmark experiments. Therefore, we have adhered to the paradigm in these previous works. According to the detailed recommendations by the reviewer, we have now improved our manuscript by incorporating additional ablation studies regarding the effects of training procedure and language model representations, as well as case studies regarding the predicted structure’s quality and GPSite-based function annotations. We have also refined the Discussion section to focus more on the achievements of this work. A comprehensive point-by-point response to the reviewer’s recommendations is provided below.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      Overall I think the work is slightly underserved by its presentation. Some improvements could be made to the paper to better highlight the significance of your contribution.

      RE: We thank the reviewer for recognizing the significance of our work!

      • Line 188: "As expected, the performance of these methods mostly decreases substantially utilizing predicted structures for testing because they were trained with high-quality native structures."

      This is a major ablation that was not performed in this case. You used predicted structures to train, while the others did not. A better way to assess the value of this approach would be to train a network on native structures only and measure the gain in performance from training on predicted structures instead, as you did later to assess other aspects of your method, such as single-task versus multi-task training.

      RE: We thank the reviewer for the valuable recommendation. We have now assessed the benefit of training with predicted instead of native structures, which brings an average AUPR increase of 4.2% as detailed in Appendix 1-note 5 and Appendix 2-table 9. For convenience, we also attach the note and table here:

      “We examined the performance under different training and evaluation settings as shown in Appendix 2-table 9. As expected, the model yields exceptional performance (average AUPR of 0.656) when trained and evaluated using native structures. However, if this model is fed with predicted structures of the test proteins, the performance substantially declines to an average AUPR of 0.573. This trend aligns with the observations for other structure-based methods as illustrated in Figure 2. More importantly, in the practical scenario where only predicted structures are available for the target proteins, training the model with predicted structures (i.e., GPSite) results in superior performance than training the model with native structures (average AUPR of 0.594 against 0.573), probably owing to the consistency between the training and testing data. For completeness, the results in Appendix 3-figure 2 are also included where GPSite is tested with native structures (average AUPR of 0.637).”

      Author response table 2.

      Performance comparison on the ten binding site test sets under different training and evaluation settings

      Note: The numbers in this table are AUPR values. “Pep” and “Pro” denote peptide and protein, respectively. “Avg” means the average AUPR values among the ten test sets. “native” and “predicted” denote applying native and predicted structures as input, respectively.

      • Line 263: "ProtTrans consistently obtains competitive or superior performance compared to the MSA profiles, particularly for the target proteins with few homologous sequences (Neff < 2)."

      This seems a bit far-fetched. We can see clearly in the figure that the performance is far superior for Neff < 2, but the performance seems rather similar for higher Neff. Could the authors evaluate numerically the significance of the improvement? MSA profiles outperform GPSite on 4 intervals and I don't know the distribution of the data.

      RE: We thank the reviewer for the valuable suggestion. We have now revised this sentence to avoid any potential ambiguity:

      “As evidenced in Figure 4B and Appendix 2-table 8, ProtTrans consistently obtains competitive or superior performance compared to the MSA profile. Notably, for the target proteins with few homologous sequences (Neff < 2), ProtTrans surpasses MSA profile significantly with an improvement of 3.9% on AUC (P-value = 4.3×10⁻⁸).”

      The detailed significance tests and data distribution are now added in Appendix 2-table 8 and attached below as Author response-table 3 for convenience:

      Author response table 3.

      Performance comparison between GPSite and the baseline model using MSA profile for proteins with different Neff values in the combined test set of the ten ligands

      Note: Significance tests are performed following the procedure in 12,25. If P-value < 0.05, the difference between the performance is considered statistically significant.

      • Line 285: "We first visualized the distributions of residues in this dataset using t-SNE, where the residues are encoded by raw feature vectors encompassing ProtTrans embeddings and DSSP structural properties, or latent embedding vectors from the shared network of GPSite. "

      Wouldn't embedding from single-task be more relevant to show the interest of multi-task training here? Is the difference that big when comparing embeddings from single-task training to embeddings from multi-task training? Otherwise, I think the evidence from Figure 4e is sufficient, the interest of multitasking could be well-shown by single-task vs. multi-task AUPR and a few examples or predictions that are improved.

      RE: We thank the reviewer for the comment. In the second paragraph of the “The effects of protein features and model designs” section, we have compared the performance of multi-task and single-task learning. However, the visualization results in Figure 4D are related to the third paragraph, where we conducted a downstream exploration of the possibility to extend GPSite to other unseen ligands. This is based on the hypothesis that the shared network in GPSite may have captured certain common ligand-binding mechanisms during the preceding multi-task training process. We visualized the distributions of residues in an unseen carbohydrate-binding site dataset using t-SNE, where the residues are encoded by raw feature vectors (ProtTrans and DSSP), or latent embedding vectors from the shared network trained before. Although the shared network has not been specifically trained on the carbohydrate dataset, the latent representations from GPSite effectively improve the discriminability between the binding and non-binding residues as shown in Figure 4D. This finding indicates that the shared network trained on the initial set of ten molecule types has captured common binding mechanisms and may be applied to other unseen ligands.

      We have now added more descriptions in this paragraph to avoid potential ambiguity:

      “Residues that are conserved during evolution, exposed to solvent, or inside a pocket-shaped domain are inclined to participate in ligand binding. During the preceding multi-task training process, the shared network in GPSite should have learned to capture such common binding mechanisms. Here we show how GPSite can be easily extended to the binding site prediction for other unseen ligands by adopting the pre-trained shared network as a feature extractor. We considered a carbohydrate-binding site dataset from 54 which contains 100 proteins for training and 49 for testing. We first visualized the distributions of residues in this dataset using t-SNE 55, where the residues are encoded by raw feature vectors encompassing ProtTrans embeddings and DSSP structural properties, or latent embedding vectors from the shared network of GPSite trained on the ten molecule types previously.”

      • Line 291: "Employing these informative hidden embeddings as input features to train a simple MLP exhibits remarkable performance with an AUC of 0.881 (Figure 4E), higher than that of training a single-task version of GPSite from scratch (AUC of 0.853) or other state-of-the-art methods such as MTDsite and SPRINT-CBH."

      Is it necessary to introduce other methods here? The single-task vs multi-task seems enough for what you want to show?

      RE: We thank the reviewer for the comment. As discussed above, here we aim to show the potential of GPSite for the binding site prediction of an unseen ligand (i.e., carbohydrate) by adopting the pre-trained shared network as a feature extractor. Thus, we think it’s reasonable to also include the performance of other state-of-the-art methods on this carbohydrate benchmark dataset as baselines.

      • Line 321: "Specifically, a protein-level binding score can be generated for each ligand by averaging the top k predicted scores among all residues. Empirically, we set k to 5 for metal ions and 10 for other ligands, considering that the binding interfaces of metal ions are usually smaller."

      Since binding sites are usually not localized on one single amino-acid, we can expect that most of the top k residues are localized around the same area of the protein both spatially and along the sequence. Is it something you observe and could consider in your method?

      RE: We thank the reviewer for the comment. We employed a straightforward method (top-k average) to convert GPSite’s residue-level annotations into protein-level annotations, where k was set empirically based on the distributions of the numbers of binding residues per sequence observed in the training set. We have not put much effort into optimizing this strategy since it mainly serves as a proof-of-concept experiment (Figure 5 A-C) to show the potential of GPSite in discriminating ligand-binding proteins. We have now revised this sentence to better explain how we selected k:

      “Specifically, a protein-level binding score indicating the overall binding propensity to a specific ligand can be generated by averaging the top k predicted scores among all residues. Empirically, we set k to 5 for metal ions and 10 for other ligands, considering the distributions of the numbers of binding residues per sequence observed in the training set.”
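      The top-k averaging described in this sentence is simple enough to sketch in a few lines of Python. This is an illustration under stated assumptions rather than the GPSite code: the function names, the ligand labels, and the handling of proteins with fewer than k residues (averaging over all available residues) are hypothetical choices.

```python
# Ligand labels treated as metal ions in this sketch (hypothetical naming).
METAL_IONS = {"Zn2+", "Ca2+", "Mg2+", "Mn2+"}


def default_k(ligand):
    """k = 5 for metal ions and 10 for other ligands, as stated in the text."""
    return 5 if ligand in METAL_IONS else 10


def protein_level_score(residue_scores, k):
    """Average the k highest residue-level binding scores of one protein.

    Assumption: if the protein has fewer than k residues, all residues are used.
    """
    if not residue_scores:
        raise ValueError("residue_scores must not be empty")
    top = sorted(residue_scores, reverse=True)[:k]
    return sum(top) / len(top)
```

      For example, `protein_level_score(scores, default_k("Zn2+"))` would average the five highest per-residue Zn2+ scores of a protein.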

      As for the question raised by the reviewer, we can indeed expect that most of the top k predicted binding residues tend to cluster into several but not necessarily one area. For instance, certain macromolecules like DNA may interact with several protein surface patches due to their elongated structures (e.g., Author response-figure 1A). Another case may be a protein binding to multiple molecules of the same ligand type (e.g., Author response-figure 1B).

      Author response image 1.

      The structures of 4XQK (A) and 4KYW (B) in PDB.

      • Line 327: The accuracy of the GPSite protein-level binding scores is further validated by the ROC curves in Figure 5B, where GPSite achieves satisfactory AUC values for all ligands except protein (AUC of 0.608).

      Here may be a good place to compare yourself with others: do other frameworks experience the same problem? If so, AUC and AUPR are not relevant here. Can you report some recall scores, for example?

      RE: We thank the reviewer for the valuable recommendation. We have conducted comprehensive method comparisons in the preceding “GPSite outperforms state-of-the-art methods” section, where GPSite surpasses all existing frameworks across various ligands. Here, the genome-wide analyses of Swiss-Prot in Figure 5 serve as a downstream demonstration of GPSite’s capacity for large-scale annotations. We didn’t compare with other methods since most of them are time-consuming or memory-consuming, and thus unable to process large numbers of sequences or very long sequences. For example, it takes about 8 min for the MSA-based method GraphBind to annotate a protein with 500 residues, while it takes only about 20 s for GPSite (see Appendix 3-figure 1 for detailed runtime comparison). It is also challenging for the atom-graph-based method PeSTo to process structures larger than 100 kDa (~1000 residues) on a 32 GB GPU as the authors suggested, while GPSite can easily process structures containing up to 2500 residues on a 16 GB GPU.

      Regarding the recall score mentioned by the reviewer, GPSite achieves a recall of 0.95 (threshold = 0.5) for identifying protein-binding proteins. This indicates that GPSite can accurately identify positive samples, but it also tends to misclassify negative samples as positive. In our original manuscript, we claimed that “This may be ascribed to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete”. To better support this claim, we have now added two examples in Appendix 1-note 7, where GPSite confidently predicted the presence of the “protein binding” function (GO:0005515). Notably, this function was absent in these two proteins in the Swiss-Prot database at the time of manuscript preparation (release: 2023-05-03), but has been included in the latest release of Swiss-Prot (release: 2023-11-08). For convenience, we also attach the note here:

      “As depicted in Figure 5A, GPSite assigns relatively high prediction scores to the proteins without “protein binding” function in the Swiss-Prot annotations, leading to a modest AUC value of 0.608 (Figure 5B). This may be ascribed to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete. To support this hypothesis, we present two proteins as case studies, both sharing < 20% sequence identity with the protein-binding training set of GPSite. The first case is Aminodeoxychorismate synthase component 2 from Escherichia coli (UniProt ID: P00903). GPSite confidently predicted this protein as a protein-binding protein with a high prediction score of 0.936. Notably, this protein was not annotated with the “protein binding” function (GO:0005515) or any of its GO child terms in the Swiss-Prot database at the time of manuscript preparation (https://rest.uniprot.org/unisave/P00903?format=txt&versions=171, release: 2023-05-03). However, in the latest release of Swiss-Prot (https://rest.uniprot.org/unisave/P00903?format=txt&versions=174, release: 2023-11-08) during manuscript revision, this protein is annotated with the “protein heterodimerization activity” function (GO:0046982), which is a child term of “protein binding”. In fact, the heterodimerization activity of this protein has been validated through experiments in the year of 1996 (PMID: 8679677), indicating the potential incompleteness of the Swiss-Prot annotations. The other case is Hydrogenase-2 operon protein HybE from Escherichia coli (UniProt ID: P0AAN1), which was also predicted as a protein-binding protein by GPSite (score = 0.909). Similarly, this protein was not annotated with the “protein binding” function in the Swiss-Prot database at the time of manuscript preparation (https://rest.uniprot.org/unisave/P0AAN1?format=txt&versions=108). However, in the latest release of Swiss-Prot (https://rest.uniprot.org/unisave/P0AAN1?format=txt&versions=111), this protein is annotated with the “preprotein binding” function (GO:0070678), which is a child term of “protein binding”. In fact, the preprotein binding function of this protein has been validated through experiments in the year of 2003 (PMID: 12914940). These cases demonstrate the effectiveness of GPSite for completing the missing function annotations in Swiss-Prot.”
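      The combination described in this response, a high recall at threshold 0.5 together with many negatives scored as positive, can be made concrete with a small Python sketch. It is illustrative only: the function name is hypothetical, and treating a score equal to the threshold as a positive call is an assumption, since the text gives only "threshold = 0.5".

```python
def recall_and_false_positive_rate(scores, labels, threshold=0.5):
    """Compute recall and false positive rate for protein-level scores.

    `labels` are 1 for proteins annotated as binding and 0 otherwise.
    Assumption: a score >= threshold counts as a positive prediction.
    """
    tp = fn = fp = tn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label and predicted_positive:
            tp += 1
        elif label:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return recall, fpr
```

      Under the authors' hypothesis, part of the apparent false positive rate would come from unannotated true binders among the label-0 proteins rather than from prediction errors.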

      • Line 381: 'Despite the noteworthy advancements achieved by GPSite, there remains scope for further improvements. Given that the ESM Metagenomic Atlas 34 provides 772 million predicted protein structures along with pre-computed language model embeddings, self-supervised learning can be employed to train a GPSite model for predicting masked sequence and structure attributes, or maximizing the similarity between the learned representations of substructures from identical proteins while minimizing the similarity between those from different proteins using a contrastive loss function training from scratch. Additional opportunities for upgrade exist within the network architecture. For example, a variational Expectation-Maximization (EM) framework 58 can be adopted to handle the hierarchical graph structure inherent in proteins, which contains the top view of the residue graph and the bottom view of the atom graph inside a residue. Such an EM procedure enables training two separate graph neural networks for the two views while simultaneously allowing interaction and mutual enhancement between the two modules. Meta-learning could also be explored in this multi-task scenario, which allows fast adaptation to unseen tasks with limited labels.'

      I think this does not belong here. It feels like half of your discussion is not talking about the achievements of this paper but future very specific directions. Focus on the take-home arguments (performances of the model, ability to predict a large range of tasks, interest in key components of your model, easy use) of the paper and possible future direction but without being so specific.

      RE: We thank the reviewer for the valuable suggestion. We have now notably simplified the discussion of future directions:

      “Despite the noteworthy advancements achieved by GPSite, there remains scope for further improvements. GPSite may be improved by pre-training on the abundant predicted structures in ESM Metagenomic Atlas, and then fine-tuning on binding site datasets. Besides, the hidden embeddings from ESMFold may also serve as informative protein representations. Additional opportunities for upgrade exist within the network architecture. For example, a variational Expectation-Maximization framework can be adopted to handle the hierarchical atom-to-residue graph structure inherent in proteins. Meta-learning could also be explored in this multi-task scenario, which allows fast adaptation to unseen tasks with limited labels.”

      • Overall there is also a lack of displayed structure. You should try to select a few examples of binding sites that were identified correctly by your method and not by others, if possible get some insights on why. Also, some negative examples could be interesting so as to have a better idea of the interest.

      RE: We thank the reviewer for the valuable recommendation. We have performed a case study for the structure of the glucocorticoid receptor in Figure 3 D-H to illustrate a potential reason for the robustness of GPSite. Moreover, we have now added a case study in Appendix 1-note 3 and Appendix 3-figure 5 to explain why GPSite sometimes is not as accurate as the state-of-the-art structure-based method. For convenience, we also attach the note and figure here:

      “Here we present an example of an RNA-binding protein, i.e., the ribosome biogenesis protein ERB1 (PDB: 7R6Q, chain m), to illustrate the impact of predicted structure’s quality. As shown in Appendix 3-figure 5, ERB1 is an integral component of a large multimer structure comprising protein and RNA chains (i.e., the state E2 nucleolar 60S ribosome biogenesis intermediate). Likely due to the neglect of interactions from other protein chains, ESMFold fails to predict the correct conformation of the ERB1 chain (TM-score = 0.24). Using this incorrect predicted structure, GPSite achieves an AUPR of 0.580, lower than GraphBind input with the native structure (AUPR = 0.636). However, the performance of GraphBind substantially declines to an AUPR of 0.468 when employing the predicted structure as input. Moreover, if GPSite adopts the native structure for prediction, a notable performance boost can be obtained (AUPR = 0.681).”

      Author response image 2.

      The prediction results of GPSite and GraphBind for the ribosome biogenesis protein ERB1. (A) The state E2 nucleolar 60S ribosome biogenesis intermediate (PDB: 7R6Q). The ribosome biogenesis protein ERB1 (chain m) is highlighted in blue, while other protein chains are colored in gray. The RNA chains are shown in orange. (B) The RNA-binding sites on ERB1 (colored in red). (C) The ESMFold-predicted structure of ERB1 (TM-score = 0.24). The RNA-binding sites are also mapped onto this predicted structure (colored in red). (D-G) The prediction results of GPSite and GraphBind for the predicted and native ERB1 structures. The confidence of the predictions is represented with a gradient of color from blue for non-binding to red for binding.

      Minor comments:

      • Line 169: "Note that since our test sets may partly overlap with the training sets of these methods, the results reported here should be the upper limits for the existing methods."

      Yes, but they were potentially not trained on the most recent structures in that case. These methods could also see improved performance with an updated training set.

      RE: We thank the reviewer for the comment. We have now deleted this sentence.

      • Line 176: "Since 358 of the 375 proteins in our protein-binding site test set share > 30% identity with the training sequences of PeSTo, we re-split our protein-binding dataset to generate a test set of 65 proteins sharing < 30% identity with the training set of PeSTo for a fair evaluation."

      Too specific to be here in my opinion.

      RE: We thank the reviewer for the comment. We have now moved these details to Appendix 1-note 2. The description in the main text here is now more concise:

      “Given the substantial overlap between our protein-binding site test set and the training set of PeSTo, we conducted separate training and comparison using the datasets of PeSTo, where GPSite still demonstrates a remarkable improvement over PeSTo (Appendix 1-note 2).”

      • Figure 2. The authors should try to either increase Fig A's size or increase the font size. This could probably be done by compressing the size of Figure C into a single figure.

      RE: We thank the reviewer for the suggestion. We have now increased the font size in Figure 2A. In addition, the figures in the final version of the manuscript should be clearer, since we will be able to upload SVG files.

      • Have you tried using embeddings from more structure-aware pLM such as ESM Fold embeddings (fine-tuned) or ProstTrans (that may be more recent than this study)?

      RE: We thank the reviewer for the insightful comment. We have not yet explored the embeddings from structure-aware pLM, but we acknowledge its potential as a promising avenue for future investigation. We have now added this point in our Discussion section:

      “Besides, the hidden embeddings from ESMFold may also serve as informative protein representations.”

      Reviewer #3 (Public Review):

      Summary

      The authors of this work aim to address the challenge of accurately and efficiently identifying protein binding sites from sequences. They recognize the limitations of current methods, including reliance on multiple sequence alignments or experimental protein structures and the under-explored geometry of the structure, which limit performance and genome-scale applications. The authors have developed a multi-task network called GPSite that predicts binding residues for a range of biologically relevant molecules, including DNA, RNA, peptides, proteins, ATP, HEM, and metal ions, using a combination of sequence embeddings from protein language models and ESMFold-predicted structures. Their approach attempts to extract residual and relational geometric contexts in an end-to-end manner, surpassing current sequence-based and structure-based methods.

      Strengths

      • The GPSite model's ability to predict binding sites for a wide variety of molecules, including DNA, RNA, peptides, and various metal ions.

      • Based on the presented results, GPSite outperforms state-of-the-art methods in several benchmark datasets.

      • GPSite adopts predicted structures instead of native structures as input, enabling the model to be applied to a wider range of scenarios where native structures are rare.

      • The authors emphasize the low computational cost of GPSite, which enables rapid genome-scale binding residue annotations, indicating the model's potential for large-scale applications.

      RE: We thank the reviewer for recognizing the significance and value of our work!

      Weaknesses

      • One major advantage of GPSite, as claimed by the authors, is its efficiency. Although the manuscript mentioned that the inference takes about 5 hours for all datasets, it remains unclear how much improvement GPSite can offer compared with existing methods. A more detailed benchmark comparison of running time against other methods is recommended (including the running time of different components, since some methods like GPSite use predicted structures while some use native structures).

      RE: We thank the reviewer for the valuable suggestion. Empirically, it takes about 5-20 min for existing MSA-based methods to make predictions for a protein with 500 residues, while it only takes about 1 min for GPSite (including structure prediction). However, it is worth noting that some predictors in our benchmark study are solely available as webservers, and it is challenging to compare the runtime between a standalone program and a webserver due to the disparity in hardware configurations. Therefore, we have now included comprehensive runtime comparisons between the GPSite webserver and other top-performing servers in Appendix 3-figure 1 to illustrate the practicality and efficiency of our method. For convenience, we also attach the figure here as Author response-figure 3. The corresponding description is now added in the “GPSite outperforms state-of-the-art methods” section:

      “Moreover, GPSite is computationally efficient, achieving comparable or faster prediction speed compared to other top-performing methods (Appendix 3-figure 1).”

      Author response image 3.

      Runtime comparison of the GPSite webserver with other top-performing servers. Five protein chains (i.e., 8HN4_B, 8USJ_A, 8C1U_A, 8K3V_A and 8EXO_A) comprising 100, 300, 500, 700, and 900 residues, respectively, were selected for testing, and the average runtime is reported for each method. Note that a significant portion of GPSite’s runtime (75 s, indicated in orange) is allocated to structure prediction using ESMFold.

      • Since the model uses predicted protein structure, the authors have conducted some studies on the effect of the predicted structure's quality. However, only the 0.7 threshold was used. A more comprehensive analysis with several different thresholds is recommended.

      RE: We thank the reviewer for the comment. We assessed the effect of the predicted structure's quality by evaluating GPSite’s performance on high-quality (TM-score > 0.7) and low-quality (TM-score ≤ 0.7) predicted structures. We did not employ multiple thresholds (e.g., 0.3, 0.5, and 0.7), as the majority of proteins in the test sets were accurately predicted by ESMFold. Specifically, as shown in Figure 3B, Appendix 3-figure 3 and Appendix 2-table 5, the numbers of proteins with TM-score ≤ 0.7 are small in most datasets (e.g., 42 for DNA and 17 for ATP). Consequently, there is insufficient data available for analysis with lower thresholds, except for the RNA test set. Notably, Figure 3C presents a detailed inspection of the 104 proteins with TM-score < 0.5 in the RNA test set. Within this subset, GPSite consistently outperforms the state-of-the-art structure-based method GraphBind with predicted structures as input, regardless of the prediction quality of ESMFold. Only in cases where structures are predicted with extremely low quality (TM-score < 0.3) does GPSite fall behind GraphBind input with native structures. This result further demonstrates the robustness of GPSite. We have now added clearer explanations in the “GPSite is robust for low-quality predicted structures” section:

      “Figure 3B and Appendix 3-figure 3 show the distributions of TM-scores between native and predicted structures calculated by US-align in the ten benchmark datasets, where most proteins are accurately predicted with TM-score > 0.7 (see also Appendix 2-table 5)”; “Given the infrequency of low-quality predicted structures except for the RNA test set, we took a closer inspection of the 104 proteins with predicted structures of TM-score < 0.5 in the RNA test set.”
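      The argument about thresholds rests on how many test proteins fall into each TM-score range. A minimal Python sketch of such a count is given below, assuming per-protein TM-scores are available as a plain list; the function name and the bin edges (0.3, 0.5, 0.7, taken from the thresholds mentioned in the response) are illustrative, and the bins are right-closed so that a TM-score of exactly 0.7 lands in the low-quality group, matching the "TM-score ≤ 0.7" definition.

```python
from bisect import bisect_left


def count_by_tm_score(tm_scores, edges=(0.3, 0.5, 0.7)):
    """Count proteins per TM-score bin.

    With the default edges the bins are:
    [<= 0.3, (0.3, 0.5], (0.5, 0.7], > 0.7].
    """
    counts = [0] * (len(edges) + 1)
    for score in tm_scores:
        # bisect_left puts a score equal to an edge into the lower bin
        counts[bisect_left(edges, score)] += 1
    return counts
```

      Bins with only a handful of proteins would give unstable per-bin AUPR estimates, which is the concern raised in the response for all datasets except the RNA test set.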

      • To demonstrate the robustness of GPSite, the authors performed a case study on human GR containing two zinc fingers, where the predicted structure is not perfect. The analysis could benefit from a more detailed explanation of why the model can still infer the binding site correctly even though the input structural information is slightly off.

      RE: We thank the reviewer for the comment. We have actually explained the potential reason for the robustness of GPSite in the second paragraph of the “GPSite is robust for low-quality predicted structures” section. In summary, although the whole structure of this protein is not perfectly predicted, the local structures of the binding domains of peptide, DNA and Zn2+ are actually predicted accurately as evidenced by the superpositions of the native and predicted structures in Figure 3D and 3E. Therefore, GPSite can still make reliable predictions. We have now revised this paragraph to explain these more clearly:

      “Figure 3D shows the structure of the human glucocorticoid receptor (GR), a transcription factor that binds DNA and assembles a coactivator peptide to regulate gene transcription (PDB: 7PRW, chain A). The DNA-binding domain of GR also consists of two C4-type zinc fingers to bind Zn2+ ions. Although the structure of this protein is not perfectly predicted (TM-score = 0.72), the local structures of the binding domains of peptide and DNA are actually predicted accurately as viewed by the superpositions of the native and predicted structures in Figure 3D and 3E. Therefore, GPSite can correctly predict all Zn2+ binding sites and precisely identify the binding sites of DNA and peptide with AUPR values of 0.949 and 0.924, respectively (Figure 3F, G and H).”

      • To analyze the relatively low AUC value for protein-protein interactions, the authors claimed that it is "due to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete", which is unjustified. It is highly recommended to support this claim by showing at least one example where GPSite's prediction is a valid binding site that is not present in the current Swiss-Prot database or via other approaches.

      RE: We thank the reviewer for the valuable recommendation. To support this claim, we have now added two examples in Appendix 1-note 7, where GPSite confidently predicted the presence of the “protein binding” function (GO:0005515). Notably, this function was absent in these two proteins in the Swiss-Prot database at the time of manuscript preparation (release: 2023-05-03), but has been included in the latest release of Swiss-Prot (release: 2023-11-08). For convenience, we also attach the note below:

      “As depicted in Figure 5A, GPSite assigns relatively high prediction scores to the proteins without “protein binding” function in the Swiss-Prot annotations, leading to a modest AUC value of 0.608 (Figure 5B). This may be ascribed to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete. To support this hypothesis, we present two proteins as case studies, both sharing < 20% sequence identity with the protein-binding training set of GPSite. The first case is Aminodeoxychorismate synthase component 2 from Escherichia coli (UniProt ID: P00903). GPSite confidently predicted this protein as a protein-binding protein with a high prediction score of 0.936. Notably, this protein was not annotated with the “protein binding” function (GO:0005515) or any of its GO child terms in the Swiss-Prot database at the time of manuscript preparation (https://rest.uniprot.org/unisave/P00903?format=txt&versions=171, release: 2023-05-03). However, in the latest release of Swiss-Prot (https://rest.uniprot.org/unisave/P00903?format=txt&versions=174, release: 2023-11-08) during manuscript revision, this protein is annotated with the “protein heterodimerization activity” function (GO:0046982), which is a child term of “protein binding”. In fact, the heterodimerization activity of this protein has been validated through experiments in the year of 1996 (PMID: 8679677), indicating the potential incompleteness of the Swiss-Prot annotations. The other case is Hydrogenase-2 operon protein HybE from Escherichia coli (UniProt ID: P0AAN1), which was also predicted as a protein-binding protein by GPSite (score = 0.909). Similarly, this protein was not annotated with the “protein binding” function in the Swiss-Prot database at the time of manuscript preparation (https://rest.uniprot.org/unisave/P0AAN1?format=txt&versions=108). However, in the latest release of Swiss-Prot (https://rest.uniprot.org/unisave/P0AAN1?format=txt&versions=111), this protein is annotated with the “preprotein binding” function (GO:0070678), which is a child term of “protein binding”. In fact, the preprotein binding function of this protein has been validated through experiments in the year of 2003 (PMID: 12914940). These cases demonstrate the effectiveness of GPSite for completing the missing function annotations in Swiss-Prot.”

      • The authors reported that many GPSite-predicted binding sites are associated with known biological functions. Notably, for RNA-binding sites, there is a significantly higher proportion of translation-related binding sites. The analysis could benefit from a further investigation into this observation, such as analyzing the percentage of such interactions in the training set. In addition, if there is sufficient data, it would also be interesting to see the cross-interaction-type performance of the proposed model, e.g., train the model on a dataset excluding specific binding sites and test its performance on that class of interactions.

      RE: We thank the reviewer for the suggestion. We would like to clarify that the analysis in Figure 5C was conducted at “protein-level” instead of “residue-level”. As described in the second paragraph of the “Large-scale binding site annotation for Swiss-Prot” section, a protein-level ligand-binding score was assigned to a protein by averaging the top k residue-level predicted binding scores. This protein-level score indicates the overall binding propensity of the protein to a specific ligand. We gathered the top 20,000 proteins with the highest protein-level binding scores for each ligand and found that their biological process annotations from Swiss-Prot were consistent with existing knowledge. We have now revised the corresponding sentence to explain these more clearly:

      “Exploiting the residue-level binding site annotations, we could readily extend GPSite to discriminate between binding and non-binding proteins of various ligands. Specifically, a protein-level binding score indicating the overall binding propensity to a specific ligand can be generated by averaging the top k predicted scores among all residues.”
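
      As an illustration only, the protein-level scoring and ranking described above can be sketched as follows. This is a minimal sketch, not GPSite's implementation: the function names, the fallback used for proteins with fewer than k residues, and any particular value of k are assumptions, since the response does not specify them.

```python
from typing import Dict, List, Sequence


def protein_level_score(residue_scores: Sequence[float], k: int) -> float:
    """Average the k highest residue-level binding scores of one protein."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not residue_scores:
        raise ValueError("residue_scores must not be empty")
    # Assumption: a protein with fewer than k residues is scored by
    # averaging all of its residues.
    top = sorted(residue_scores, reverse=True)[:k]
    return sum(top) / len(top)


def rank_proteins(scores_by_protein: Dict[str, Sequence[float]],
                  k: int, n: int) -> List[str]:
    """Return the IDs of the n proteins with the highest protein-level score."""
    ranked = sorted(
        scores_by_protein,
        key=lambda pid: protein_level_score(scores_by_protein[pid], k),
        reverse=True,
    )
    return ranked[:n]
```

      With n set to 20,000, `rank_proteins` would correspond to the selection of top-scoring proteins per ligand mentioned in the response; the protein IDs used as keys are placeholders.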

      As for the cross-interaction-type performance raised by the reviewer, we have now conducted cross-type evaluations to investigate the specificity of the ligand-specific MLPs and the inherent similarities among different ligands in Appendix 1-note 6 and Appendix 2-table 10. For convenience, we also attach the note and table here:

      “We conducted cross-type evaluations by applying different ligand-specific MLPs in GPSite for the test sets of different ligands. As shown in Appendix 2-table 10, for each ligand-binding site test set, the corresponding ligand-specific network consistently achieves the best performance. This indicates that the ligand-specific MLPs have specifically learned the binding patterns of particular molecules. We also noticed that the cross-type performance is reasonable for the ligands sharing similar properties. For instance, the DNA-specific MLP exhibits a reasonable AUPR when predicting RNA-binding sites, and vice versa. Similar trends are also observed between peptide and protein, as well as among metal ions as expected. Interestingly, the cross-type performance between ATP and HEM is also acceptable, potentially attributed to their comparable molecular weights (507.2 and 616.5, respectively).”

      Author response table 4.

      Cross-type performance by applying different ligand-specific MLPs in GPSite for the test sets of different ligands

      Note: “Pep” and “Pro” denote peptide and protein, respectively. The numbers in this table are AUPR values. The best/second-best result in each test set is indicated by bold/underlined font.
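
      For readers who wish to reproduce this kind of cross-type table, the evaluation loop can be sketched as below. This is a hedged sketch under stated assumptions: `predictors` stands in for the ligand-specific MLPs as plain callables, `test_sets` holds per-ligand inputs and binary labels, and AUPR is estimated by average precision, which may differ from the exact procedure used for the table. None of these names come from GPSite.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Estimate AUPR as the mean precision at the rank of each positive."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank residues from highest to lowest score (ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


def cross_type_table(
    predictors: Dict[str, Callable[[List[float]], List[float]]],
    test_sets: Dict[str, Tuple[List[float], List[int]]],
) -> Dict[Tuple[str, str], float]:
    """Apply every ligand-specific predictor to every ligand's test set.

    Keys of the result are (test-set ligand, predictor ligand) pairs.
    """
    table = {}
    for test_ligand, (inputs, labels) in test_sets.items():
        for model_ligand, predict in predictors.items():
            table[(test_ligand, model_ligand)] = average_precision(
                labels, predict(inputs)
            )
    return table
```

      The diagonal entries of the resulting table correspond to each predictor evaluated on its own ligand, and the off-diagonal entries to the cross-type setting.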

    1. Author Response

      eLife assessment

      The authors report that optogenetic inhibition of hippocampal axon terminals in retrosplenial cortex impairs the performance of a delayed non-match to place task. The significance of findings elucidating the role of hippocampal projections to the retrosplenial cortex in memory and decision-making behaviors is important. However, the strength of evidence for the paper's claims is currently incomplete.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is a study on the role of the retrosplenial cortex (RSC) and the hippocampus in working memory. Working memory is a critical cognitive function that allows temporary retention of information for task execution. The RSC, which is functionally and anatomically connected to both primary sensory (especially visual) and higher cognitive areas, plays a key role in integrating spatial-temporal context and in goal-directed behaviors. However, the specific contributions of the RSC and the hippocampus in working memory-guided behaviors are not fully understood due to a lack of studies that experimentally disrupt the connection between these two regions during such behaviors.

      In this study, researchers employed eArch3.0 to silence hippocampal axon terminals in the RSC, aiming to explore the roles of these brain regions in working memory. Experiments were conducted where animals with silenced hippocampal axon terminals in the RSC performed a delayed non-match to place (DNMP) task. The results indicated that this manipulation impaired memory retrieval, leading to decreased performance and quicker decision-making in the animals. Notably, the authors observed that the effects of this impairment persisted beyond the light-activation period of the opsin, affecting up to three subsequent trials. They suggest that disrupting the hippocampal-RSC connection has a significant and lasting impact on working memory performance.

      Strengths:

      They conducted a study exploring the impact of direct hippocampal inputs into the RSC, a region involved in encoding spatial-temporal context and transferring contextual information, on spatial working memory tasks. Utilizing eArch3.0 expressed in hippocampal neurons via the viral vector AAV5-hSyn1-eArch3.0, they aimed to bilaterally silence hippocampal terminals located at the RSC in rats pre-trained in a DNMP task. They discovered that silencing hippocampal terminals in the RSC significantly decreased working memory performance in eArch+ animals, especially during task interleaving sessions (TI) that alternated between trials with and without light delivery. This effect persisted even in non-illuminated trials, indicating a lasting impact beyond the periods of direct manipulation. Additionally, they observed a decreased likelihood of correct responses following TI trials and an increased error rate in eArch+ animals, even after incorrect responses, suggesting an impairment in error-corrective behavior. This contrasted with baseline sessions where no light was delivered, and both eArch+ and control animals showed low error rates.

      Weaknesses:

      While I agree with the authors that the role of hippocampal inputs to the RSC in spatial working memory is understudied and merits further investigation, I find that the optogenetic experiment, a core part of this manuscript that includes viral injections, could be improved. The effects were rather subtle, rendering some of the results barely significant and possibly too weak to support major conclusions.

      We thank Reviewer#1 for carefully and critically reading our manuscript, and for the valuable comments provided. The judged “subtlety” of the effects stems from a perspective according to which a quantitatively lower effect bears less biological significance for cognition. We disagree with this perspective and find it rather reductive for several reasons.

      Once seen in the context of the animal’s ecology, subtle impairments can be life-threatening precisely because of their subtlety, leading the animal to confidently rely on a defective capacity, for such events as remembering the habitual location of a predator, or food source.

      Also, studies in animal cognition often undertake complete, rather than graded, suppression of a given mechanism (in the same sense as that of “knocking out” a gene that is relevant for behaviour), leading to a gravely, rather than gradually, impaired model system, to the point of not allowing a hypothetical causal link to be mechanistically revealed beyond its mere presence. This often hinders a thorough interpretation of the perturbed factor’s role. If a caricatural analogy is allowed, it would be as if we were to study the role of an animal’s legs by chopping them both off and observing the resulting behaviour.

      In our study we conclude that silencing HIPP inputs in RSC perturbs cognition enough to impair behaviour while not disabling the animal entirely, as such allowing for behaviour to proceed, and for our observation of graded, decreased (not absent), proficiency under optogenetic silencing. So rather than weak, we would say the results are statistically significant, and biologically realistic.

      Additionally, no mechanistic investigation was conducted beyond referencing previous reports to interpret the core behavioral phenotypes.

      We fully agree that this is a weakness, and we wish we could have done more mechanistic studies to find out exactly what Arch activation is doing to HIPP-RSC transmission, which neurons are being affected, and perhaps in the future dissect its circuit determinants. We keep all these goals firmly in mind and hope we can address them soon.

      Reviewer #2 (Public Review):

      The authors examine the impact of optogenetic inhibition of hippocampal axon terminals in the retrosplenial cortex (RSP) during the performance of a working memory T-maze task. Performance on a delayed non-match-to-place task was impaired by such inhibition. The authors also report that inhibition is associated with faster decision-making and that the effects of inhibition can be observed over several subsequent trials. The work seems reasonably well done and the role of hippocampal projections to retrosplenial cortex in memory and decision-making is very relevant to multiple fields. However, the work should be expanded in several ways before one can make firm conclusions on the role of this projection in memory and behavior.

      We thank Reviewer#2 for carefully and critically reading our manuscript, and for the valuable comments provided.

      (1) The work is very singular in its message and the experimentation. Further, the impact of the inhibition on behaviour is very moderate. In this sense, the results do not support the conclusion that the hippocampal projection to retrosplenial cortex is key to working memory in a navigational setting.

      As we have mentioned in response to Reviewer #1, the judged “very moderate” effect stems from a perspective according to which a quantitatively lower effect bears less biological significance for cognition, precluding its consideration as “key” for behaviour. We disagree with this perspective and find it rather reductive for several reasons. Once seen in the context of the animal’s ecology, quantitatively lower impairments in working memory are no less key for this cognitive capacity, and can be life-threatening precisely because of their subtlety, leading the animal to confidently rely on a defective capacity, for such events as remembering the habitual location of a predator, or food source. Furthermore, studies in animal cognition often undertake complete, rather than graded, suppression of a given mechanism (in the same sense as “knocking out” a gene that is relevant for behaviour), leading to a gravely, rather than gradually, impaired model system, to the point of not allowing a hypothetical causal link to be mechanistically revealed beyond its mere presence. This often hinders a thorough interpretation of its role.

      In our study we conclude that silencing HIPP inputs in RSC perturbs cognition enough to impair behaviour while not disabling the animal entirely, as such allowing for behaviour to proceed, and for our observation of graded, decreased (not absent), proficiency under optogenetic silencing. So rather than weak, we would say the results are statistically significant, and biologically realistic.

      (2) There are no experiments examining other types of behavior or working memory. Given that the animals used in the studies could be put through a large number of different tasks, this is surprising. There is no control navigational task. There is no working memory test that is non-spatial. Such results should be presented in order to put the main finding in context.

      It is hard to gainsay this point. The more thorough and complete a behavioural characterization is, the more informative the study is, from whichever angle one looks at it. While we agree that other forms of WM would be quite interesting in this context, we also cannot ignore the fact that DNMP is widely tested as a WM task, one that is biologically plausible, sensitive to perturbations of neural circuitry known to be at play therein, and fully accepted in the field. Faced with the impossibility of running further studies, for lack of additional funding and human resources, we chose to run this task.

      A control navigational task would, in our understanding, be used to assess whether silencing HIPP projections to RSC would affect (spatial?) navigation, rather than WM, thus explaining the observed impairment. To this we have the following to say: Spatial Navigation is a very basic cognitive function, one that relies on body orientation relative to spatial context, on keeping an updated representation of such spatial context, (“alas”, as memory), and on guiding behaviour according to acquired knowledge about spatial context. Some of these functions are integral to spatial working memory, as such, they might indeed be affected.

      Dissecting the determinants of spatial WM is indeed an ongoing effort. It was not the aim of the current study, but it is one that we keep firmly in mind and hope to address in the future.

      A non-spatial WM task would indeed vastly solidify our claims beyond spatial WM, onto WM. We have, for this reason, changed the title of the manuscript which now reads “spatial working memory”.

      (3) The actual impact of the inhibition on activity in RSP is not provided. While this may not be strictly necessary, it is relevant that the hippocampal projection to RSP includes, and is perhaps dominated by inhibitory inputs. I wonder why the authors chose to manipulate hippocampal inputs to RSP when the subiculum stands as a much stronger source of afferents to RSP and has been shown to exhibit spatial and directional tuning of activity. The points here are that we cannot be sure what the manipulation is really accomplishing in terms of inhibiting RSP activity (perhaps this explains the moderate impact on behavior) and that the effect of inhibiting hippocampal inputs is not an effective means by which to study how RSP is responsive to inputs that reflect environmental locations.

      We fully agree that neural recordings addressing the effect of silencing on RSC neural activity are relevant. We do wish we could have provided more mechanistic studies, to find out exactly what Arch activation is doing to HIPP-RSC transmission, which neurons are being affected, and thus dissect its circuit determinants. We keep all these goals firmly in mind and hope we can address them soon. Subiculum, which we mention in the Introduction, is indeed a key player in this complex circuitry, one whose hypothetical influence is the subject of experimental studies which will certainly reveal many other key elements.

      (4) The impact of inhibition on trials subsequent to the trial during which optical stimulation was actually supplied seems trivial. The authors themselves point to evidence that activation of the hyperpolarizing proton pump is rather long-lasting in its action. Further, each sample-test trial pairing is independent of the prior or subsequent trials. This finding is presented as a major finding of the work, but would normally be relegated to supplemental data as an expected outcome given the dynamics of the pump when activated.

      We disagree that this finding is “trivial”, and object to the considerations of “normalcy”, which we are left wondering about.

      In the absence of neurophysiological experiments (for the reasons stated above) to address this interesting finding, we chose to interpret it in light of the few published observations, this being the logical course of action in scientific reporting, given the present circumstances.

      Evidence for such a prolonged effect in the context of behaviour is scarce (to our knowledge only the one we cite in the manuscript). As such, it is highly relevant to report it, and give it the relevance we do in our manuscript, rather than “relegating it to supplementary data”, as the reviewer considers being “normal”.

      In the DNMP task the consecutive sample-test pairs are explicitly not independent, as they are part of the same behavioural session. This is illustrated by the simple phenomenon of learning, namely the intra-session learning curves, and the well-known behavioral trial-history effects. The brain does not simply erase such information during the ITI.

      (5) In the middle of the first paragraph of the discussion, the authors make reference to work showing RSP responses to "contextual information in egocentric and allocentric reference frames". The citations here are clearly deficient. How is the Nitzan 2020 paper at all relevant here?

      Nitzan 2020 reports the propagation of information from HIPP to CTX via SUB and RSC, thus providing a conduit for mnemonic information between the two structures, alternative to the one we target, thus providing thorough information concerning the HIPP-RSC circuitry at play during behaviour.

      Alexander and Nitz 2015 precisely cite the encoding, and conjunction, of two types of contextual information, internal (ego-) and external (allocentric).

      The subsequent reference is indeed superfluous here.

      We thank Reviewer #2 for calling our attention to the fact that references for this information are inadequate and lacking. We have now cited (Gill et al., 2011; Miller et al., 2019; Vedder et al., 2017) and refer readers to the review (Alexander et al., 2023) for the purpose of illustrating the encoding of information in the two reference frames. In addition, we have substantially edited the Introduction and Discussion sections, and suppressed unnecessary passages.

      (6) The manuscript is deficient in referencing and discussing data from the Smith laboratory that is similar. The discussion reads mainly like a repeat of the results section.

      Please see above. We thank Reviewer #2 for this comment; we have now re-written the Discussion such that it is less of a summary of the Results and more focused on their implications and future directions.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Hats off to the authors for taking time to decipher the seemingly subtle but important differences between the Gnai2/3 double mutant and Ptx mutant phenotypes. These results further illustrate the dynamic requirement of Gnai/o in hair bundle establishment. I have some minor suggestions for the authors to consider and it is up to the authors to decide whether to incorporate them:

      We decided to make the current (revised) version the version of record, and we explain why below. Please include these comments in the review+rebuttal material.

      (1) The abstract could be modified to reflect the revised interpretations of the results.

      Response: the abstract is high-level and the changes in interpretation in the revised manuscript do not modify the message there. Briefly, the abstract only states that Gnai2; Gnai3 double mutants recapitulate two defects previously only observed with pertussis toxin. There is no claim about the timing or dose of GNAI proteins involved.

      (2) The three rows of OHCs are like a different beast from each other. Mireille Montcouquiol's lab has demonstrated that there is a differential requirement for Gnai3 in hair bundle orientation among the three rows of OHCs. The results described in this manuscript support this notion as well.

      To clarify, Gnai3 inactivation does not affect OHC orientation. Only pertussis toxin, and in this work Gnai2; Gnai3 double mutants, do. The Montcouquiol lab showed different degrees of OHC1, OHC2 and OHC3 misorientation upon use of pertussis toxin in vitro using cochlear explants (Ezan et al 2013). We showed the same thing in vivo using transgenic models (Tarchini et al 2013; Tarchini et al 2016). The different OHC responses by row and corresponding citations are mentioned in several locations in the manuscript, including first on line 112 in the Introduction and in Fig. 1C in a graphical summary.

      (3) I wonder if "compensate" or "redundancy" may be a better term to use than "rescue" in the Discussion and figure.

      “Rescue” is used in the Discussion on lines 603 and 604. We think that “rescue” is appropriate to refer to the ability of GNAI2 to compensate for the loss of GNAI1 and GNAI3 in a mutant context. We would argue that these different wordings are largely interchangeable and do not change the message.


      Author Response

      The following is the authors’ response to the original reviews.

      We really appreciate the time the reviewers spent reading and commenting on the original manuscript. Although they were positive already, we decided to spend some time to address the main comments with new experiments as thoroughly as possible in a new manuscript version. We also heavily edited some sections accordingly: 1) We delayed pertussis toxin activation in hair cells with Atoh1-Cre to show that the resulting misorientation phenotype is delayed compared to FoxG1-Cre results, as also seen in Gnai2; Gnai3 double mutants. It follows that Gnai2; Gnai3 and pertussis mutants do share a similar misorientation profile, and that GNAI proteins are required to normally reverse OHC1-2 (from medial to lateral), but also to maintain the lateral orientation, at least transiently. 2) We experimentally verified that one of our GNAI antibodies can indeed detect GNAI1, and consequently that absence of signal in Gnai2; Gnai3 double mutants is evidence that GNAI1 is not involved in apical hair cell polarization. We believe these changes strengthen the manuscript and its conclusions.

      Reviewer #1 (Public Review):

      A subclass of inhibitory heterotrimeric guanine nucleotide-binding protein subunits, GNAI, has been implicated in sensory hair cell formation, namely the establishment of hair bundle (stereocilia) orientation and staircase formation. However, the former role of hair bundle orientation has only been demonstrated in mutants expressing pertussis toxin, which blocks all GNAI subunits, but not in mutants with a single knockout of any of the Gnai genes, suggesting that there is a redundancy among various GNAI proteins in this role. Using various conditional mutants, the authors concluded that GNAI3 is the primary GNAI protein required for hair bundle morphogenesis, whereas hair bundle orientation requires both GNAI2 and GNAI3.

      Strength

      Various compound mutants were generated to decipher the contribution of individual GNAI1, GNAI2, GNAI3 and GNAIO in the establishment of hair bundle orientation and morphogenesis. The study is thorough with detailed quantification of hair bundle orientation and morphogenesis, as well as auditory functions.

      Weakness

      While the hair bundle orientation phenotype in the Foxg1-cre; Gnai2-/-; Gnai3 lox/lox (double mutants) appears more severe than that observed in Ptx cKO mutants, it may be an oversimplification to attribute the differences to more GNAI function in the Ptx cKO mutants. The phenotypes of the double mutants and Ptx cKO mutants appear qualitatively different. For example, assuming the milder phenotypes in the Ptx cKO are due to incomplete loss of GNAI function, one would expect the Ptx phenotype to be reproducible by some combination of compound mutants among various Gnai genes. Such information was not provided. Furthermore, of all the double mutant specimens analyzed for hair bundle orientation (Fig. 8), the hair bundle/kinocilium position started out normally in the lateral quadrant at E17.5 but failed to be maintained by P0. This does not appear to be the case for Ptx cKO, in which all affected hair cells showed inverted orientation by E17.5. It is not clear whether this is the end stage of bundle orientation in Ptx cKO, and whether the kinocilium position started out normal, similar to the double mutants, before the age of analysis at E17.5. Understanding these differences may reveal specific requirements of individual GNAI subunits or whether other factors are being affected in the Ptx mutants.

      This criticism was very useful and prompted new experiments as well as a change in data presentation and a fundamental rewrite regarding hair cell orientation. These changes are detailed below. Of note, however, please let us clarify that the original manuscript did show that the ptxA orientation phenotype is reproduced to some extent in Gnai2; Gnai3 double mutants (previously Fig. 8 and corresponding text line 505). We showed that OHC1-2 are also inverted in the double mutant, although at a later differentiation stage. We recognize that similarities in hair cell misorientation between ptxA and Gnai2; Gnai3 DKO were not explained and discussed well enough. This part of the manuscript has been re-worked extensively, and we hope that along with new results, comparisons between mutant models are easier to follow and understand. We notably fully adopted the idea that there are qualitative differences between ptxA and Gnai2; Gnai3 mutants, and not only a difference in the remaining “dose” of GNAI activity.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Comments related to clarification of the weakness:

      (1) In general, hair bundle orientation in the double mutants is established in the lateral quadrant of the cochlea before being inverted (Fig. 8). These results are intriguing because the lateral orientation is the correct position for these hair bundles normally and Gnai proteins are thought to be required to get the kinocilium to the lateral position. This process appears to proceed normally in the double mutants but the kinocilium reverted to the medial default position over time, which suggests that Gnai2 and Gnai3 are only required for the maintenance and not the establishment of the kinocilium in the lateral position. Is this phenotype qualitatively similar in the Ptx cKO?

      We addressed these issues with two types of modifications to the data:

      (1) We modified the eccentricity threshold used at E17.5 in Fig. 8 (orientation) to be more stringent, using 0.4 (instead of 0.25 previously) in both controls and mutants. This means that we now only graph the orientation of cells where eccentricity is more marked. The rationale is that at early stages, it is challenging to distinguish immature vs defective near-symmetrical cells. We kept a threshold of 0.25 at P0 when the hair cell apical surface is larger and better differentiated (Fig. 8C-D). Importantly, the dataset remains rigorously identical. This change usefully highlights that a large proportion of OHC1 is in fact inverted (oriented medially) at E17.5 in Gnai2; Gnai3 double mutants at the cochlear mid, as also seen in the ptxA model at the same stage and position (see new Fig. 8A). At the E17.5 base (Fig. 8B), a slightly more mature position, the outcome is unchanged (the majority of OHC1 are inverted using either a 0.25 or 0.4 threshold in double mutants and in ptxA).

      Interestingly, however, the orientation trend is unchanged for OHC2: OHC2 remain oriented largely laterally (i.e. normally) at the E17.5 mid and base in Gnai2; Gnai3 double mutants even with a raised eccentricity threshold, whereas by contrast OHC2 in ptxA are inverted at these stages and positions. In the double mutant, OHC2 only become inverted at the P0 base (Fig. 8D). This suggests that there are similarities (OHC1) but also differences (OHC2) between the two mouse models, and that double mutants show a delay in adopting an inverted orientation compared to ptxA. Of note, OHC2 have been shown to differentiate later than OHC1 (for example, Anniko 1983 PMID:6869851).

      (2) To directly test the idea that the misorientation phenotype (inverted OHC1-2) is comparable between the two models but delayed in Gnai2; Gnai3 mutants, we performed a new experiment and added new results in the manuscript. We delayed ptxA action by using Atoh1-Cre (postmitotic hair cells) instead of FoxG1-Cre (otic progenitors). Remarkably, this produced a pattern of OHC1-2 misorientation more similar to Gnai2; Gnai3 mutants: at the E17.5 base and P0 apex, OHC2 were still largely oriented laterally (normally) in Atoh1-Cre; ptxA as in Gnai2; Gnai3 mutants whereas at the P0 base a large proportion of OHC2 were inverted (Fig. 8 Supp 1B). OHC1 were inverted at all stages and positions in the Atoh1-Cre as in the FoxG1-Cre; ptxA model. For Atoh1-Cre; ptxA, we only illustrated OHC1 and OHC2 and did not add E17.5 mid or P0 mid results because other cell types and stages/positions did not provide additional insight. In addition, we are well aware that the full FoxG1-Cre; ptxA and Gnai2; Gnai3 results for 4 cell types (IHC, OHC1-3) and 5 stages/positions is already a lot of data for cell orientation.
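
      For illustration, the stage-specific eccentricity filter described in modification (1) can be sketched as below. This is a minimal sketch and not the analysis code used for Fig. 8: only the two threshold values (0.4 at E17.5, 0.25 at P0) come from the response, while the field names `eccentricity` and `angle_deg` and the inclusive comparison are assumptions.

```python
from typing import Dict, List

# Thresholds stated in the response; cells below the threshold are
# near-symmetrical and are not graphed for orientation.
ECCENTRICITY_THRESHOLD = {"E17.5": 0.4, "P0": 0.25}


def cells_to_graph(cells: List[Dict[str, float]], stage: str) -> List[Dict[str, float]]:
    """Keep only cells eccentric enough for their orientation to be plotted."""
    threshold = ECCENTRICITY_THRESHOLD[stage]
    # Assumption: the comparison is inclusive (>=).
    return [cell for cell in cells if cell["eccentricity"] >= threshold]
```

      Because the filter is applied at plotting time, the underlying measurements are unchanged, which is consistent with the statement that the dataset remains identical and only the subset of graphed cells differs.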

      These results suggest that:

      (a) The normal reversal of OHC1-2 to adopt a lateral orientation needs to be maintained, at least transiently, and that maintenance also relies on GNAI/O (Results starting line 529. Discussion line 621).

      (b) ptxA is more severe than Gnai2; Gnai3 when it comes to OHC1-2 orientation (Figure 9, role b). Conversely, Gnai2; Gnai3 is obviously more severe when it comes to symmetry-breaking (Fig. 9, role a) and hair bundle morphogenesis (Fig. 9, c). It follows that the two early GNAI/O activities are qualitatively different and not just based on dose. This is essentially what this Reviewer correctly pointed out, and we have fully edited both Results and Discussion accordingly. We now speculate that the difference may lie in the identity of the necessary GNAI/O protein for each role. Any GNAI/O proteins acting as a switch downstream of the GPR156 receptor may relay orientation information (Fig. 9, role b), making ptxA a particularly effective disruption strategy since it downregulates all GNAI/O proteins. In contrast, symmetry-breaking may rely more specifically on GNAI2 and GNAI3, and ptxA is not expected to achieve a loss-of-function of GNAI2 and GNAI3 as extensive as a double targeted genetic inactivation of the corresponding genes. Please see new Results starting line 526 and Discussion starting line 603. We consequently abandoned the notion that increased doses of GNAI/O are required for each role, and we also clarify that symmetry-breaking (a) and orientation (b) occur at the same time (Fig. 9).

      (2) P0 may not be late enough a stage to assess phenotype maturity in the double mutants. For example, it is not clear from the basal P0 results whether the IHC will acquire an inverted phenotype or just misorientation in the lateral side.

      For context, the OHC1-2 misorientation pattern in the ptxA model at P0 does represent the end stage, as the same pattern is observed in adults (illustrated in Fig. 2A). In addition, OHC1-2 that express ptxA are inverted as soon as they break planar symmetry, and this was established at E16.5 in a previous publication where ptxA and Gpr156 misorientation patterns were compared and shown to be identical (Kindt et al., 2021 Supp. fig. 5C-D). However, we clearly failed to mention these important results in the original manuscript. We now cite Figure 2 for adult defects (line 522), and provide a citation for OHC1-2 inversion being observed from the earliest stage of hair cell differentiation (Kindt et al., 2021) (line 519).

      The vast majority of Gnai2; Gnai3 double mutants die before weaning, but the single specimen we managed to collect at P21 also showed inverted OHC1-2 (representative example in Fig. 2A). Again, we previously failed to point out this important result. We now do so at lines 214 and 555. This is further evidence that OHC1-2 misorientation is in fact similar in the ptxA and Gnai2; Gnai3 models (but milder and delayed in the latter).

      When it comes to IHCs and OHC3s however, the situation is less clear. These cell types are mildly misoriented in ptxA and Gpr156 mutants, but IHCs in particular appear severely misoriented in Gnai2; Gnai3 mutants based on the position of the basal body (Fig. 8). However, very dysmorphic hair bundles can pull on the basal body via the kinocilium and affect its position, which obscures hair cell orientation inferred from the basal body and subsequent interpretations. We do not dwell on IHC and OHC3 and their orientation in Gnai2; Gnai3 mutants in the revision since we do not observe similar orientation defects in a different mouse model and lack sufficient adult data.

      Suggestions to improve upon the manuscript for readers:

      (1) Line 294, indicate on the figure the staining in bare zone and tips of stereocilia on row 1.

      Pertains to Figure 4. In A, we now point out the bare zone and stereocilia tips with arrow and arrowheads, respectively (as in other figures).

      (2) Fig. 8 schematic diagram, the labels of the line and 90° side by side is misleading.

      We added black ticks for 0, 90, 180, 270 degree references. In contrast, the hair cell angle represented was switched to magenta.

      (3) Fig. 7 legend, redundancy towards the end of the paragraph.

      Thank you for catching this issue. A large portion of the legend was indeed accidentally repeated and is now deleted.

      (4) Line 490-493, Another plausible explanation is that other factors besides Gnai2 and Gnai3 are involved in breaking symmetry during bundle establishment.

      We now acknowledge that other proteins besides GNAI/O may be involved (Discussion line 614). That said, the notion that we do not achieve sufficient and/or early enough GNAI loss is supported for example by the Beer-Hammer 2018 study where no defects in symmetry-breaking or orientation were reported in their Gnai2 flox/flox; Gnai3 flox/flox model (Discussion new Line 637).

      (5) Line 518, the base were largely inverted (Figure 8B). Should Fig 8A be cited instead of 8B?

      Fig. 8B has graphs for the E17.5 cochlear base where OHC1-2 are inverted in both ptxA and Gnai2;3 DKO models. Fig. 8A has graphs of the E17.5 cochlear mid (less differentiated hair cells) where an inversion was not obvious previously, but is now clear although only partial in Gnai2; Gnai3 DKO (see above; raised eccentricity threshold). In the context of the previous text, this citation was thus correct. However, this section has been heavily modified to better compare Gnai2; Gnai3 DKO and ptxA and is hopefully less confusing in the revised version.

      Reviewer #2 (Public Review):

      Jarysta and colleagues set out to define how similar GNAI/O family members contribute to the shape and orientation of stereocilia bundles on auditory hair cells. Previous work demonstrated that loss of particular GNAI proteins, or inhibition of GNAIs by pertussis toxin, caused several defects in hair bundle morphogenesis, but open questions remained which the authors sought to address. Some of these questions include whether all phenotypes resulting from expression of pertussis toxin stemmed from GNAI inhibition; which GNAI family members are most critical for directing bundle development; whether GNAI proteins are needed for basal body movements that contribute to bundle patterning. These questions are important for understanding how tissue is patterned in response to planar cell polarity cues.

      To address questions related to the GNAI family in auditory hair cell development, the authors assembled an impressive and nearly comprehensive collection of mouse models. This approach allowed for each Gnai and Gnao gene to be knocked out individually or in combination with each other. Notably, a new floxed allele was generated for Gnai3 because loss of this gene in combination with Gnai2 deletion was known to be embryonic lethal. Besides these lines, a new knockin mouse was made to conditionally express untagged pertussis toxin following cre induction from a strong promoter. The breadth and complexity involved in generating and collecting these strains makes this study unique, and likely the authoritative last word on which GNAI proteins are needed for which aspect of auditory hair bundle development.

      Appropriate methods were employed by the authors to characterize auditory hair bundle morphology in each mouse line. Conclusions were carefully drawn from the data and largely based on excellent quantitative analysis. The main conclusions are that GNAI3 has the largest effect on hair bundle development. GNAI2 can compensate for GNAI3 loss in early development but incompletely in late development. The Gnai2 Gnai3 double mutant recapitulates nearly all the phenotypic effects associated with pertussis toxin expression and also reveals a role for GNAIs in early movement of the basal body. Although these results are not entirely unexpected based on earlier reports, the current results both uncover new functions and put putative functions on more solid ground.

      Based on this study, loss of GNAI1 and GNAO show a slight shortening of the tallest row of stereocilia but no other significant changes to bundle shape. Antibody staining shows no change in GNAI localization in the Gnai1 knockout, suggesting that little to no protein is found in hair cells. One caveat to this interpretation is that the antibody, while proposed to cross-react with GNAI1, is not clearly shown to immunolabel GNAI1. More than anything, this reservation mostly serves to illustrate how challenging it is to nail down every last detail. In turn, the comprehensive nature of the current study seems all the more impressive.

      (1) The original manuscript quantified stereocilia properties in Gnai1 and Gnai2 single mutants, and in Gnai1; Gnai2 double mutants using non-parametric tests (Mann-Whitney) for comparisons. This approach indeed suggested a subtle reduction in row 1 height in IHCs in all 3 mutants. We did not quantify stereocilia features in Gnao1 mutants but could not observe defects (new Fig. 2 Supp. 1E-F). In fact, we could not observe defects in Gnai1 and Gnai2 single mutants, or in Gnai1; Gnai2 double mutants either. For this reason we have been ambivalent about reporting defects for Gnai1 and Gnai2 single and Gnai1; Gnai2 double mutants.

      In the revision, we applied a nested (hierarchical) t-test to avoid pseudo-replication (Eisner 2021; PMID: 33464305; https://pubmed.ncbi.nlm.nih.gov/33464305/). In our data, the nested t-tests structure measurements by animal instead of having all stereocilia or other cell measurements treated as independent values. This more stringent approach no longer finds row 1 height reduction significant in single Gnai1 or Gnai2 mutants, or in Gnai1; Gnai2 double mutants. We modified the text accordingly in Results and Discussion. Nested t-tests were applied uniformly across the manuscript and, besides IHC measurements in Fig. 2, now also apply to bare zone surface area in Fig. 6 and eccentricity in Fig. 7. For these experiments in contrast, previous conclusions are not changed. We think that this more careful statistical treatment is a closer representation of the data in terms of the conclusions we can safely make.
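To illustrate why structuring measurements by animal is more stringent, the sketch below uses a deliberately simplified stand-in for a nested t-test: it collapses each animal to a single mean and compares a Welch t statistic computed on animal means with one computed on pooled cells. A true nested test fits a hierarchical model rather than collapsing the data, and all values, animal labels, and helper names here are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

# Hypothetical row 1 heights (arbitrary units), grouped by animal.
control = {"c1": [5.0, 5.1, 4.9], "c2": [5.4, 5.5, 5.3], "c3": [4.6, 4.7, 4.5]}
mutant = {"m1": [4.8, 4.9, 4.7], "m2": [5.2, 5.3, 5.1], "m3": [4.4, 4.5, 4.3]}

def pooled(groups):
    """Treat every cell as independent (the pseudo-replicated view)."""
    return [value for values in groups.values() for value in values]

def animal_means(groups):
    """Collapse each animal to one value so that n equals the number of animals."""
    return [mean(values) for values in groups.values()]

t_pooled = welch_t(pooled(control), pooled(mutant))                 # n = 9 cells per group
t_by_animal = welch_t(animal_means(control), animal_means(mutant))  # n = 3 animals per group
print(round(t_pooled, 3), round(t_by_animal, 3))
```

With these invented numbers the pooled statistic is roughly twice the per-animal one, and it would also be judged against many more degrees of freedom, which is how pseudo-replication can make a subtle difference look significant.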

      (2) The reviewer's criticism about antibody specificity is accurate and fair, and is fully addressed in the revised manuscript. First, we provide a phylogeny cartoon as Figure 1A to compare the GNAI/O proteins and highlight how closely related they are in sequence. Second, to validate the assumption that our approach would detect GNAI1 if it were present in hair cells, we took a new dual experimental approach in the revision. We electroporated Gnai1, Gnai2 and Gnai3 expression constructs in the E13.5 inner ear and tested whether the two GNAI antibodies used in the study can detect ectopic GNAI1 in Kölliker's organ. This revealed that “ptGNAI2” detects GNAI1 very well (in addition to GNAI2), but that “scbtGNAI3” does not detect GNAI1 efficiently (although it does detect GNAI3 very well). To verify in vivo that “ptGNAI2” can detect endogenous GNAI1, we immunolabeled the gallbladder epithelium in Gnai1 mutants and littermate controls using the “ptGNAI2” antibody. Based on IMPC consortium data* about the Gnai1 LacZ mouse strain, Gnai1 is specifically expressed in the adult gallbladder. We could verify that signals detected in the Gnai1 mutants were visually reduced in comparison to littermate controls. We have now added this validation step in Results line 309 and the data in Fig. 4 Supp. 1A-B.

      *https://www.mousephenotype.org/data/genes/MGI:95771

      Reviewer #2 (Recommendations For The Authors):

      Minor comments that may marginally improve clarity.

      Abstract line 24: delete "nor polarized" because polarization cannot be assessed since the protein is undetectable.

      This is a fair point, now deleted.

      Consider revising: Lines 80-82; 188-202 (the order in which the mutants were presented was hard to follow for me); 239-240.

      Lines 80-82: Used to read as "Ptx recapitulates severe stereocilia stunting and immature-looking hair bundles observed when GPSM2 or both GNAI2 and GNAI3 are inactivated."

      Line 88: Was now changed to "Ptx provokes immature-looking hair bundles with severely stunted stereocilia, mimicking defects in Gpsm2 mutants and Gnai2; Gnai3 double mutants".

      Lines 188-202: This was the first paragraph describing adult stereocilia defects in the different Gnai/o mouse strains. We completely rewrote the entire section to reflect the order in which the strains appear in Figure 2, hopefully making the text easier to follow because it better matches panels in Fig. 2. We also made several other modifications to streamline comparisons and better introduce the orientation defects that are later detailed at neonate stages.

      Lines 239-240: Used to read "GNAI2 makes a clear contribution since stereocilia defects increase in severity when GNAI loss extends from GNAI3 to both GNAI2 and GNAI3".

      Line 247: Was now changed to "GNAI2 makes a clear contribution since Gnai3neo stereocilia defects dramatically increase in severity when GNAI2 is absent as well in Gnai2; Gnai3 double mutants."

      Line 164: hardwired is unclear. Conserved?

      We modified this sentence as follows: Line 171: "We reasoned that apical HC development is probably highly constrained and less likely to be influenced by genetic heterogeneity compared to susceptibility to disease, for example."

      Line 299: It is not clear why GNAI1 is a better target than GNAI3. This phrase is repeated in line 303, I suspect inadvertently. Is there evidence that this antibody detects GNAI1, perhaps in another tissue? Line 308: GNAI1 may also not be detected by this antibody.

      Please see point 2 above. We removed these hypothetical statements entirely and we instead now experimentally show that one of the two commercial antibodies used can readily detect GNAI1 (yet does not detect signal in hair cells when GNAI2 and GNAI3 are absent in Fig. 4F).

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Public Review):

      Major Weaknesses:

      The assertion that MOCAT can be rapidly applied in hospital pathology departments seems overstated due to the limited availability of light-sheet microscopes outside research labs. In the first rebuttal letter, authors explain the limitations of other microscopes more readily available in hospitals. This explanation relies on your own investigations and practical experience on the matter, so including them in some part of the manuscript would be beneficial.

      We appreciate the reviewer's comments and have added a discussion on the limitations of microscopes that are more readily available in hospitals in our text:

      Revised manuscript, line 305-316:

      “3.3 Microscopy options for imaging centimeter-sized specimens

      Optical sectioning techniques are crucial for obtaining high-quality volumetric images. Techniques such as confocal microscopes, multi-photon microscopy, and light-sheet microscopy filter out-of-focus signals, resulting in sharp images of individual planes. In our study, we used light-sheet microscopy and multi-point confocal (i.e., spinning disc) for imaging centimeter-sized specimens because of their scanning speeds. While two-photon and confocal microscopy offer high-resolution imaging of smaller volumes, they are not ideal for scanning entire tissues because of their prolonged scanning times.

      Non-optical sectioning wide-field fluorescence microscopes, like the Olympus BX series or ZEISS Axio imager series, can also be used to scan samples up to about 3.5mm thick with long working distance objective lenses. In these cases, deconvolution algorithms are required to eliminate out-of-focus signals. However, it should be noted that the epifluorescence system might reduce fluorescent intensity in deeper regions within the samples.”

      Refractive index matching is a critical point in the protocol, the one providing final transparency. Authors utilized the commercial solutions NFC1 and NFC2 (Nebulem, Taiwan) with a known refractive index, but for which its composition is non-disclosable. My knowledge on the organic chemistry around refractive index matching is limited, but if users don't really know what is going on in this final step, the whole protocol would rely on a single world-wide provider and troubleshooting would be fishing. I suggest that you try to validate the approach with solutions of known composition, or at least provide the solutions sold by other providers.

      We appreciate the reviewer's suggestions. Based on our experience, the CUBIC-R solution developed by Ueda's team also serves as an effective RI-matching solution in the MOCAT pipeline. Its only drawback is the potential reddening of the specimen, likely due to the light-responsive component, antipyrine. We have now added this information to the Methods section:

      Revised manuscript, line 492-496:

      “Refractive index (RI) matching. Before imaging, the specimens were RI-matched by being immersed in NFC1 (RI = 1.47) and NFC2 (RI = 1.52) solutions (Nebulum, Taipei, Taiwan). Each immersion lasted for one day at room temperature. Alternatively, RI-matching can also be accomplished by immersing specimens in a 1:1 dilution of CUBIC-R[28] for one day, followed by pure CUBIC-R for an additional day.”

      Reviewer #2 (Recommendations For The Authors):

      A comment on the name of the protocol, MOCAT. I am sorry to bring this now, and not before. But, I strongly recommend another name for the procedure. My concern is that the present name "MOCAT" refers to the problem, and NOT to the actual solution provided by you. See, the problem to solve is: to perform Multiplex labeling Of Centimeter-sized Archived Tissue (MOCAT), but it says nothing about HOW you did it: heat-induced antigen retrieval and Tween20-delipidation for centimeter-scale FFPE specimens. In summary, I strongly recommend that the acronym of the procedure refers more to the "solution" than to the "problem", and for me this is important because otherwise the acronym is not fair with present and future techniques pretending to provide a novel solution to the same problem. Another way to put it is that researchers can own their proposed solutions, but they do not own the problem to be solved.

      We appreciate the reviewer's suggestions. In response to their concerns, we have renamed the procedure presented in this study as Heat-Induced FFPE-based Tissue Clearing, with the acronym HIF-Clear. This change reflects the critical step in our procedure. Corresponding updates have also been made in the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      This manuscript aims to understand the biological mechanisms underlying neuropsychiatric symptoms in Parkinson's disease by characterizing subtypes of neurons in the dorsal raphe nucleus and defining their susceptibility to the degeneration of dopaminergic and adrenergic systems in the brain. This study was well-designed, the results were presented beautifully, and the manuscript was well-written. Here are some comments that may help to improve the overall quality of this work.

      We thank the reviewer for the kind comments.

      Major concerns:

      The current study utilized an intrastriatal 6-OHDA injection, which raises the possibility that the observed electrophysiological and morphological changes of DRN5-HT and DRNDA neurons (Figs 3-6) may be due to the direct effects of 6-OHDA to DRN5-HT and DRNDA neurons projecting to the dorsal striatum (at least for DRN5-HT neurons). This possibility requires further clarification and discussion.

      6-OHDA is a catecholamine neurotoxin with low selectivity for serotonin neurons. However, changes in the levels of serotonin have been observed with high doses of 6-OHDA. In our study, we used lower concentrations of 6-OHDA, which did not affect the levels of serotonin (Suppl. Fig. 4D) or the number of DRN5-HT neurons (Suppl. Fig. 5B). Concerning the possible effect of 6-OHDA on DRNDA neurons, we did not observe any modification in the number of these cells in response to the administration of 6-OHDA (Suppl. Fig. 5C; lines 170-175).

      How does the loss of nigrostriatal dopamine neurons affect the electrophysiology and morphology of DRNDA neurons (Figs. 5-6)? What are the potential circuit mechanisms?

      The dopaminergic system in the midbrain and the DRN constitute two highly interconnected nuclei and hence there are multiple possible circuit mechanisms that could explain how loss of nigrostriatal dopaminergic neurons affects DRNDA neurons. First, DRNDA neurons are directly innervated by dopaminergic neurons in the SNc and VTA and hence loss of SNc inputs might evoke acute as well as homeostatic changes in DRNDA (Lin et al., 2020; Pinto et al., 2019). Second, midbrain dopaminergic neurons are in turn innervated by the DRN (Watabe-Uchida et al., 2012) and loss of postsynaptic dopaminergic neurons might affect all neuron types in the DRN that target the midbrain. Finally, GABAergic populations in the midbrain have been shown to target DRN5-HT neurons and might potentially also target other local cell types such as DRNDA (Li et al., 2019). Another possible pathway is the bidirectional connection between the striatum and the DRN (Pollak-Dorocic et al., 2014). DA depletion in the striatum may affect the GABAergic projection to the DRN and in turn modify the properties of postsynaptic DRN neurons.

      The potential circuit mechanisms are now included in the introduction (lines 58-59).

      Whether these intrastriatal 6-OHDA mice exhibited nonmotor deficits (e.g., anxiety) that may be related to the observed changes in the DRN? Such behavioral data would enhance the overall conclusions of this work.

      The PD model utilized in this study displays non-motor deficits, including depression- and anxiety-like behavior (Masini et al. 2021, Ztaou et al., 2018). This is now highlighted in the manuscript (lines 167-169).

      Minor issues:

      The panels of Fig. 2 should be re-labelled to match the descriptions in the main text (L. 142-158).

      Fig.2 now matches the descriptions in the main text.

      Fig 4D was missing from the figure, which does not match the descriptions in the main text (L. 193-204).

      Fig. 4D includes the parameters describing the dendritic branching and starts with the last graph on the right in the second row of the panel.

      Line 409: Extra "as" after "average"

      Corrected in revised manuscript.

      Fig 3G: Missed asterisks.

      Corrected in revised manuscript (Fig. 3G).

      Details of how action parameters were quantified should be stated and specified in the methods.

      We have now added a section called ‘Quantification of electrophysiological parameters’ in the methods where we explain how the electrophysiological properties are defined and quantified (lines 407-439).

      "Parkinson's disease" in the title should be revised to "parkinsonism"

      Corrected in revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) Throughout the paper, there are numerous inaccuracies and inconsistencies in the figures, which impede the clear understanding of this paper. For example, there are discrepancies between the labeling of the main figures (sub-panels) and the corresponding manuscript (Figure 2, Figure 4).

      Corrected in the revised manuscript.

      The statistical presentations are inaccurate in several figures (Figure 3E, 3G), making it difficult to distinguish which data is statistically meaningful. Furthermore, the number of cells presented in each figure is ambiguous in the figure legend. It would be better to avoid expressions such as 'n = 28 - 43 cells per group', as in line 456 (Figure 1I). Please provide the exact number of cells for each graph.

      We agree with the reviewer, and we have now added the precise n numbers for each panel in the corresponding legends in Fig 1, Fig 3, and Fig 5. Please note that some analysis was restricted to recordings where neurons fired close to their average spontaneous firing frequency (e.g. 1Hz for DRN5-HT) to allow for a fair comparison of the data across groups and that therefore the n numbers vary in different panels.

      In some figures, the value of n in the graph seems different from the value of n in the figure legends (Figure 2G-I, Figure 4, Figure 6). Collectively, these inaccurate figures and the manuscript weaken the general credibility of the data presented.

      We apologize for the misunderstanding, but in the chosen graph type, data points with equal values overlap. The numbers described in the figure legend are correct.

      (2) Some of the authors' claims in this paper are not supported by quantitative analysis, but only by sample recording traces or simple descriptions. For example, in line 97, the authors mentioned, "no differences when comparing TH-positive to TH-negative neurons".

      But there are no data actually analyzing these two groups in Supplementary Figure 2A.

      In addition, in line 103, there is a claim that "DRN DA neurons showed that they share several properties characteristics of other DA populations located in the SNc and the ventral tegmental area". However, this claim is backed up only by a few sample traces in Figure 1E.

      The statement (lines 110-111), "a relative constant action potential (AP) amplitude", is also not supported by appropriate quantitative analysis but only by sample recording traces.

      In our study we found a small subset of DAT-tdTomato positive neurons which did not stain positive for TH after the slice recordings. In 5 of 6 of these neurons (recorded in sham), the electrophysiological properties did not differ from other TH-positive neurons. This is visualized in Suppl. Fig 2A. The absence of any statistical difference was also confirmed by a Mann-Whitney U test comparing the TH-negative to the TH-positive DRNDA neurons (no significant differences in all 6 of 6 properties shown in Suppl. Fig 2A). Additionally, all these cells were DAT-positive, further supporting their classification as dopaminergic neurons. Therefore, we suspect that the lack of TH staining is likely caused by the tissue processing itself. Please note that all our immunohistochemistry was run on slices after several hours of patch-clamping procedures. Finally, including or excluding this small subset of neurons in the present study does not change any of the results presented and data were therefore pooled. We have now clarified this in more detail in the results section and in Suppl. Fig 2A (lines 100-103).
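For readers unfamiliar with the test named above, the following minimal sketch computes the Mann-Whitney U statistic by direct pairwise comparison. The values are invented and are not the recorded data, the `mann_whitney_u` helper is hypothetical, and no p-value is derived here (in practice a library routine such as SciPy's `scipy.stats.mannwhitneyu` would be used for that).

```python
def mann_whitney_u(x, y):
    """U statistic for sample x: each (x, y) pair scores 1 if x > y, 0.5 if tied, else 0."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

# Hypothetical values of one electrophysiological property (arbitrary units).
th_negative = [1.2, 1.5, 1.1, 1.4, 1.3]
th_positive = [1.0, 1.6, 1.25, 1.35, 1.45, 1.15]

u_negative = mann_whitney_u(th_negative, th_positive)
u_positive = mann_whitney_u(th_positive, th_negative)

# The two U values always sum to len(x) * len(y); a U close to half of that
# product means neither group tends to rank above the other.
print(u_negative, u_positive)
```

In this toy example both U values equal half of the 30 possible pairs, which is the pattern expected when two groups do not differ in rank.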

      We have moved the comparison of hallmark properties found in DRNDA neurons as well as in dopaminergic neurons in the midbrain from the results section to the discussion (lines 281-283).

      The claim that DRN5HT neurons have a comparatively constant action potential amplitude compared to DRNDA neurons is supported by quantitative analysis shown in Fig 1I (left panel, “AP drop rate”), while the representative example traces are shown in Fig 1G.

      (3) In the legend of Figure 2, the mouse used in this experiment is mentioned with two different names (wild-type mice in line 463 and sham-lesion mice in line 465). Is this a mistake? Or did the authors intentionally use the brain samples from sham-lesion mice for Figure 2?

      Figure 2 shows data in control conditions (Sham-lesion in our case), both from wild-type and Dat-Tomato. The text has been changed to avoid misunderstandings.

      (4) While the primary claim of this paper is the differential alterations of DRN 5-HT and DA neurons in a mouse PD model, the observed changes in the DRN neurons of the 'DA only lesion model' are comparatively minor to the 'DA and NA lesions model'. Therefore, it looks like NA depletion has a more critical role in the DRN neurons of 6OHDA-lesion mice than DA depletion. To understand the results of this paper better, it would be great if the authors can provide additional data from the 'NA only lesion model'.

      We agree with the reviewer, and we have now added a new set of experiments in which we selectively lesioned noradrenergic cells by injecting 6-OHDA unilaterally into the LC. The new data are presented in supplementary figure 6 in the revised manuscript. We find that selective lesioning of the NA system affects DRNDA and DRN5-HT neurons mildly, suggesting that the concomitant lesion of the DA and NA systems is particularly impactful (possibly because of interactions between these two systems).

      (5) In Figure 3B and Figure 5B, only the 6-OHDA+DMI group shows significant differences from the sham group. This finding might be attributed to the effect of DMI itself, not to the nigrostriatal DA degeneration without NA degeneration. Thus, adding the 'DMI-only group' in all experiments will strengthen the conclusion of this paper.

      The effect of one acute administration of desipramine was temporally limited to the stereotactic intervention (lines 373-375), which was performed several weeks before the electrophysiological and morphological analyses. Given that the half-life of desipramine is approximately 24 hrs (Nagy and Johansson, 1975), we believe that its impact was limited to the neuroprotection of NA-neurons from 6-OHDA toxicity.
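The half-life argument can be made concrete with a short calculation. This is a sketch assuming simple first-order (single-compartment) elimination, which the response does not state, and a three-week interval chosen only for illustration because the text says "several weeks"; `fraction_remaining` is a hypothetical helper.

```python
def fraction_remaining(elapsed_hours, half_life_hours=24.0):
    """Fraction of the initial dose left after elapsed_hours, assuming first-order elimination."""
    return 0.5 ** (elapsed_hours / half_life_hours)

one_week = fraction_remaining(7 * 24)      # 7 half-lives
three_weeks = fraction_remaining(21 * 24)  # 21 half-lives (interval assumed for illustration)
print(one_week, three_weeks)
```

Under these assumptions less than 1% of the dose remains after one week and less than one millionth after three weeks, consistent with the argument that a single injection is unlikely to act directly at the time of recording.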

      (6) DRN 5-HT neurons are known to exhibit cellular heterogeneity, and in particular their electrophysiological properties are quite heterogeneous (Bernat Kocsis. 2006; J.V. Schweimer. et al. 2011). Furthermore, 5-HT neurons in the distinct subregions of the DRN display different membrane properties (LaTasha K. Crawford, 2010). Therefore, not all DRN 5-HT neurons can be regarded as electrophysiologically identical. Given that the molecular identity of all recorded cells was confirmed with neurobiotin in this paper, it would be better to show that recorded cells are not biased toward certain subregions of DRN.

      In addition, providing more comprehensive descriptions of the electrophysiological features used in PCA analysis would be beneficial in understanding the electrophysiological profiling of DRN neurons explained in this paper.

      Although several studies have revealed electrophysiological and molecular heterogeneity within the DRN5-HT population, we did not observe any significant differences within the DRN5-HT neurons recorded in this study. We compared the properties of DRN5-HT neurons recorded more anterior to those recorded in the posterior DRN as well as neurons found in more ventral locations to those in more dorsal locations (data not shown). We would like to point out that the largest differences within serotonergic neuron populations described by previous studies were often found when comparing those located in the medial raphe nucleus (MRN) to those found in the DRN. Calizo et al. (2011) showed for example significant differences in the input resistance and AHP amplitude between MRN5-HT and DRN5-HT neurons. These two properties as well as the AP amplitude, AP threshold, AP duration, and tau did however not differ between DRN subregions in their study, nor in ours. We extended our Suppl. Fig 1 and mapped the location of DRN5-HT and DRNDA neurons recorded in sham (Suppl. Fig 1D).

      Overall, we’ve sampled neurons along the anterior-posterior and dorsal-ventral axes of the DRN, while on the medial-lateral axis, recorded DRN neurons were located medially.

      We agree with the reviewer that a comprehensive description of the electrophysiological features was missing in the manuscript, and we have therefore added a new section in the materials and methods where we explain in detail how each parameter was measured and analyzed (‘Quantification of electrophysiological parameters’, lines 407-439). This section also provides detailed information about the five properties underlying the PCA shown in figure 1 (i.e. delay to the first action potential, action potential drop rate, action potential rise time, duration of the afterhyperpolarization, and capacitance).

      (7) Some sample images presented in this paper contain information that can conflict with the previous research. In Figures 4B and 6B, TH expression was significantly increased in the DMI pretreatment group compared to the control group. However, several studies have shown that the administration of DMI decreases TH expression levels (Komori et al.1992; Nestler et al.1990). Therefore, it would be great if the authors further explained how the pretreatment of DMI with 6-OHDA affects TH level within the DRN.

      Figures 4B and 6B do not show any quantification of TH expression. The difference observed in the representative pictures is incidental and due to the variable expression of TH across the slice. Moreover, as mentioned in the response to point 5, mice were subjected to a single injection of DMI immediately preceding the stereotactic intervention (lines 373-375). In contrast, the decrease in TH expression reported by Komori et al. (1992) and Nestler et al. (1990) was observed in response to chronic (two weeks) administration of DMI.

      (8) This paper lacks direct evidence to demonstrate whether DMI pretreatment could effectively protect against NA depletion. Therefore, in addition to TH expression levels, it is important to provide data to confirm the intact NA levels (or NA axons) after DMI treatment.

      NA levels in the striatum were measured by enzyme-linked immunosorbent assay (ELISA) and are reported in Suppl. Fig. 4 of the revised manuscript.

      (9) It would be great if the authors specifically explained why 6-OHDA was injected into the striatum (neither MFB nor SNc) to make a mouse model of PD.

      Mice were injected in the dorsal striatum to produce a partial bilateral lesion of the dopamine and noradrenaline systems. This model reproduces the initial stages of PD and also recapitulates several non-motor symptoms of PD, including affective disorders, which may be related to changes in serotonergic and dopaminergic transmission in the dorsal raphe. In contrast, injections in the MFB and SNc quickly produce a severe motor phenotype closer to a late stage of the disease and cannot be done bilaterally. The striatal model has been successfully used in other publications (Kravitz et al., 2010, Masini et al., 2021, Ztaou et al., 2018, Chen et al., 2014, Branchi et al., 2008, Marques et al. 2019, Tadaiesky et al., 2008, Matheus et al., 2016, Silva et al., 2016).

      (10) Supplementary Figures 2 and 3 were erroneously cut on the right side. These figure images should be replaced with the correct ones.

      We thank the reviewer for noticing and we have now replaced the figures with the correct ones.

      (11) There should be more explanations about tdTomato-positive but non-TH neurons in Supplementary Figure 2. It is strange to regard TH-negative neurons as DA neurons although these neurons have DA neuron-like electrophysiological properties. If these tdTomato-positive but non-TH neurons cannot release DA, can we say these are DA neurons?

      In our study, we found a small subset of DAT-tdTomato-positive neurons that did not subsequently stain positive for TH. In 5 of 6 of these neurons (recorded in sham), the electrophysiological properties did not differ from those of TH-positive neurons. This is visualized in Suppl. Fig 2A. The absence of any statistical difference was also confirmed by a Mann-Whitney U test comparing the TH-negative to the TH-positive DRNDA neurons (no significant differences in all 6 of 6 properties shown in SF2A). Additionally, all these cells were DAT-positive, further supporting their classification as dopaminergic neurons. Therefore, we suspect that the lack of TH staining is likely caused by the tissue processing itself. Please note that all our immunohistochemistry was run on slices after several hours of patch-clamping procedures. Finally, including or excluding this small subset of neurons in the present study does not change any of the results presented, and the data were therefore pooled. We have now clarified this in more detail in the results section and in Suppl. Fig 2A (lines 100-103).

      Reviewer #3 (Recommendations For The Authors):

      The authors report using a parametric statistical test, the t-test. The t-test makes the assumption that the data are normally distributed. Most biological data is not distributed normally, and with smaller datasets, it is difficult to say whether the underlying distribution would be normally distributed. I would recommend using the non-parametric versions of the same test (eg Mann-Whitney U test), which is likely to give a similar result while being more conservative given the potential for non-normal distribution.

      All electrophysiological data were first tested for normality before running the corresponding statistical test (either a t-test for normally distributed data or a Mann-Whitney U test for non-normally distributed data). The morphological data are now analyzed by the Mann-Whitney U test (lines 484-494).
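
      As an illustration of the non-parametric branch of this workflow, the following is a minimal sketch of a two-sided Mann-Whitney U test using only the Python standard library. It is not the analysis code used in the study: the function names are hypothetical, and the p-value relies on the normal approximation with a tie correction, so for small samples an exact test from a dedicated statistics package would be preferable.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p), where U is the smaller of the two U statistics.
    No continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)

    # Variance of U under the null hypothesis, corrected for ties.
    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:  # all observations identical
        return u, 1.0

    z = (u - n1 * n2 / 2) / sqrt(variance)  # z <= 0 because u is the minimum
    p = min(1.0, 2 * NormalDist().cdf(z))
    return u, p


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    group_a = [1.2, 1.9, 2.4, 2.8, 3.1]
    group_b = [2.9, 3.6, 4.0, 4.4, 5.2]
    print(mann_whitney_u(group_a, group_b))
```

      In this sketch, the choice between the t-test and the Mann-Whitney U test would still be made beforehand by a separate normality check, which is not implemented here.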

      The authors state that mice were treated with 6-OHDA at 3 months, then brain slices were prepared 3 weeks later, making them about 4 months old. I could not find the age of sham/control mice and 6-OHDA/desipramine mice in the methods section. Were sham/controls and 6-OHDA slices prepared in an interleaved fashion?

      Sham and 6-OHDA+DMI mice underwent surgery at 3 months, and the brain slices were prepared 3 weeks later, as for the 6-OHDA mice. We have now clarified this in the methods (line 381).

      While desipramine is relatively selective as a norepinephrine reuptake inhibitor, it also can prevent serotonin reuptake. Could this mechanism also protect DRN neurons from the effects of 6-OHDA?

      Even if desipramine has some affinity for the serotonin transporter, this affinity is 100-fold lower than its affinity for the noradrenaline transporter (Richelson and Pfenning, 1984; Gillman, 2007). Moreover, in our study the 6-OHDA injection in the dorsal striatum did not cause any direct damage to the DRN5-HT neurons, as shown by the 5-HT measurement and DRN5-HT counting (Suppl. Fig. 4D, Suppl. Fig. 5A,B). We can therefore exclude the possibility that the effects observed in the DMI+6-OHDA group are related to protection of the serotonergic system by a single injection of desipramine.

      On line 168, the authors use the abbreviation NA for noradrenergic. Was this abbreviation previously defined in the manuscript?

      Yes, the abbreviation is defined in the introduction (line 73).

      On line 45, the authors cite that the DRN-5HT subpopulation accounts for 30-50% of the DRN neurons. It would be helpful to know approximately what percentage of the DRN neurons belong to the DRNDA subpopulation as well.

      To the best of our knowledge, there is unfortunately no detailed analysis of the prevalence of DRNDA neurons in mice available. Previous studies in rats have estimated that this population comprises around 1000 neurons (Descarries et al., 1986). According to Calizo et al. (2011), the number of any non-serotonergic neuron population (releasing dopamine or other neurotransmitters) in the DRN is one third to one tenth less than the number of DRN5-HT neurons. But please note that this study was also performed in rats (line 55).

      While I appreciate that the authors did not over-interpret their findings, it would be useful to comment (in the Discussion) on how their findings could/should be used in interpreting other studies using 6-OHDA, as well as the relationship of their findings to loss of 5-HT and/or DRN neurons in Parkinson's Disease itself.

      In the manuscript, we refer to the utility of the 6-OHDA model for the study of a wide range of non-motor symptoms. We have now described, in this model, how the loss of midbrain dopaminergic and noradrenergic neurons affects the electrophysiological and morphological properties of DRN5-HT and DRNDA neurons. This information will allow for a more precise assessment of the mechanisms involved in the affective and cognitive aspects of PD symptomatology (lines 354-356).

    1. Author Response

      We are writing this response letter with regard to the insightful feedback you provided on our manuscript titled "A metabolic modeling-based framework for predicting trophic dependencies in native rhizobiomes of crop plants", submitted for consideration in eLife.

      We sincerely appreciate the thorough and constructive reviews, which recognized the intentions behind our work. We intend to fully address all points raised by the reviewers in our revised manuscript. Specifically, we plan to incorporate targeted revisions to address concerns raised during the review process, with a focus on process benchmarking and validation of our framework to enhance its reliability and accuracy.

      We believe that the current revision would improve the consistency and quality of the framework, making it a suitable tool for the characterization of microbial trophic interactions in diverse biological landscapes.

      Thank you once again for both your time and dedication in reviewing our manuscript, as well as the constructive review.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      (1) Substantial revision of the claims and interpretation of the results is needed, especially in the setting of additional data showing enhanced erythrophagocytosis with decreased RBC lifespan.

      Thank you for your valuable feedback and suggestion for a substantial revision of the claims and interpretation of our results. We acknowledge the importance of considering additional data that shows enhanced erythrophagocytosis with decreased RBC lifespan. In response, we have revised our manuscript and incorporated additional experimental data to support and clarify our findings.

      (1) In our original manuscript, we reported a decrease in the number of splenic red pulp macrophages (RPMs) and phagocytosed erythrocytes after hypobaric hypoxia (HH) exposure. This conclusion was primarily based on our observations of reduced phagocytosis in the spleen.

      (2) Additional experimental data on RBC labeling and erythrophagocytosis:

      • Experiment 1 (RBC labeling and HH exposure)

      We conducted an experiment in which RBCs from mice were labeled with PKH67 and injected back into the mice. These mice were then exposed to normobaric normoxia (NN) or HH for 7 or 14 days. The subsequent assessment of RPMs in the spleen using flow cytometry and immunofluorescence detection revealed a significant decrease in both the population of splenic RPMs (F4/80hiCD11blo, new Figure 5A and C) and PKH67-positive macrophages after HH exposure (as depicted in new Figure 5A and C-E). This finding supports our original claim of reduced phagocytosis under HH conditions.

      Author response image 1.

      • Experiment 2 (erythrophagocytosis enhancement)

      To examine the effects of enhanced erythrophagocytosis, we injected Tuftsin after administering PKH67-labelled RBCs. Our observations showed a significant decrease in PKH67 fluorescence in the spleen, particularly after Tuftsin injection compared to the NN group. This result suggests a reduction in RBC lifespan when erythrophagocytosis is enhanced (illustrated in new Figure 7, A-B).

      Author response image 2.

      (3) Revised conclusions:

      • The additional data from these experiments support our original findings by providing a more comprehensive view of the impact of HH exposure on splenic erythrophagocytosis.

      • The decrease in phagocytic RPMs and phagocytosed erythrocytes after HH exposure, along with the observed decrease in RBC lifespan following enhanced erythrophagocytosis, collectively suggest a more complex interplay between hypoxia, erythrophagocytosis, and RBC lifespan than initially interpreted.

      We think that these revisions and additional experimental data provide a more robust and detailed understanding of the effects of HH on splenic erythrophagocytosis and RBC lifespan. We hope that these changes adequately address the concerns raised and strengthen the conclusions drawn in our manuscript.

      (2) F4/80-high, CD11b-low cells are true RPMs; the cells which the authors are presenting appear instead to be splenic monocytes / pre-RPMs. To discuss RPM function requires the presentation of these cells specifically rather than general cells in the proper area of the spleen.

      Thank you for your feedback requesting a substantial revision of our claims and interpretation, particularly considering additional data showing enhanced erythrophagocytosis with decreased RBC lifespan. In response, we have thoroughly revised our manuscript and included new experimental data that further elucidate the effects of HH on RPMs and erythrophagocytosis.

      (1) Re-evaluation of RPMs population after HH exposure:

      • Flow cytometry analysis (new Figure 3G, Figure 5A and B): We revisited the analysis of RPMs (F4/80hiCD11blo) in the spleen after 7 and 14 days of HH exposure. Our revised flow cytometry data consistently showed a significant decrease in the RPMs population post-HH exposure, reinforcing our initial findings.

      Author response image 3.

      Author response image 4.

      • In situ expression of RPMs (Figure S1, A-D):

      We further confirmed the decreased population of RPMs through in situ co-staining with F4/80 and CD11b, and F4/80 and CD68, in spleen tissues. These results clearly demonstrated a significant reduction in F4/80hiCD11blo (Figure S1, A and B) and F4/80hiCD68hi (Figure S1, C and D) cells following HH exposure.

      Author response image 5.

      (2) Single-cell sequencing analysis of splenic RPMs:

      • We conducted a single-cell sequencing analysis of spleen samples post 7 days of HH exposure (Figure S2, A-C). This analysis revealed a notable shift in the distribution of RPMs, predominantly associated with Cluster 0 under NN conditions, to a reduced presence in this cluster after HH exposure.

      • Pseudo-time series analysis indicated a transition pattern change in spleen RPMs, with a shift from Cluster 2 and Cluster 1 towards Cluster 0 under NN conditions, and a reverse transition following HH exposure (Figure S2, B and D). This finding implies a decrease in resident RPMs in the spleen under HH conditions.

      (3) Consolidated findings and revised interpretation:

      • The comprehensive analysis of flow cytometry, in situ staining, and single-cell sequencing data consistently indicates a significant reduction in the number of RPMs following HH exposure.

      • These findings, taken together, strongly support the revised conclusion that HH exposure leads to a decrease in RPMs in the spleen, which in turn may affect erythrophagocytosis and RBC lifespan.

      Author response image 6.

      In conclusion, our revised manuscript now includes additional experimental data and analyses, strengthening our claims and providing a more nuanced interpretation of the impact of HH on spleen RPMs and related erythrophagocytosis processes. We believe these revisions and additional data address your concerns and enhance the scientific validity of our study.

      (3) RBC retention in the spleen should be measured anyway quantitatively, eg, with proper flow cytometry, to determine whether it is increased or decreased.

      Thank you for your query regarding the quantitative measurement of RBC retention in the spleen, particularly in relation to HH exposure. We have utilized a combination of techniques, including flow cytometry and histological staining, to investigate this aspect comprehensively. Below is a summary of our findings and methodology.

      (1) Flow cytometry analysis of labeled RBCs:

      • Our study employed both NHS-biotin (new Figure 4, A-D) and PKH67 labeling (new Figure 4, E-H) to track RBCs in mice exposed to HH. Flow cytometry results from these experiments (new Figure 4, A-H) showed a decrease in the proportion of labeled RBCs over time, both in the blood and spleen. Notably, the reduction in fluorescently labeled RBCs was significantly greater under NN exposure than under HH exposure, in both blood and spleen. The observed decrease in labeled RBCs was initially counterintuitive, as we expected an increase in RBC retention due to reduced erythrophagocytosis. However, this decrease can be attributed to the significantly increased production of RBCs following HH exposure, which dilutes the proportion of labeled cells.

      • Specifically, for blood, the biotin-labeled RBCs decreased by 12.06% under NN exposure and by 7.82% under HH exposure, while the PKH67-labeled RBCs decreased by 9.70% under NN exposure and by 4.09% under HH exposure. For spleen, the biotin-labeled RBCs decreased by 3.13% under NN exposure and by 0.46% under HH exposure, while the PKH67-labeled RBCs decreased by 1.16% under NN exposure and by 0.92% under HH exposure. These findings suggest that HH exposure leads to a decrease in the clearance rate of RBCs.
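
      To make the comparison between conditions explicit, the short sketch below tabulates the decreases quoted above and computes, for each compartment and label, how much larger the decrease was under NN than under HH. The numbers are taken directly from this response; the variable and function names are illustrative assumptions, not part of the study's analysis.

```python
# Reported decreases (%) in labeled RBCs, as quoted in the response above.
decreases = {
    ("blood", "biotin"): {"NN": 12.06, "HH": 7.82},
    ("blood", "PKH67"): {"NN": 9.70, "HH": 4.09},
    ("spleen", "biotin"): {"NN": 3.13, "HH": 0.46},
    ("spleen", "PKH67"): {"NN": 1.16, "HH": 0.92},
}


def nn_minus_hh(table):
    """Return the NN decrease minus the HH decrease for each (tissue, label)."""
    return {key: round(v["NN"] - v["HH"], 2) for key, v in table.items()}


if __name__ == "__main__":
    for (tissue, label), diff in nn_minus_hh(decreases).items():
        print(f"{tissue:6s} {label:6s} NN - HH = {diff:.2f}")
```

      In all four comparisons the difference is positive, which is the pattern the response interprets as slower clearance of labeled RBCs under HH.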

      Author response image 7.

      (2) Detection of erythrophagocytosis in spleen:

      To assess erythrophagocytosis directly, we labeled RBCs with PKH67 and analyzed their uptake by splenic macrophages (F4/80hi) after HH exposure. Our findings (new Figure 5, D-E) indicated a decrease in PKH67-positive macrophages in the spleen, suggesting reduced erythrophagocytosis.

      Author response image 8.

      (3) Flow cytometry analysis of RBC retention:

      Our flow cytometry analysis revealed a decrease in PKH67-positive RBCs in both blood and spleen (Figure S4). We postulated that this was due to increased RBC production after HH exposure. However, this method might not accurately reflect RBC retention, as it measures the proportion of PKH67-labeled RBCs relative to the total number of RBCs, which increased after HH exposure.
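
      The dilution effect described here can be illustrated with a small worked example. The cell counts below are hypothetical and chosen only to show the arithmetic: if the number of labeled RBCs stays constant while the total RBC count rises, the labeled proportion falls even though no labeled cell has been cleared.

```python
def labeled_fraction(labeled_count, total_count):
    """Proportion of labeled RBCs among all RBCs in a sample."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return labeled_count / total_count


if __name__ == "__main__":
    # Hypothetical counts: the labeled cells are fully retained (100 -> 100),
    # but increased RBC production raises the total from 1000 to 1250.
    before = labeled_fraction(100, 1000)
    after = labeled_fraction(100, 1250)
    print(f"before: {before:.2%}, after: {after:.2%}")
```

      Under these assumed counts the labeled proportion drops from 10% to 8% without any loss of labeled cells, which is why a falling proportion alone cannot be read as reduced retention.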

      Author response image 9.

      (4) Histological and immunostaining analysis:

      Histological examination using HE staining and band3 immunostaining in situ (new Figure 6, A-D, and G-H) revealed a significant increase in RBC numbers in the spleen after HH exposure. This was further confirmed by detecting retained RBCs in splenic single cells using Wright-Giemsa composite stain (new Figure 6, E and F) and retained PKH67-labelled RBCs in spleen (new Figure 6, I and J).

      Author response image 10.

      (5) Interpreting the data:

      The comprehensive analysis suggests a complex interplay between increased RBC production and decreased erythrophagocytosis in the spleen following HH exposure. While flow cytometry indicated a decrease in the proportion of labeled RBCs, histological and immunostaining analyses demonstrated an actual increase in RBC retention in the spleen. These findings collectively suggest that while overall RBC production is upregulated following HH exposure, the spleen's capacity for erythrophagocytosis is concurrently diminished, leading to increased RBC retention.

      (6) Conclusion:

      Taken together, our results indicate a significant increase in RBC retention in the spleen post-HH exposure, likely due to reduced numbers of RPMs and reduced erythrophagocytosis. This conclusion is supported by a combination of flow cytometry, histological staining, and immunostaining techniques, providing a comprehensive view of RBC dynamics under HH conditions. We think these findings offer a clear quantitative measure of RBC retention in the spleen, addressing the concerns raised in your question.

      (4) Numerous other methodological problems as listed below.

      We appreciate your question, which highlights the importance of using multiple analytical approaches to understand complex physiological processes. Please find below our point-by-point response to the methodological comments.

      Reviewer #1 (Recommendations For The Authors):

      (1) Decreased BM and spleen monocytes due to increased liver monocyte migration is unclear. There is no evidence that this happens or why it would be a reasonable hypothesis, even in splenectomized mice.

      Thank you for highlighting the need for further clarification and justification of our hypothesized decrease in BM and spleen monocytes due to increased monocyte migration to the liver, particularly in the context of splenectomized mice. Indeed, our study has not explicitly verified an augmentation in mononuclear cell migration to the liver in splenectomized mice.

      Nonetheless, our investigations have revealed a notable increase in monocyte migration to the liver after HH exposure. Noteworthy is our discovery of a significant upregulation in colony stimulating factor-1 (CSF-1) expression in the liver, observed after both 7 and 14 days of HH exposure (data not included). This observation was substantiated through flow cytometry analysis (as depicted in Figure S4), which affirmed an enhanced migration of monocytes to the liver. Specifically, we noted a considerable increase in the population of transient macrophages, monocytes, and Kupffer cells in the liver following HH exposure.

      Author response image 11.

      Considering these findings, we hypothesize that hypoxic conditions may activate a compensatory mechanism that directs monocytes towards the liver, potentially linked to the liver’s integral role in the systemic immune response. In accordance with these insights, we intend to revise our manuscript to reflect the speculative nature of this hypothesis more accurately, and to delineate the strategies we propose for its further empirical investigation. This amendment ensures that our hypothesis is presented with full consideration of its speculative basis, supported by a coherent framework for future validation.

      (2) While F4/80+CD11b+ population is decreased, this is mainly driven by CD11b and F4/80+ alone population is significantly increased. This is counter to the hypothesis.

      Thank you for addressing the apparent discrepancy in our findings concerning the F4/80+CD11b+ population and the increase in the F4/80+ alone population, which seems to contradict our initial hypothesis. Your observation is indeed crucial for the integrity of our study, and we appreciate the opportunity to clarify this matter.

      (1) Clarification of flow cytometry results:

      • In response to the concerns raised, we revisited our flow cytometry experiments with a focus on more clearly distinguishing the cell populations. Our initial graph had some ambiguities in cell grouping, which might have led to misinterpretations.

      • The revised flow cytometry analysis, specifically aimed at identifying red pulp macrophages (RPMs) characterized as F4/80hiCD11blo in the spleen, demonstrated a significant decrease in the F4/80hiCD11blo population. This finding is now in alignment with our immunofluorescence results.

      Author response image 12.

      Author response image 13.

      (2) Revised data and interpretation:

      • The results presented in new Figure 3G and Figure 5 (A and B) consistently indicate a notable reduction in the RPMs population following HH exposure. This supports our revised understanding that HH exposure leads to a decrease in the specific macrophage subset (F4/80hiCD11blo) in the spleen.

      We’ve updated our manuscript to reflect these new findings and interpretations. The revised manuscript details the revised flow cytometry analysis and discusses the potential mechanisms behind the observed changes in macrophage populations.

      (3) HO-1 expression cannot be used as a surrogate to quantify number of macrophages as the expression per cell can decrease and give the same results. In addition, the localization of effect to the red pulp is not equivalent to an assertion that the conclusion applies to macrophages given the heterogeneity of this part of the organ and the spleen in general.

      Thank you for your insightful comments regarding the use of HO-1 expression as a surrogate marker for quantifying macrophage numbers, and for pointing out the complexity of attributing changes in HO-1 expression specifically to macrophages in the splenic red pulp. Your observations are indeed valid and warrant a detailed response.

      (1) Role of HO-1 in macrophage activity:

      • In our study, HO-1 expression was not utilized as a direct marker for quantifying macrophages. Instead, it was considered an indicator of macrophage activity, particularly in relation to erythrophagocytosis. HO-1, being upregulated in response to erythrophagocytosis, serves as an indirect marker of this process within splenic macrophages.

      • The rationale behind this approach was that increased HO-1 expression, induced by erythrophagocytosis in the spleen’s red pulp, could suggest an augmentation in the activity of splenic macrophages involved in this process.

      (2) Limitations of using HO-1 as an indicator:

      • We acknowledge your point that HO-1 expression per cell might decrease, potentially leading to misleading interpretations if used as a direct quantifier of macrophage numbers. The variability in HO-1 expression per cell indeed presents a limitation in using it as a sole indicator of macrophage quantity.

      • Furthermore, your observation about the heterogeneity of the spleen, particularly the red pulp, is crucial. The red pulp is a complex environment with various cell types, and asserting that changes in HO-1 expression are exclusive to macrophages could oversimplify this complexity.

      (3) Addressing the concerns:

      • To address these concerns, we propose to supplement our HO-1 expression data with additional specific markers for macrophages. This would help in correlating HO-1 expression more accurately with macrophage numbers and activity.

      • We also plan to conduct further studies to delineate the specific cell types in the red pulp contributing to HO-1 expression. This could involve techniques such as immunofluorescence or immunohistochemistry, which would allow us to localize HO-1 expression to specific cell populations within the splenic red pulp.

      We’ve revised our manuscript to clarify the role of HO-1 expression as an indirect marker of erythrophagocytosis and to acknowledge its limitations as a surrogate for quantifying macrophage numbers.

      (4) line 63-65 is inaccurate as red cell homeostasis reaches a new steady state in chronic hypoxia.

      Thank you for pointing out the inaccuracy in lines 63-65 of our manuscript regarding red cell homeostasis in chronic hypoxia. Your feedback is invaluable in ensuring the accuracy and scientific integrity of our work. We’ve revised lines 63-65 to accurately reflect the understanding.

      (5) Eryptosis is not defined in the manuscript.

      Thank you for highlighting the omission of a definition for eryptosis in our manuscript. We acknowledge the significance of precisely defining such key terminologies, particularly when they play a crucial role in the context of our research findings. Eryptosis, a term referenced in our study, is a specialized form of programmed cell death unique to erythrocytes. Similar to apoptosis in other cell types, eryptosis is characterized by distinct physiological changes including cell shrinkage, membrane blebbing, and the externalization of phosphatidylserine on the erythrocyte surface. These features are indicative of the RBC life cycle and its regulated destruction process.

      However, it is pertinent to note that our current study does not extensively delve into the mechanisms or implications of eryptosis. Our primary focus has been to elucidate the effects of HH exposure on the processes of splenic erythrophagocytosis and the resultant impact on the lifespan of RBCs. Given this focus, and to maintain the coherence and relevance of our manuscript, we have decided to exclude specific discussions of eryptosis from our revised manuscript. This decision aligns with our aim to provide a clear and concentrated exploration of the influence of HH exposure on RBC dynamics and splenic function.

      We appreciate your input, which has significantly contributed to enhancing the clarity and accuracy of our manuscript. The revision ensures that our research is presented with a focused scope, aligning closely with our experimental investigations and findings.

      (6) Physiologically, there is no evidence that there is any "free iron" in cells, making line 89 point inaccurate.

      Thank you for highlighting the concern regarding the reference to "free iron" in cells in line 89 of our manuscript. The term "free iron" in our manuscript was intended to refer to divalent iron (Fe2+), rather than unbound iron ions freely circulating within cells. We acknowledge that the term "free iron" might lead to misconceptions, as it implies the presence of unchelated iron, which is not physiologically common due to the potential for oxidative damage. To rectify this and provide clarity, we’ve revised line 89 of our manuscript to reflect our meaning more accurately. Instead of "free iron," we use "divalent iron (Fe2+)" to avoid any misunderstanding regarding the state of iron in cells. We also ensure that any implications drawn from the presence of Fe2+ in cells are consistent with current scientific literature and understanding.

      (7) Fig 1f no stats

      We appreciate your critical review and suggestions, which help in improving the accuracy and clarity of our research. We have added the statistical analysis to the new Figure 1F.

      (8) Splenectomy experiments demonstrate that erythrophagocytosis is almost completely replaced by functional macrophages in other tissues (likely Kupffer cells in the liver). there is only a minor defect and no data on whether it is in fact the liver or other organs that provide this replacement function and makes the assertions in lines 345-349 significantly overstated.

      Thank you for your critical assessment of our interpretation of the splenectomy experiments, especially concerning the role of erythrophagocytosis by macrophages in other tissues, such as Kupffer cells in the liver. We appreciate your observation that our assertions may be overstated and acknowledge the need for more specific data to identify which organs compensate for the loss of splenic erythrophagocytosis.

      (1) Splenectomy experiment findings:

      • Our findings in Figure 2D do indicate that in the splenectomized group under NN conditions, erythrophagocytosis is substantially compensated for by functional macrophages in other tissues. This is an important observation that highlights the body's ability to adapt to the loss of splenic function.

      • However, under HH conditions, our data suggest that the spleen plays an important role in managing erythrocyte turnover, as indicated by the significant impact of splenectomy on erythrophagocytosis and subsequent erythrocyte dynamics.

      (2) Addressing the lack of specific organ identification:

      • We acknowledge that our study does not definitively identify which organs, such as the liver or others, take over the erythrophagocytosis function post-splenectomy. This is an important aspect that needs further investigation.

      • To address this, we also plan to perform additional experiments that could more accurately point out the specific tissues compensating for the loss of splenic erythrophagocytosis. This could involve tracking labeled erythrocytes or using specific markers to identify macrophages actively engaged in erythrophagocytosis in various organs.

      (3) Revising manuscript statements:

      Considering your feedback, we’ve revised the statements in lines 345-349 (lines 378-383 in revised manuscript) to enhance the scientific rigor and clarity of our research presentation.

      (9) M1 vs M2 macrophage experiments are irrelevant to the main thrust of the manuscript, there are no references to support the use of only CD16 and CD86 for these purposes, and no stats are provided. It is also unclear why bone marrow monocyte data is presented and how it is relevant to the rest of the manuscript.

      Thank you for your critical evaluation of the relevance and presentation of the M1 vs. M2 macrophage experiments in our manuscript. We appreciate your insights, especially regarding the use of specific markers and the lack of statistical analysis, as well as the relevance of bone marrow monocyte data to our study's main focus.

      (1) Removal of M1 and M2 macrophage data:

      Based on your feedback and our reassessment, we agree that the results pertaining to M1 and M2 macrophages did not align well with the main objectives of our manuscript. Consequently, we have decided to remove the related content on M1 and M2 macrophages from the revised manuscript. This decision was made to ensure that our manuscript remains focused and coherent, highlighting our primary findings without the distraction of unrelated or insufficiently supported data.

      The use of only CD16 and CD86 markers for M1 and M2 macrophage characterization, without appropriate statistical analysis, was indeed a methodological limitation. We recognize that a more comprehensive set of markers and rigorous statistical analysis would be necessary for a meaningful interpretation of M1/M2 macrophage polarization. Furthermore, the relevance of these experiments to the central theme of our manuscript was not adequately established. Our study primarily focuses on erythrophagocytosis and red pulp macrophage dynamics under hypobaric hypoxia, and the M1/M2 polarization aspect did not contribute significantly to this narrative.

      (2) Clarification on bone marrow monocyte data:

      Regarding the inclusion of bone marrow monocyte data, we acknowledge that its relevance to the main thrust of the manuscript was not clearly articulated. In the revised manuscript, we provide a clearer rationale for its inclusion and how it relates to our primary objectives.

      (3) Commitment to clarity and relevance:

      We are committed to ensuring that every component of our manuscript contributes meaningfully to our overall objectives and research questions. Your feedback has been instrumental in guiding us to streamline our focus and present our findings more effectively.

      We appreciate your valuable feedback, which has led to a more focused and relevant presentation of our research. These changes enhance the clarity and impact of our manuscript, ensuring that it accurately reflects our key research findings.

      (10) Biotinylated RBC clearance is enhanced, demonstrating that RBC erythrophagocytosis is in fact ENHANCED, not diminished, calling into question the founding hypothesis that the manuscript proposes.

      Thank you for your critical evaluation of our data on biotinylated RBC clearance, which suggests enhanced erythrophagocytosis under HH conditions. This observation indeed challenges our founding hypothesis that erythrophagocytosis is diminished in this setting. Below is a summary of our findings and methodology.

      (1) Interpretation of RBC labeling results:

      Both the earlier results with NHS-biotin-labeled RBCs (new Figure 4, A-D) and the current results with PKH67-labeled RBCs (new Figure 4, E-H) showed that the number of labeled RBCs decreased with increasing time after injection. RBC production, in both bone marrow and spleen, was significantly increased following HH exposure, so the proportion of labeled RBCs detected by flow cytometry in the blood and spleen was consistently lower than in the NN group. However, the magnitude of the reduction in fluorescently labeled RBCs in blood and spleen was significantly smaller after HH exposure than under NN exposure. Specifically, for blood, the biotin-labeled RBCs decreased by 12.06% under NN exposure and by 7.82% under HH exposure, while the PKH67-labeled RBCs decreased by 9.70% under NN exposure and by 4.09% under HH exposure. For spleen, the biotin-labeled RBCs decreased by 3.13% under NN exposure and by 0.46% under HH exposure, while the PKH67-labeled RBCs decreased by 1.16% under NN exposure and by 0.92% under HH exposure.
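
The comparison in the preceding paragraph can be tabulated with a minimal sketch. The eight values are the reported decreases quoted above; the dictionary layout and the `compare` helper are illustrative assumptions, not part of the analysis pipeline used in the study.

```python
# Reported decreases (%) in labeled RBCs, copied from the paragraph above.
decreases = {
    ("blood", "biotin"): {"NN": 12.06, "HH": 7.82},
    ("blood", "PKH67"): {"NN": 9.70, "HH": 4.09},
    ("spleen", "biotin"): {"NN": 3.13, "HH": 0.46},
    ("spleen", "PKH67"): {"NN": 1.16, "HH": 0.92},
}


def compare(nn, hh):
    """Return (NN minus HH difference, HH-to-NN ratio) for one measurement."""
    return nn - hh, hh / nn


# Hypothetical summary structure: one (difference, ratio) pair per tissue and label.
summary = {key: compare(v["NN"], v["HH"]) for key, v in decreases.items()}

for (tissue, label), (diff, ratio) in summary.items():
    print(f"{tissue:6s} {label:6s} NN-HH = {diff:5.2f}  HH/NN = {ratio:4.2f}")
```

In every tissue and labeling method the reported decrease is smaller under HH than under NN, which is the pattern the paragraph describes.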

      Author response image 14.

      (2) Increased RBCs production under HH conditions:

      It's important to note that RBCs production, including from bone marrow and spleen, was significantly increased following HH exposure. This increase in RBCs production could contribute to the decreased proportion of labeled RBCs observed in flow cytometry analyses, as there are more unlabeled RBCs diluting the proportion of labeled cells in the blood and spleen.

      (3) Analysis of erythrophagocytosis in RPMs:

      Our analysis of PKH67-labeled RBCs content within RPMs following HH exposure showed a significant reduction in the number of PKH67-positive RPMs in the spleen (new Figure 5). This finding suggests a decrease in erythrophagocytosis by RPMs under HH conditions.

      Author response image 15.

      (4) Reconciling the findings:

      The apparent contradiction between enhanced RBC clearance (suggested by the reduced proportion of labeled RBCs) and reduced erythrophagocytosis in RPMs (indicated by fewer PKH67-positive RPMs) may be explained by the increased overall production of RBCs under HH. This increased production could mask the actual erythrophagocytosis activity in terms of the proportion of labeled cells. Therefore, while the proportion of labeled RBCs decreases more significantly under HH conditions, this does not necessarily indicate an enhanced erythrophagocytosis rate, but rather an increased dilution effect due to higher RBCs turnover.
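
The dilution argument above can be illustrated with a toy model. This is a sketch under stated assumptions: the function name, the daily update rule, and every number below are hypothetical and chosen only for illustration; none of them are measurements from the study.

```python
def labeled_fraction(days, labeled0, unlabeled0, clearance, production):
    """Return the labeled share of RBCs after `days` of a toy daily update.

    clearance  : fraction of all RBCs removed per day (hypothetical)
    production : number of new, unlabeled RBCs added per day (hypothetical)
    """
    labeled, unlabeled = float(labeled0), float(unlabeled0)
    for _ in range(days):
        # Labeled cells can only be cleared; they are never replenished.
        labeled *= 1.0 - clearance
        # Unlabeled cells are cleared at the same rate and topped up by production.
        unlabeled = unlabeled * (1.0 - clearance) + production
    return labeled / (labeled + unlabeled)


# Illustrative parameters only: NN has higher clearance and lower production,
# HH has lower clearance and higher production.
nn = labeled_fraction(days=14, labeled0=100, unlabeled0=900, clearance=0.02, production=20)
hh = labeled_fraction(days=14, labeled0=100, unlabeled0=900, clearance=0.01, production=60)

print(f"NN labeled fraction: {nn:.3f}")
print(f"HH labeled fraction: {hh:.3f}")
```

With these assumed parameters the labeled fraction ends up lower in the HH-like setting even though its clearance rate is lower, because newly produced unlabeled cells dilute the labeled pool. The sketch shows only that such an outcome is arithmetically possible, not that it matches the experimental data.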

      (5) Revised interpretation and manuscript changes:

      Given these factors, we have updated our manuscript to reflect this detailed interpretation and to clarify how the increased RBC production under HH conditions affects our observations of labeled RBC clearance and erythrophagocytosis. We appreciate your insightful feedback, which has prompted a careful re-examination of our data and interpretations. We hope that these revisions provide a more accurate and comprehensive understanding of the effects of HH on erythrophagocytosis and RBC dynamics.

      (11) Legend in Fig 4c-4d looks incorrect and Fig 4e-4f is very non-specific since Wright stain does not provide evidence of what type of cells these are and making for a significant overstatement in the contribution of this data to "confirming" increased erythrophagocytosis in the spleen under HH exposure (line 395-396).

      Thank you for your insightful observations regarding the data presentation and figure legends in our manuscript, particularly in relation to Figure 4 (renamed as Figure 6 in the revised manuscript) and the use of Wright-Giemsa composite staining. We appreciate your constructive feedback and acknowledge the importance of presenting our data with utmost clarity and precision.

      (1) Amendments to Figure legends:

      We recognize the necessity of rectifying inaccuracies in the legends of the previously labeled Figure 4C and D. Corrections have been made to ensure the legends accurately reflect the data presented. Additionally, we acknowledge the error concerning the description of Wright staining. The method employed in our study is Wright-Giemsa composite staining, which, unlike Wright staining that solely stains cytoplasm (RBC), is capable of staining both nuclei and cytoplasm.

      (2) Addressing the specificity of Wright-Giemsa Composite staining:

      Our approach involved quantifying RBC retention using Wright-Giemsa composite staining on single splenic cells post-perfusion at 7 and 14 days post HH exposure. We understand and appreciate your concerns regarding the nonspecific nature of Wright staining. Although Wright stain is a general hematologic stain and not explicitly specific for certain cell types, its application in our study aimed to provide preliminary insights. The spleen cells, devoid of nuclei and thus likely to be RBCs, were stained and observed post-perfusion, indicating RBC retention within the spleen.

      (3) Incorporating additional methods for RBC identification:

      To enhance the specificity of our findings, we integrated supplementary methods for RBC identification in the revised manuscript. We employed band3 immunostaining (in the new Figure 6, C-D and G-H) and PKH67 labeling (Figure 6, I-J) for a more targeted identification of RBCs. Band3, serving as a reliable marker for RBCs, augments the specificity of our immunostaining approach. Likewise, PKH67 labeling affords a direct and definitive means to assess RBC retention in the spleen following HH exposure.

      Author response image 16. same as 10

      (4) Revised interpretation and manuscript modifications:

      Based on these enhanced methodologies, we have refined our interpretation of the data and accordingly updated the manuscript. The revised narrative underscores that our conclusions regarding reduced erythrophagocytosis and RBC retention under HH conditions are corroborated by not only Wright-Giemsa composite staining but also by band3 immunostaining and PKH67 labeling, each contributing distinctively to our comprehensive understanding.

      We are committed to ensuring that our manuscript precisely reflects the contribution of each method to our findings and conclusions. Your thorough review has been invaluable in identifying and rectifying areas for improvement in our research report and interpretation.

      (12) Ferroptosis data in Fig 5 is not specific to macrophages and Fer-1 data confirms the expected effect of Fer-1 but there is no data that supports that Fer-1 reverses the destruction of these cells or restores their function in hypoxia. Finally, these experiments were performed in peritoneal macrophages which are functionally distinct from splenic RPM.

      Thank you for your critique of our presentation and interpretation of the ferroptosis data in Figure 5 (renamed as Figure 9 in the revised manuscript), as well as your observations regarding the specificity of the experiments to macrophages and the effects of Fer-1. We value your input and acknowledge the need to clarify these aspects in our manuscript.

      (1) Clarification on cell type used in experiments:

      • We appreciate your attention to the details of our experimental setup. The experiments presented in Figure 9 were indeed conducted on splenic macrophages, not peritoneal macrophages, as incorrectly mentioned in the original figure legend. This was an error in our manuscript, and we have revised the figure legend accordingly to accurately reflect the cell type used.

      (2) Specificity of ferroptosis data:

      • We recognize that the data presented in Figure 9 need to be more explicitly linked to the specific macrophage population being studied. In the revised manuscript, we ensure that the discussion around ferroptosis data is clearly situated within the framework of splenic macrophages.

      • We also provide additional methodological details in the 'Methods' section to reinforce the specificity of our experiments to splenic macrophages.

      (3) Effects of Fer-1 on macrophage function and survival:

      • Regarding the effect of Fer-1, we agree that while our data confirms the expected effect of Fer-1 in inhibiting ferroptosis, we have not provided direct evidence that Fer-1 reverses the destruction of macrophages or restores their function in hypoxia.

      • To address this, we propose additional experiments to specifically investigate the impact of Fer-1 on the survival and functional restoration of splenic macrophages under hypoxic conditions. This would involve assessing not only the inhibition of ferroptosis but also the recovery of macrophage functionality post-treatment.

      (4) Revised interpretation and manuscript changes:

      • We’ve revised the relevant sections of our manuscript to reflect these clarifications and proposed additional studies. This includes modifying the discussion of the ferroptosis data to more accurately represent the cell types involved and the limitations of our current findings regarding the effects of Fer-1.

      • The revised manuscript presents a more detailed interpretation of the ferroptosis data, clearly describing what our current experiments demonstrate and what remains to be investigated.

      We are grateful for your insightful feedback, which has highlighted important areas for improvement in our research presentation. We think that these revisions will enhance the clarity and scientific accuracy of our manuscript, ensuring that our findings and conclusions are well-supported and precisely communicated.

      Reviewer #2 (Recommendations For The Authors):

      The following questions and remarks should be considered by the authors:

      (1) The methods should clearly state whether the HH was discontinued during the 7 or 14 day exposure for cleaning, fresh water etc. Moreover, how was CO2 controlled? The procedure for splenectomy needs to be described in the methods.

      Thank you for your inquiry regarding the specifics of our experimental methods, particularly the management of HH exposure and the procedure for splenectomy. We appreciate your attention to detail and the importance of these aspects for the reproducibility and clarity of our research.

      (1) HH exposure conditions:

      In our experiments, mice were continuously exposed to HH for the entire duration of 7 or 14 days, without interruption for activities such as cleaning or providing fresh water. This uninterrupted exposure was crucial for maintaining consistent hypobaric conditions throughout the experiment. The hypobaric chamber was configured to ensure a ventilation rate of 25 air exchanges per minute. This high ventilation rate was effective in regulating the concentration of CO2 inside the chamber, thereby maintaining a stable environment for the mice.

      (2) The splenectomy was performed as follows:

      After anesthesia, the mice were placed in a supine position, and their limbs were fixed. The abdominal operation area was skinned, disinfected, and covered with a sterile towel. A median incision was made in the upper abdomen, followed by laparotomy to locate the spleen. The spleen was then carefully pulled out through the incision. The arterial and venous directions in the splenic pedicle were examined, and two vascular forceps were used to clamp all the tissue in the main cadre of blood vessels below the splenic portal. The splenic pedicle was cut between the forceps to remove the spleen. The end of the proximal hepatic artery was clamped with a vascular clamp, and double or through ligation was performed to secure the site. The abdominal cavity was then cleaned to ensure there was no bleeding at the ligation site, and the incision was closed. Post-operatively, the animals were housed individually. Generally, they were able to feed themselves after recovering from anesthesia and did not require special care.

      We hope this detailed description addresses your queries and provides a clear understanding of the experimental conditions and procedures used in our study. These methodological details are crucial for ensuring the accuracy and reproducibility of our research findings.

      (2) The lack of changes in MCH needs explanation? During stress erythropoiesis some limit in iron availability should cause MCH decrease particularly if the authors claim that macrophages for rapid iron recycling are decreased. Fig 1A is dispensable. Fig 1G NN control 14 days does not make sense since it is higher than 7 days of HH.

      Thank you for your inquiry regarding the lack of changes in Mean Corpuscular Hemoglobin (MCH) in our study, particularly in the context of stress erythropoiesis and decreased macrophage-mediated iron recycling. We appreciate the opportunity to provide further clarification on this aspect.

      (1) Explanation for stable MCH levels:

      • Our research identified a decrease in erythrophagocytosis and iron recycling in the spleen following HH exposure. Despite this, the MCH levels remained stable. This observation can be explained by considering the compensatory roles of other organs, particularly the liver and duodenum, in maintaining iron homeostasis.

      • Specifically, our investigations revealed an enhanced capacity of the liver to engulf RBCs and process iron under HH conditions. This increased hepatic erythrophagocytosis likely compensates for the reduced splenic activity, thereby stabilizing MCH levels.

      (2) Role of hepcidin and DMT1 expression:

      Additionally, hypoxia is known to influence iron metabolism through the downregulation of Hepcidin and upregulation of Divalent Metal Transporter 1 (DMT1) expression. These alterations lead to enhanced intestinal iron absorption and increased blood iron levels, further contributing to the maintenance of MCH levels despite reduced splenic iron recycling.

      (3) Revised Figure 1 and data presentation:

      To address the confusion regarding the data presented in Figure 1G, we have made revisions in our manuscript. The original Figure 1G, which did not align with the expected trends, has been removed. In its place, we have included a statistical chart of Figure 1F in the new version of Figure 1G. This revision will provide a clearer and more accurate representation of our findings.

      (4) Manuscript updates and future research:

      • We have updated our manuscript to incorporate these explanations, ensuring that the rationale behind the stable MCH levels is clearly articulated. This includes a discussion on the role of the liver and duodenum in iron metabolism under hypoxic conditions.

      • Future research could explore in greater detail the mechanisms by which different organs contribute to iron homeostasis under stress conditions like HH, particularly focusing on the dynamic interplay between hepatic and splenic functions.

      We thank you for your insightful question, which has prompted a thorough re-examination of our findings and interpretations. We believe that these clarifications will enhance the overall understanding of our study and its implications in the context of iron metabolism and erythropoiesis under hypoxic conditions.

      (3) Fig 2 the difference between sham and splenectomy is really marginal and not convincing. Is there also a difference at 7 days? Why does the spleen size decrease between 7 and 14 days?

      Thank you for your observations regarding the marginal differences observed between sham and splenectomy groups in Figure 2, as well as your inquiries about spleen size dynamics over time. We appreciate this opportunity to clarify these aspects of our study.

      (1) Splenectomy vs. Sham group differences:

      • In our experiments, the difference between the sham and splenectomy groups under HH conditions, though subtle, was consistent with our hypothesis regarding the spleen's role in erythrophagocytosis and stress erythropoiesis. Under NN conditions, no significant difference was observed between these groups, which aligns with the expectation that the spleen's contribution is more pronounced under hypoxic stress.

      (2) Spleen size dynamics and peak stress erythropoiesis:

      • The observed splenic enlargement prior to 7 days can be attributed to a combination of factors, including the retention of RBCs and extramedullary hematopoiesis, which is known to be a response to hypoxic stress.

      • Prior research has shown that splenic stress erythropoiesis triggered by hypoxia typically peaks within 3 to 7 days. This aligns with our Toluidine Blue (TO) staining results, which indicated that the response peaks at day 7 (Figure 1, F-G). After this peak, extramedullary hematopoiesis declines, which could explain the decrease in spleen size between 7 and 14 days.

      • This pattern of splenic response under prolonged hypoxic stress is corroborated by studies such as those conducted by Wang et al. (2021), Harada et al. (2015), and Cenariu et al. (2021). These references collectively underscore that the spleen undergoes significant dynamism in reaction to sustained hypoxia. This dynamism is initially manifested as an enlargement of the spleen, attributable to escalated erythropoiesis and erythrophagocytosis. Subsequently, as these processes approach normalization, a regression in spleen size ensues.

      We’ve revised our manuscript to include a more detailed explanation of these splenic dynamics under HH conditions, referencing the relevant literature to provide a comprehensive context for our findings. We will also consider performing additional analysis or providing further data on spleen size changes at 7 days to support our observations and ensure a thorough understanding of the splenic response to hypoxic stress over time.

      (4) Fig 3 B the clusters should be explained in detail. If the decrease in macrophages in Fig 3K/L is responsible for the effect, why does splenectomy not have a much stronger effect? How do the authors know which cells died in the calcein stained population in Fig 3D?

      Thank you for your insightful questions regarding the details of our data presentation in Figure 3, particularly about the identification of cell clusters and the implications of macrophage reduction. We appreciate the opportunity to address these aspects and clarify our findings.

      (1) Explanation of cell clusters in Figure 3B:

      • In the revised manuscript, we have included detailed notes for each cell population represented in Figure 3B (Figure 3D in revised manuscript). These notes provide a clearer understanding of the cell types present in each cluster, enhancing the interpretability of our single-cell sequencing data.

      • This detailed annotation will help readers to better understand the composition of the splenic cell populations under study and how they are affected by hypoxic conditions.

      (2) Impact of splenectomy vs. macrophage reduction:

      • The interplay between the reduction in macrophage populations, as evidenced by our single-cell sequencing data, and the ramifications of splenectomy presents a multifaceted scenario. Notably, the observed decline in macrophage numbers following HH exposure does not straightforwardly equate to a comparable alteration in overall splenic function, as might be anticipated with splenectomy.

      • In the context of splenectomy under HH conditions, a significant escalation in the RBCs count was observed, surpassing that in non-splenectomized mice exposed to HH. This finding underscores the spleen's critical role in modulating RBCs dynamics under HH. It also indirectly suggests that the diminished phagocytic capacity of the spleen following HH exposure contributes to an augmented RBCs count, albeit to a lesser extent than in the splenectomy group. This difference is attributed to the fact that, while the number of RPMs in the spleen post-HH is reduced, they are still present, unlike in the case of splenectomy, where they are entirely absent.

      • Splenectomy entails the complete removal of the spleen, thus eliminating a broad spectrum of functions beyond erythrophagocytosis and iron recycling mediated by macrophages. The nuanced changes observed in our study may be reflective of the spleen's diverse functionalities and the organism's adaptive compensatory mechanisms in response to the loss of this organ.

      (3) Calcein stained population in Figure 3D:

      • Regarding the identification of cell death in the calcein-stained population in Figure 3D (Figure 3A in revised manuscript), we acknowledge that the specific cell types undergoing death could not be distinctly determined from this analysis alone.

      • The calcein staining method allows for the visualization of live (calcein-positive) and dead (calcein-negative) cells, but it does not provide specific information about the cell types. The decrease in macrophage population was inferred from the single-cell sequencing data, which offered a more precise identification of cell types.

      (4) Revised manuscript and data presentation:

      • Considering your feedback, we have revised our manuscript to provide a more comprehensive explanation of the data presented in Figure 3, including the nature of the cell clusters and the interpretation of the calcein staining results.

      • We have also updated the manuscript to reflect the removal of Figure 3K/L results and to provide a more focused discussion on the relevant findings.

      We are grateful for your detailed review, which has helped us to refine our data presentation and interpretation. These clarifications and revisions will enhance the clarity and scientific rigor of our manuscript, ensuring that our conclusions are well-supported and accurately conveyed.

      (5) Is the reduced phagocytic capacity in Fig 4B significant? Erythrophagocytosis is compromised due to the considerable spontaneous loss of labelled erythrocytes; could other assays help? (potentially by a modified Chromium release assay?). Is it necessary to stimulate phagocytosis to see a significant effect?

      Thank you for your inquiry regarding the significance of the reduced phagocytic capacity observed in Figure 4B, and the potential for employing alternative assays to elucidate erythrophagocytosis dynamics under HH conditions.

      (1) Significance of reduced phagocytic capacity:

      The observed reduction in the amplitude of fluorescently labeled RBCs in both the blood and spleen under HH conditions suggests a decrease in erythrophagocytosis. This is indicative of a diminished phagocytic capacity, particularly when contrasted with NN conditions.

      (2) Investigation of erythrophagocytosis dynamics:

      To delve deeper into erythrophagocytosis under HH, we employed Tuftsin to enhance this process. Following the injection of PKH67-labeled RBCs and subsequent HH exposure, we noted a significant decrease in PKH67 fluorescence in the spleen, particularly marked after the administration of Tuftsin. This finding implies that stimulated erythrophagocytosis can influence RBCs lifespan.

      (3) Erythrophagocytosis under normal and hypoxic conditions:

      Under normal conditions, the reduction in phagocytic activity is less apparent without stimulation. However, under HH conditions, our findings demonstrate a clear weakening of the phagocytic effect. While we established that promoting phagocytosis under NN conditions affects RBC lifespan, the impact of enhanced phagocytosis under HH on RBCs numbers was not explicitly investigated.

      (4) Potential for alternative assays:

      Considering the considerable spontaneous loss of labeled erythrocytes, alternative assays such as a modified Chromium release assay could provide further insights. Such assays might offer a more nuanced understanding of erythrophagocytosis efficiency and the stability of labeled RBCs under different conditions.

      (5) Future research directions:

      The implications of these results suggest that future studies should focus on comparing the effects of stimulated phagocytosis under both NN and HH conditions. This would offer a clearer picture of the impact of hypoxia on the phagocytic capacity of macrophages and the subsequent effects on RBC turnover.

      In summary, our findings indicate a diminished erythrophagocytic capacity, with enhanced phagocytosis affecting RBCs lifespan. Further investigation, potentially using alternative assays, would be beneficial to comprehensively understand the dynamics of erythrophagocytosis in different physiological states.

      (6) Can the observed ferroptosis be influenced by bi- and not trivalent iron chelators?

      Thank you for your question regarding the potential influence of bi- and trivalent iron chelators on ferroptosis under hypoxic conditions. We appreciate the opportunity to discuss the implications of our findings in this context.

      (1) Analysis of iron chelators on ferroptosis:

      In our study, we did not specifically analyze the effects of bi- and trivalent iron chelators on ferroptosis under hypoxia. However, our observations with Deferoxamine (DFO), a well-known iron chelator, provide some insights into how iron chelation may influence ferroptosis in splenic macrophages under hypoxic conditions.

      (2) Effect of DFO on oxidative stress markers:

      Our findings showed that under 1% O2, there was an increase in Malondialdehyde (MDA) content, a marker of lipid peroxidation, and a decrease in Glutathione (GSH) content, indicative of oxidative stress. These changes are consistent with the induction of ferroptosis, which is characterized by increased lipid peroxidation and depletion of antioxidants. Treatment with Ferrostatin-1 (Fer-1) and DFO effectively reversed these alterations. This suggests that DFO, like Fer-1, can mitigate ferroptosis in splenic macrophages under hypoxia, primarily by impacting MDA and GSH levels.

      Author response image 17.

      (3) Potential role of iron chelators in ferroptosis:

      The effectiveness of DFO in reducing markers of ferroptosis indicates that iron availability plays a crucial role in the ferroptotic process under hypoxic conditions. It is plausible that both bi- and trivalent iron chelators could influence ferroptosis, given their ability to modulate iron availability within cells. Since ferroptosis is an iron-dependent form of cell death, chelating iron, irrespective of its valence state, could potentially disrupt the process by limiting the iron necessary for the generation of reactive oxygen species and lipid peroxidation.

      (4) Additional research and manuscript updates:

      Our study highlights the need for further research to explore the differential effects of various iron chelators on ferroptosis, particularly under hypoxic conditions. Such studies could provide a more comprehensive understanding of the role of iron in ferroptosis and the potential therapeutic applications of iron chelators. We have updated our manuscript to include these findings and to discuss the potential implications of iron chelation in the context of ferroptosis under hypoxic conditions. This will provide a broader perspective on our research and its significance in understanding the mechanisms of ferroptosis.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study provides insights into the IDA peptide with dual functions in development and immunity. The approach used is solid and helps to define the role of IDA in a two-step process, cell separation followed by activation of innate defenses. The main limitation of the study is the lack of direct evidence linking signaling by IDA and its HAE receptors to immunity. As such the work remains descriptive but it will nevertheless be of interest to a wide range of plant cell biologists.

      We thank the reviewers for thoroughly reading our manuscript. We have used their comments and suggestions to improve the manuscript. Below is a response to the reviewers' comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      The paper titled 'A dual function of the IDA peptide in regulating cell separation and modulating plant immunity at the molecular level' by Olsson Lalun et al., 2023 aims to understand how IDA-HAE/HSL2 signalling modulates immunity, a pathway that has previously been implicated in development. This is a timely question to address as conflicting reports exist within the field. IDL6/7 have previously been shown to negatively regulate immune signalling, disease resistance and stress responses in leaf tissue; however, IDA has been shown to positively regulate immunity through the shedding of infected tissues. Moreover, recently the related receptor NUT/HSL3 has been shown to positively regulate immune signalling and disease resistance. This work has the potential to bring clarity to this field; however, the manuscript requires some additional work to address these questions. This is especially the case as it contradicts some previous work with IDL peptides which are perceived by the same receptor complexes.

      Can IDA induce pathogen resistance? Does the infiltration of IDA into leaf tissue enhance or reduce pathogen growth? Previously it has been shown that IDL6 makes plants more susceptible. Is this also true for IDA? Currently cytoplasmic calcium influx and apoplastic ROS are overinterpreted as immune responses - these can also be induced by many developmental cues, e.g. CLE40-induced calcium transients. Whilst gene expression is more specific, it is also true that treatment with synthetic peptides, which are recognised by LRR-RKs, can induce immune gene expression, especially in the short term, even when that is not their in vivo function, e.g. doi.org/10.15252/embj.2019103894.

      We thank the reviewer for the concerns raised and agree that further experiments including pathogen assays would strengthen the link between IDA signaling and immunity, and we plan for such experiments in future work. We have, however, modified the discussion to include the possible role of IDA-induced Ca2+ and ROS during development. We have recently published a preprint (accepted for publication in JXB) (Galindo-Trigo et al., 2023, https://doi.org/10.1101/2023.09.12.557497) strengthening the link between IDA and defense by identifying WRKY transcription factors that regulate IDA expression through a Y1H assay.

      This paper shows that receptors other than hae/hsl2 are genetically required to induce defense gene expression, it would have been interesting to see what phenotype would be associated with higher order mutants of closely related haesa/haesa-like receptors. Indeed recently HSL1 has been shown to function as a receptor for IDA/IDL peptides. Could the triple mutant suppress all response? Could the different receptors have distinct outputs? For example for FRK1 gene expression the hae hsl2 mutant has an enhanced response. Could defence gene expression be primarily mediated by HSL1 with subfunctionalisation within this clade?

      We agree that it would be interesting to also include HSL1 in our studies. However, the focus of this study has been on HAE and HSL2 and we wanted to explore their role in IDA-induced defense responses. Including HSL1 in these studies will require generation of multiple transgenic lines and repeating most of the experiments; we will consider these experiments in a follow-up study together with pathogen assays (that would also address the main concern raised in the comment above). We have, however, modified the text to include the known function of HSL1 and discuss the possibility of subfunctionalisation of this receptor clade.

      One striking finding of the study is the strong additive interaction between IDA and flg22 treatment on gene expression. Do the authors also see this for co-treatment of different peptides with flg22, or is this unique function of IDA? Is this receptor dependent (HAE/HSL1/HSL2)?

      This is a good question. Since our study focuses on the IDA signaling pathway, we preferentially tested if the additive effect observed between flg22 and mIDA was also observed when mIDA was combined with another peptide involved in defense. The endogenous peptide PIP1 has previously been shown to amplify flg22 signaling (Hou et al., 2014, doi:10.1371/journal.ppat.1004331). In this study it is shown that co-treatment with flg22 and PIP1 gives increased resistance to Pseudomonas PstDC3000 compared to when plants are treated with each peptide separately. In the same study, the authors also show reduced flg22-induced transcriptional activity of two defense-related genes, WRKY33 and PR, in the receptor like kinase7 (rlk7) mutant (the receptor perceiving PIP1). To investigate whether PIP1 would give the same additive effect with mIDA as that observed between flg22 and mIDA, we co-treated seedlings with PIP1 and mIDA. We observed no enhanced transcriptional activity of FRK1, MYB51 and PEP3 in tissue from plants treated with both PIP1 and mIDA peptides compared to single exposure. These results are presented in supplementary figure 11. In conclusion, we do not think mIDA acts as a general amplifier of all immune elicitors in plants.

      It is interesting how tissue-specific calcium responses are in response to IDA and flg22, suggesting the cellular distribution of their cognate receptors. However, one striking observation, made by the authors as well, is that the expression of the promoter seems to be broader than the calcium response, indicating that additional factors are required for the observed calcium response. Could diffusion of the peptide be a contributing factor, or are only some cells competent to induce a calcium response?

      It is interesting that the authors look for floral abscission phenotypes in cngc and rbohd/f mutants to conclude for genetic requirement of these in floral abscission. Do the authors have a hypothesis for why they failed to see a phenotype for the rbohd/f mutant as was published previously? Do you think there might be additional players redundantly mediating these processes?

      It is a possibility that diffusion of the peptide plays a role in the observed response. In a biological context we would assume that the local production of the peptides plays an important role in the cellular responses. In our experimental setup, we add the peptide externally and we can therefore assume that the overlaying cells get in contact with the peptide before cells in the inner tissues, and this could be affecting the response recorded. However, our results show that there are differences between flg22- and mIDA-induced responses even when the application of the peptides is performed in the same manner, indicating that the difference in the response is not primarily due to the diffusion rate of the peptides but is likely due to different factors being present in different cells. To acquire a better picture of the distribution of receptor expression in the root tissue and to investigate in which cells the receptors have an overlapping expression pattern, we have included results in figure 6 showing plant lines co-expressing transcriptional reporters of FLS2 and HAE or HSL2.

      Can you observe callose deposition in the cotyledons of the 35S::HAE line? Are the receptors expressed in native cotyledons? This is the only phenotype tested in the cotyledons.

      We thank the reviewer for this valuable comment. We have now conducted a callose deposition assay on the 35S:HAE line. Indeed, we observe callose depositions when cotyledons from a 35S:HAE line are treated with mIDA. We have included these results in figure 4 and have adjusted the text regarding the callose assay accordingly. In addition, we have analyzed the promoter activity of pHAE in cotyledons and we observe weak promoter activity. These results are included as supplementary figure 1d.

      Are flg22-induced calcium responses affected in hae hsl2?

      The experiment suggested by the reviewer is an important control to ensure that the hae hsl2-Aeq line can respond to a Ca2+-inducing peptide signaling through a different receptor than HAE or HSL2. One would expect to see a Ca2+ response in this line to the flg22 peptide. We performed this experiment and, surprisingly, we could not detect a flg22-induced Ca2+ signal in the hae hsl2 mutant. As it is unlikely that the Ca2+ response triggered by flg22 is dependent on HAE and HSL2, we have to assume that the lack of response is due to a malfunction of the Aeq sensor in this line. As a control to measure the amount of Aeq present in the cells, we treat the Aeq seedlings with 2 M CaCl2 and measure the luminescence constantly for 180 seconds (Ranf et al., 2012, DOI: 10.1093/mp/ssr064). The CaCl2 treatment disrupts the cells and releases the Aeq sensor into the solution, where it will react with Ca2+ and release the total possible response in the sample (Lmax) in the form of a luminescent peak. When treating the hae hsl2-Aeq line with CaCl2, we observe a luminescent peak, indicating the presence of the sensor; however, the response is reduced compared to WT seedlings expressing Aeq. Given the sensitivity of FLS2 to flg22, one would still expect to see a Ca2+ peak in the hae hsl2-Aeq line even if the amount of sensor is reduced. Given that this is not the case, we have to assume that localization or conformation of the sensor is somehow affected in this line or that there is another biological explanation that we cannot explain at the moment.

      We have therefore opted to omit the results using the hae hsl2-Aeq lines from the manuscript and are in the process of mutating HAE and HSL2 by CRISPR-Cas9 in the Aeq background to verify that the mIDA-triggered Ca2+ response is dependent on HAE and HSL2.

      Reviewer #2 (Public Review):

      Lalun and co-authors investigate the signalling outputs triggered by the perception of IDA, a plant peptide regulating organ abscission. The authors observed that IDA perception leads to a transient influx of Ca2+, to the production of reactive oxygen species in the apoplast, and to an increased accumulation of transcripts which are also responsive to an immunogenic epitope of bacterial flagellin, flg22. The authors show that IDA is transcriptionally upregulated in response to several biotic and abiotic stimuli. Finally, based on the similarities in the molecular responses triggered by IDA and elicitors (such as flg22), the authors propose that IDA has a dual function in modulating abscission and immunity. The manuscript is rather descriptive and provides little information regarding IDA signalling per se. A potential functional link between IDA signalling and immune signalling remains speculative.

      We thank the reviewer for the concerns raised and agree that further experiments including pathogen assays would strengthen the link between IDA signaling and immunity and plan for such experiments in future work.

      Reviewer #3 (Public Review):

      Previously, the essential role of the IDA peptide and HAESA receptor families in driving various cell separation processes has been shown, such as abscission of flowers as a natural developmental process, of leaves as a defense mechanism when plants are under pathogenic attack, or at lateral root emergence and root tip cell sloughing. In this work, Olsson et al. show for the first time the possible role of IDA peptide in triggering plant innate immunity after the cell separation process has occurred. Such an event has been previously proposed to take place in order to seal open remaining tissue after cell separation to avoid creating an entry point for opportunistic pathogens.

      The elegant experiments in this work demonstrate that IDA peptide is triggering the defense-associated marker genes together with immune-specific responses including release of ROS and intracellular Ca2+. Thus, the work highlights an intriguing direct link between endogenous cell wall remodeling and plant immunity. Moreover, the upregulation of IDA in response to abiotic and especially biotic stimuli provides a valuable indication for potential involvement of HAE/IDA signalling in other processes than plant development.

      We are pleased that the reviewer finds our findings linking IDA to defense interesting and would like to thank the reviewer for this positive feedback.

      Strengths:

      The various methods and different approaches chosen by the authors consolidate the additional new role for a hormone-peptide such as IDA. The involvement of IDA in triggering the complex process of immunity represents a further step in understanding what happens after cell separation occurs. The Ca2+ and ROS imaging and measurements, together with the use of the hae hsl2 and hae hsl2 p35S::HAE-YFP genotypes, provide a robust quantification of defense response activation. While Ca2+ and ROS can be detected after applying the IDA treatment after the occurrence of cell separation, it is adequately shown that the enzymes responsible for ROS production, RBOHD and RBOHF, are not implicated in floral abscission.

      Furthermore, IDA production is triggered by biotic and abiotic factors such as flg22, a bacterial elicitor, fungi, mannitol or salt, while the mature IDA is activating the production of FRK1, MYB51 and PEP3, genes known for being part of plant defense process.

      Thank you.

      Weaknesses:

      Even though a clear involvement of IDA in activating the immune system after cell separation is shown, the use of the p35S:HAE-YFP line represents a weak point in the scientific demonstration. In this line the HAE receptor is driven by a constitutive promoter, capable of loading the plant with HAE protein without discriminating between tissues. Since it is known that the IDA family consists of more members distributed in various tissues, it is very difficult to fully differentiate the effects of HAE present ubiquitously.

      We agree with this statement. Nevertheless, it is important to note that the responses we have observed are not detectable in WT plants that do not (over)express the HAE receptors, suggesting that the ROS and callose deposition are induced by the addition of mIDA peptide and not the potential presence of the endogenous IDL peptides.

      The co-localization of HAE/HSL2 and FLS2 receptors is a valuable point to address since in the present work, the marker lines presented do not get activated in the same cell types of the root tissues which renders the idea of nanodomains co-localization (as hypothetically written in the discussion) rather unlikely.

      Thank you for raising an important aspect of our study. It is true that not all cells in the root which have promoter activity for FLS2 also exhibit promoter activity for either HAE or HSL2. However, we have observed that certain cells in the roots show promoter activity for both receptors. In the revised version of the manuscript, we have included plants expressing transcriptional reporters for both FLS2 and HAE or HSL2 using different fluorescent proteins. We have investigated overlapping promoter activity at sites of lateral roots, in the tip of the primary root and in the abscission zone. Our results show overlapping expression of the transcriptional reporters in certain cells, indicating that FLS2 and HAE or HSL2 are likely to be found in some of the same cells during plant development. We also observe cells where only one or none of the promoters are active.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Supplementary Figure 3: re-label the y axis; 200 rather than 200,00 for clarity.

      This has been addressed.

      Supplementary Figure 2: It would be good to include the age of the seedlings used to study calcium influx in the legend.

      This has been addressed.

      Supplementary Figure 1: rephrase 'IDA induces ROS production in Arabidopsis'.

      This has been addressed.

      The use of chelating agents to establish the need for calcium from the extracellular space is a clear experiment supporting the calcium response phenotype specific to IDA treatment in seedlings. A peptide lacking the last asparagine (N) could fail to elicit a calcium response simply because the peptide is shorter or has different chemical properties. Therefore, a scrambled sequence would have been a better control.

      We thank the reviewer for the suggestion of using a scrambled peptide as a negative control; however, we find it unlikely that mIDA∆N69 could induce any activity based on previous work. Results from the crystal structure of mIDA bound to the HAE receptor and from ligand-receptor interaction studies (10.7554/eLife.15075) show that the last asparagine in the mIDA peptide is essential for detectable binding to the HAE receptor and that a peptide lacking this amino acid does not have any activity. We will, however, in future experiments also include a scrambled version of the peptide as an additional control.

      Reviewer #2 (Recommendations For The Authors):

      Please find below specific comments:

      (1) Most of the molecular outputs triggered by IDA can be considered as common molecular marks of plant peptide signalling; they do not represent strong evidence of a potential function of IDA in modulating immunity. For instance, perception of CIF peptides, which control the establishment of the Casparian strips, regulates the production of reactive oxygen species and the transcription of genes associated with immune responses (Fujita et al., The EMBO Journal 2020). It should also be considered that FRK1, whose function remains unknown, may be involved in both immunity and abscission and that the upregulation of FRK1 upon IDA treatment is not indicative of active modulation of immune signalling by IDA.

      This is a fair point raised by the reviewer and we now address in the manuscript that ROS and Ca2+ are hallmarks of both plant development and defense. The function of FRK1 is not known; however, it is unlikely that the upregulation of FRK1 in response to mIDA plays a role in the developmental progression of abscission as it is not temporally regulated during the abscission process, thus making it an unlikely candidate in the regulation of cell separation (Cai & Lashbrook, 2008, https://doi.org/10.1104/pp.107.110908). We do, however, agree that further experiments including pathogen assays would strengthen the link between IDA signaling and immunity and plan for such experiments in future work.

      (2) It remains unknown whether IDA modulates immunity. For instance, does IDA perception promote resistance to bacteria (bacterial proliferation, disease symptoms)? Is IDA genetically required for plant disease resistance? Is the IDA signalling pathway genetically required for transcriptional changes induced by flg22, such as the increase in FRK1 transcripts? In addition, the authors propose that the function of IDA in modulating immune signalling prevents bacterial infection in tissue exposed to stress(es). Does loss of function of IDA or of its corresponding receptors lead to changes in the ability of bacteria to colonise plant roots upon stress(es)?

      Please see the comment above regarding pathogen assays.

      (3) Several aspects of the work appear to correspond to preliminary investigation. For instance, the authors analyse loss of function mutant for genes encoding for Ca2+ permeable channels (CNGCs) which are transcriptionally active during the onset of abscission (Sup. Figure 5). None of the single mutants present an abscission defect. These observations provide no information regarding the identity of the channel(s) involved in IDA-induced calcium influx.

      We agree with the reviewer that we have not been able to identify the channels responsible for the IDA-induced calcium influx. Given the redundancy among many of the members of this multigenic family, a future approach to identify proteins responsible for the IDA-triggered calcium response could be to create multiple KO mutants by CRISPR-Cas9.

      (4) Using H2DCF-DA, the authors observed a decrease in ROS accumulation in the abscission zone of rbohd/rbohf double KO line (Sup Figure 5c) but describe in the text that ROS production in this zone does not depend on RBOHD and RBOHF (L220). Please clarify.

      This has now been clarified in the text.

      (5) The authors describe that the rbohd/rbohf double KO presents a lower petal break-strength, which they describe as an indication of premature cell wall loosening, and that petals of rbohd/rbohf abscised one position earlier than in WT. Yet, the authors postulate that IDA-induced ROS production does not regulate abscission but may regulate additional responses. Instead, the data seem to indicate that ROS production by RBOHD and RBOHF regulates the timing of abscission. In addition, it would have been interesting to test whether the IDA signalling pathway regulates ROS production in the abscission zone.

      The rbohd rbohf double mutants show several phenotypes associated with developmental stress; the mild phenotype observed with regard to premature abscission (by one position) could be caused by the phenotype of the double mutant rather than being related to ROS production. Indeed, it has been suggested that the lignified brace in the AZ, dependent on ROS production by the aforementioned RBOHs, is necessary for the correct concentration of cell-modifying enzymes (Lee et al., 2018, https://doi.org/10.1016/j.cell.2018.03.060). The precocious abscission in this double mutant clearly shows this not to be the case. We have tried to do a ROS burst assay on AZ tissue/flowers with the mIDA peptide but have not been successful with this approach. A ROS sensor expressed in AZ tissue would be a valuable tool to address whether IDA signalling regulates ROS production in AZs.

      (6) In Sup. Figure5a, it would be of interest to have a direct comparison of the transcript accumulation of the presented CNGCs and RBOHDs with other of these multigenic families.

      The CNGCs and RBOH gene expression profile shown in the figure are the family members expressed during the developmental progress of floral abscission in stamen AZs. Since there is no difference in the temporal expression of the other family members (and most are either not expressed or very weakly expressed in this tissue) it is not possible to do this comparison (Cai & Lashbrook, 2008, https://doi.org/10.1104/pp.107.110908).

      (7) L251-253, since IDAdeltaN69 cannot be perceived by its receptors, the absence of induction of pIDA::GUS by IDAdeltaN69 compared to flg22 cannot be seen as a sign of specificity in peptide-induced increase in IDA promoter activity.

      We have rephrased this in the text.

      (8) Please provide quantitative and statistical analysis of the calcium measurement presented in sup figure 3.

      This has been addressed.

      (9) L339-341; This sentence is unclear to me, please rephrase.

      We have rephrased this in the text.

      Reviewer #3 (Recommendations For The Authors):

      (1) In order to assess the role of CNGCs in abscission process, it would be more interesting to see the effect on the Ca2+ pattern and ROS signaling after application of mIDA on cngc and rbohf rbohd mutants.

      We agree with this statement, and the studies on mIDA-induced ROS and Ca2+ in these mutants will provide valuable information on the regulation of the response. We are in the process of making the lines needed to perform these experiments. However, since this requires crossing genetically encoded sensors into each mutant and generating higher-order mutants, it is a long process.

      (2) With regard to the ROS production (Sup Fig. 1), the application of mIDA can trigger ROS in p35S::HAE:YFP lines, but not in the wild-type plant, which is according to the text "most likely due to the absence of HAE expression" in leaves. The experiment on callose deposition is performed in wild-type cotyledons where no callose deposition could be observed after mIDA treatment (Fig. 4a,b). The conclusion from text is that IDA "is not involved in promoting deposition of callose as a long-term defence response". It appears more likely that neither ROS nor callose can be observed in wild-type plants due to the lack of HAE expression. Therefore, the callose experiment should include the p35S::HAE:YFP lines. The experiment as it is does not allow to draw any conclusion on HAE/IDA involvement in callose formation.

      We fully agree with this comment; thank you for pointing this out. We have now performed the callose experiment with the 35S:HAE lines. Please see our answer to reviewer #1.

      (3) Between Sup Fig. 3 and Sup Fig. 5 two different systems were used to assess the floral stage. An adjustment of the floral stages would make it easier to convey the levels of HAE/HSL2 expression and hence their potential link with the onset of cell-wall degradation.

      We have now used the same system to assess floral stages throughout the whole manuscript.

      (4) For the Fig. 1 and 2, it will be helpful to mention the genotype used for imaging/quantification of Ca2+.

      This has been addressed.

      (5) Some of the abbreviations are not introduced as full-text at their first time use in the text, such as: mIDA (Line 68), Ef-Tu (line 85), NADPH (line 77).

      The abbreviations have now been introduced.

      (6) In the legend of Fig. 5 (lines 897 and 898)- in the figure description, the box plots are identified as light gray and dark gray, while in the panel a of the figure the box plots are colored in red and blue.

      Thank you for pointing this out, this has now been corrected.

      (7) In figure 1 and 2. the authors write that the number of replicates is 10 (n=10) but data represents a single analysis. Please provide the quantitative ROI analysis, demonstrating that the observed example is representative. This is particularly important since the authors claim very specific changes in pattern of Ca signaling between mIDA and FLG22 treatments (Line 148).

      (8) Figure 4: please use alternative scaling on the Y axis instead of breaks.

      This has now been fixed.

      (9) Figure 5: it is not clear what n=4 refers to when the authors state three independent replicates. In figure 6 they state 4 technical reps and 3 biological reps. Please ensure this is similar across all descriptions.

      We have now ensured the correct information in all descriptions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings on Legionella pneumophila effector proteins that target host vesicle trafficking GTPases during infection and more specifically modulate ubiquitination of the host GTPase Rab10. The evidence supporting the claims of the authors is solid, although it remains unclear how modification of the GTPase Rab10 with ubiquitin supports Legionella virulence and the impact of ubiquitination during LCV formation. The work will be of interest to colleagues studying animal pathogens as well as cell biologists in general.

      We greatly appreciate the positive and valuable feedback from the editors and the reviewers. Following their suggestions, we have added new experimental data and discussed the implications of our findings for Legionella virulence in terms of the biological process of its replication niche. Please find our point-by-point responses below.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, Kubori and colleagues characterized the manipulation of the host cell GTPase Rab10 by several Legionella effector proteins, specifically members of the SidE and SidC family. They show that Rab10 undergoes both conventional ubiquitination and noncanonical phosphoribose-ubiquitination, and that this posttranslational modification contributes to the retention of Rab10 around Legionella vacuoles.

      Strengths

      Legionella is an emerging pathogen of increasing importance, and dissecting its virulence mechanisms allows us to better prevent and treat infections with this organism. How Legionella and related pathogens exploit the function of host cell vesicle transport GTPases of the Rab family is a topic of great interest to the microbial pathogenesis field. This manuscript investigates the molecular processes underlying Rab10 GTPase manipulation by several Legionella effector proteins, most notably members of the SidE and SidC families. The finding that MavC conjugates ubiquitin to SdcB to regulate its function is novel, and sheds further light into the complex network of ubiquitin-related effectors from Lp. The manuscript is well written, and the experiments were performed carefully and examined meticulously.

      Weaknesses

      Unfortunately, in its current form this manuscript offers only little additional insight into the role of effector-mediated ubiquitination during Lp infection beyond what has already been published. The enzymatic activities of the SidC and SidE family members were already known prior to this study, as was the importance of Rab10 for optimal Lp virulence. Likewise, it had previously been shown that SidE and SidC family members ubiquitinate various host Rab GTPases, like Rab33 and Rab1. The main contribution of this study is to show that Rab10 is also a substrate of the SidE and SidC family of effectors. What remains unclear is if Rab10 is indeed the main biological target of SdcB (not just 'a' target), and how exactly Rab10 modification with ubiquitin benefits Lp infection.

      Reviewer #1 (Recommendations for The Authors):

      Major points of concern

      (1) The authors show that SdcB increases Rab10 levels on LCVs at later times of infection and conclude that this is its main biological role. An alternative explanation may be that Rab10 is not 'the main' target of SdcB but merely 'a' target, which may explain why the effect of SdcB on Rab10 accumulation on LCV is only detectable after several hours of infection. An unbiased omics-based approach to identify the actual host target(s) of SdcB may be needed to confirm that Rab10 modification by SdcB is biologically relevant.

      We totally agree with your comment that SdcB should have multiple targets considering the abundance of ubiquitin observed on the LCVs when SdcB was expressed (Figure 3). However, the effect of SdcB on Rab10 accumulation at the later time point (7 h) (current Figure 4e) was well supported by the new data showing that the SdcB-mediated ubiquitin conjugation to Rab10 was highly detected at this time point (new Figure 4c). We have attempted a comprehensive search for interaction partners of the ANK domain of SdcB. This analysis is planned to be included in our ongoing study. We therefore decided not to add the data to this manuscript.

      (2) The authors show that Rab10 within cell lysate is ubiquitinated and conclude that ubiquitination of Rab10 is directly responsible for its retention on the LCV. What is the underlying molecular mechanism for this retention? Are GAP proteins prevented from binding and deactivating Rab10? This may be worth testing.

      It is an attractive hypothesis that a Rab10GAP is involved in the regulation of Rab10 localization on the LCV. However, as far as we know, GAP proteins for Rab10 have not been identified yet. This will be an important issue to address once a Rab10GAP is found.

      (3) Related to this, an alternative explanation would be that Rab10 retention is an indirect effect where inactivators of Rab10, such as host cell GAP proteins, are the main target of SidE/C family members and sent for degradation (see point #1). Can the authors show that Rab10 on the LCV is indeed ubiquitinated?

      The possible involvement of a putative Rab10GAP is currently untestable as it is not known. To address whether Rab10 located on the LCV is ubiquitinated or not, we conducted the critical experiments using active Rab10 (QL) and inactive Rab10 (TN) (new Figure 4a, new Figure 4-figure supplement 1). As revealed for Rab1 (Murata et al., Nature Cell Biol. 2006; Ingmundson et al., Nature 2007), Rab10 is expected to be recruited to the LCV as a GDP-bound inactive form and converted to a GTP-bound active form on the LCV. The new results clearly demonstrated that GTP-locked Rab10QL is preferentially ubiquitinated upon infection, strongly supporting the model that Rab10 is ubiquitinated “on the LCV” by the SidE and SidC family ligases.

      (4) Also, on what residue(s) is Rab10 ubiquitinated? Jeng et al. (Cell Host Microbe, 2019, 26(4): 551-563) suggested that K102, K136, and K154 of Rab10 are modified during Lp infection. How does substituting those residues affect the residency of Rab10 on LCVs? Addressing these questions may ultimately help to uncover if the growth defect of a sidE gene cluster deletion strain is due to its inability to ubiquitinate and retain Rab10 on the LCV.

      Thank you for the suggestion. We mutagenized the three Lys residues of Rab10 and subjected the resulting derivative to the ubiquitination analysis (new Figure 1-figure supplement 1). Substituting the Lys residues with Ala did not abrogate ubiquitination upon Lp infection. This result indicates that ubiquitination sites are present at other residue(s), including the PR-ubiquitination site(s), raising the possibility that disruption of the sidE genes is detrimental to intracellular growth of L. pneumophila because Rab10 is not retained.

      (5) The authors proposed that "the SidE family primarily contributes towards ubiquitination of Rab10". In this case, what is the significance of SdcB-mediated ubiquitination of Rab10 during Lp infection?

      We found that the major contribution of SdcB is retention of Rab10 until the late stage of infection. This claim was supported by our new data (new Figure 4c) as mentioned above (response to comment #1).

      (6) The contribution of SdcB to ubiquitination of Rab10 relative to SidC and SdcA is unclear. SidC is shown to be unaffected by MavC. In this case, SidC can ubiquitinate Rab10 regardless of the regulatory mechanism of SdcB by MavC. This is not further being examined or discussed in the manuscript.

      The effect of intrinsic MavC is apparent at the later stage (9 h) of infection (Figure 7c) when SdcB gains its activity (see above). We therefore do not think that any effect of MavC on the SidC/SdcA activities, which are effective in the early stage, would affect Rab10 localization. However, without specific experiments addressing this issue, possible MavC effects on SidC/SdcA are beyond the scope of this manuscript.

      (7) When is Rab10 required during Lp infection? The authors showed that Rab10 levels at LCV are rather stable from 1hr to 7hr post infection. If MavC regulates the activity of SdcB, when does this occur?

      While the Rab10 levels on the LCV (~40%) are stable during 1-7 h post infection (Figure 2b), they fell to ~20% at 9 h after infection (Figure 7c) (the description was added in lines 304-306). Rab10 seems to be required for optimal LCV biogenesis over the early to late stages, but may not be required at the maturation stage (9 h). We validated the effect of MavC on the Rab10 localization at this time point (Figure 7c). These observations allowed us to build the scheme described in Figure 7d. We revised the illustration in new Figure 7d according to the helpful suggestions from both reviewers.

      (8) Previous analyses by MS showed that ubiquitination of Rab10 in Lp-infected cells decreases over time (from 1 hpi to 8 hpi - Cell Host Microbe, 2019, 26(4): 551-563). How does this align with the findings made here that Rab10 levels on the LCV and likely its ubiquitination levels increase over time?

      We carefully compared the Rab10 ubiquitination at 1 h and 7 h after infection (new Figure 1-figure supplement 1b). This analysis showed that the level of its ubiquitination decreased over time, in agreement with the previous report. Nevertheless, Rab10 was still significantly ubiquitinated at 7 h, which we believe causes the sustained retention of Rab10 on the LCV at this time point. We added the observation in lines 146-148.

      (9) Polyubiquitination of Rab10 was not detected in cells ectopically producing SdcB and SdeA lacking its DUB domain (Figure 7 - figure supplement 2). Does SdcB actually ubiquitinate Rab10 (see also point #5)? Along the same line, it is curious to find that the ubiquitination pattern of Rab10 is not different for LpΔsidC/ΔsdcA compared to LpΔsidC/ΔsdcA/ΔsdcB (Figure 1C). The actual contribution of SdcB to ubiquitinating Rab10 compared to SidC/SdcA thus needs to be clarified.

      Thank you for the important point. We currently hypothesize that SidC/SdcA/SdcB-mediated ubiquitin conjugation can occur only in the presence of PR-ubiquitin on Rab10 (either directly on the PR-ubiquitin or on other residue(s) of Rab10). Failure to detect the polyubiquitination in the transfection condition (Figure 7-figure supplement 2) suggests that this specific ubiquitin conjugation can occur in the restricted condition, i.e. only “on the LCV”. We added this description in the discussion section (lines 334-335). No difference between the ΔsidCΔsdcA and ΔsidCΔsdcAΔsdcB strains (Figure 1C, 1h infection) can be explained by the result that SdcB gains activity at the later stages (see above).

      Minor comments

      In Figure 4b and 7b, the authors show a quantification of "Rab10-positive LCVs/SdcB-positive LCVs". Why this distinction? It raises the question of what the percentage of Rab10-positive/SdcB-negative LCVs might be.

      We took this way of quantification as we just wanted to see the effect of SdcB on the Rab10 localization. To distinguish between SdcB-positive and negative LCVs, we would need to rely on the blue color signals of DAPI to visualize internal bacteria, which we thought to be technically difficult in this specific analysis.

      The band of FLAG-tagged SdcB was not detected by immunoblot using anti-FLAG antibody (Figure 5). The authors hypothesized that "disappearance of the SdcB band can be caused by auto-ubiquitination, as SdcB has an ability to catalyze auto-ubiquitination with a diverse repertoire of E2 enzymes". This can be easily confirmed by using MG-132 to inhibit proteasomal degradation of polyubiquitinated substrates.

      We conducted the experiment using MG-132 as suggested and found that proteasomal degradation is not the cause of the disappearance of the band (new Figure 5-figure supplement 2, added description in lines 228-233). SdcB is actually not degraded. Instead, its polyubiquitination causes its apparent loss by distributing the SdcB bands in the gel.

      In Figure 5F, the authors mentioned that "HA-UbAA did not conjugate to SdcB", whereas "shifted band detected by FLAG probing plausibly represents conjugation of cellular intrinsic Ub". The same argument was made in Figure 6B. These claims should be confirmed by immunoblot using anti-Ub antibody.

      Thank you. We added the data using anti-Ub antibody (P4D1) (Figure 6f, new third panel).

      Figure 7A: In cells producing MavC, SdcB is clearly present on the LCV. However, in Figure 5A, SdcB was not detected by immunoblot in cells ectopically expressing MavC-C74A. What is the interpretation for these results?

      SdcB was not degraded in the cells; rather, its apparent molecular weight shifted owing to polyubiquitination (see above). The detection of SdcB in the IF images (Figure 7a) supports this claim.

      Reviewer #2 (Public Review):

      This manuscript explores the interplay between Legionella Dot/Icm effectors that modulate ubiquitination of the host GTPase Rab10. Rab10 undergoes phosphoribosyl-ubiquitination (PR-Ub) by the SidE family of effectors, which is required for its recruitment to the Legionella-containing vacuole (LCV). Through a series of elegant experiments using effector gene knockouts, co-transfection studies and careful biochemistry, Kubori et al. further demonstrate that:

      (1) The SidC family member SdcB contributes to the polyubiquitination (poly-Ub) of Rab10 and its retention at the LCV membrane.

      (2) The transglutaminase effector, MavC acts as an inhibitor of SdcB by crosslinking ubiquitin at Gln41 to lysine residues in SdcB.

      Some further comments and questions are provided below.

      (1) From the data in Figure 1, it appears that the PR-Ub of Rab10 precedes and in fact is a prerequisite for poly-Ub of Rab10. The authors imply this but make no explicit statement. Isn't this the case?

      Yes, we think that it is the case. We revised the description in the text accordingly (lines 326-327).

      (2) The complex interplay of Legionella effectors and their meta-effectors targeting a single host protein (as shown previously for Rab1) suggests the timing and duration of Rab10 activity on the LCV is tightly regulated. How does the association of Rab10 with the LCV early during infection and then its loss from the LCV at later time points impact LCV biogenesis or stability? This could be clearer in the manuscript and the summary figure does not illustrate this aspect.

      Thank you for pointing out this important issue. Association of Rab10 with the LCV is thought to be beneficial for L. pneumophila, as Rab10 has been identified as a factor that supports bacterial growth in cells (Jeng et al., 2019). We speculate that its loss from the LCV at the later stage of infection would also be beneficial, since the LCV may need to move on to the maturation stage, in which a different membrane-fusion process may proceed. As this is too speculative, we made only a simple modification to the discussion section (lines 356-358). We also modified the summary figure (revised Figure 7d) so that it illustrates the time course.

      (3) How do the activities of the SidE and SidC effectors influence the amount of active Rab10 on the LCV (not just its localisation and ubiquitination)?

      We agree that it is an important point. We tested the active Rab10 (QL) and inactive Rab10 (TN) for their ubiquitination and LCV-localization profiles (new Figure 4ab, new Figure 4-figure supplement 1 and 2). These analyses led us to the unexpected finding that the active form of Rab10 is the preferential target of the effector-mediated manipulation. See also our response to Reviewer 1’s comment #3. Thank you very much for your insightful suggestion.

      (4) What is the fate of PR-Ub and then poly-Ub Rab10? How does poly-Ub of Rab10 result in its persistence at the LCV membrane rather than its degradation by the proteasome?

      We have not revealed the molecular mechanism in this study. We believe that it is an important question to be resolved in the future. We added a sentence to this effect in the discussion section (lines 376-378).

      (5) Mutation of Lys518, the amino acid in SdcB identified by mass spec as modified by MavC, did not abrogate SdcB Ub-crosslinking, which leaves open the question of how MavC inhibits SdcB. Is there any evidence of MavC-mediated modification of the active site of SdcB?

      The active site of SdcB (C57) is required for the modification (Figure 5b), but it is not likely to be the target residue, as the MavC transglutaminase activity restricts the target residues to Lys. It would be expected that multiple Lys residues on SdcB can be modified by MavC to disturb the catalytic activity.

      (6) I found it difficult to understand the role of the ubiquitin glycine residues and the transglutaminase activity of MavC on the inhibition of SdcB function. Is structural modelling using Alphafold for example helpful to explain this?

      We conducted an AlphaFold analysis of SdcB-Ub. Unfortunately, when the glycine residues of Ub were placed in the catalytic pocket of SdcB, Q41 of Ub did not fit the expected position on SdcB (K518). The ternary complex (MavC-Ub-SdcB) probably changes the overall conformation of its components. A crystal structure analysis or more detailed molecular modeling would be required to resolve the issue.

      (7) Are the Lys mutants of SdcB still active in poly-Ub of Rab10?

      We performed the experiment and found that the K518R K891R mutant of SdcB still has E3 ligase activity at a level similar to that of the wild type upon infection (new Figure 6-figure supplement 2) (lines 283-284). The level was actually slightly higher than that of the wild type. This result may suggest that blocking the modification sites can rescue SdcB from MavC-mediated downregulation.

      Reviewer #2 (Recommendations For The Authors):

      see above

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study applies voltage clamp fluorometry to provide new information about the function of serotonin-gated ion channels 5-HT3AR. The authors convincingly investigate structural changes inside and outside the orthosteric site elicited by agonists, partial agonists, and antagonists, helping to annotate existing cryo-EM structures. This work confirms that the activation of 5-HT3 receptors is similar to other members of this well-studied receptor superfamily. The work will be of interest to scientists working on channel biophysics but also drug development targeting ligand-gated ion channels.

      Public Reviews:

      All reviewers agreed that these results are solid and interesting. However, reviewers also raised several concerns about the interpretation of the data and some other aspects related to data analysis and discussion that should be addressed by the authors. Essential revisions should include:

      (1) Please try to explicitly distinguish between a closed pore and a resting or desensitized state of the pore, to help in clarity.

      (2) Add quantification of VCF data (e.g. sensor current kinetics, as suggested by reviewer #2) or better clarify/discuss the VCF quantitative aspects that are taken into account to reach some conclusions (reviewer #3).

      (3) Review and add relevant foundational work relevant to this study that is not adequately cited.

      (4) Revise the text according to all recommendations raised by the reviewers and listed in the individual reviews below.

      We have revised the text to address all four points. See the answers to referees’ recommendations.

      Reviewer #1 (Public Review):

      Summary:

      This study brings new information about the function of serotonin-gated ion channels 5-HT3AR by describing the conformational changes the receptor undergoes during ligand binding. These results can potentially be extrapolated to other members of the Cys-loop ligand-gated ion channels. By combining fluorescence microscopy with electrophysiological recordings, the authors investigate structural changes inside and outside the orthosteric site elicited by agonists, partial agonists, and antagonists. The results are convincing and correlate well with the observations from cryo-EM structures. The work will be of significance and broad interest to scientists working on channel biophysics but also drug development targeting ligand-gated ion channels.

      Strengths:

      The authors present an elegant and well-designed study to investigate the conformational changes in 5-HT3AR, in which they combine electrophysiological and fluorometry recordings. They determined four positions suitable to act as sensors for the conformational changes of the receptor: two inside and two outside the agonist binding site. They make a strong point showing that antagonists produce conformational changes inside the orthosteric site much as agonists do, but that these changes fail to spread to the lower part of the ECD, in agreement with previous studies and cryo-EM structures. They also show how some loss-of-function mutant receptors elicit conformational changes (changes in fluorescence) after partial agonist binding but fail to produce measurable ionic currents, pointing to intermediate states that are stabilized in these conditions. The four fluorescence sensors developed in this study may be good tools for further studies on characterizing drugs targeting the 5-HT3R.

      Weaknesses:

      Although the major conclusions of the manuscript seem well justified, some of the comparison with the structural data may be vague. The claim that monitoring these silent conformational changes can offer insights into the allosteric mechanisms contributing to signal transduction is not unique to this study and has been previously demonstrated by using similar techniques with other ion channels.

      The referee emphasizes that “some of the comparison with the structural data may be vague”. To better illustrate the structural reorganizations seen in the cryo-EM structures and used for VCF data interpretation, we added a new supplementary figure 3. It shows a superimposition of Apo, setron-bound and 5-HT-bound structures, with reorganization of loop C and the Cys-loop consistent with the VCF data.

      Reviewer #2 (Public Review):

      Summary:

      This study focuses on the 5-HT3 serotonin receptor, a pentameric ligand-gated ion channel important in chemical neurotransmission. There are many cryo-EM structures of this receptor with diverse ligands bound; however, assignment of functional states to the structures remains incomplete. The team applies voltage-clamp fluorometry to measure, at once, both changes in ion channel activity and changes in fluorescence. Four cysteine mutants were selected for fluorophore labeling: two near the neurotransmitter site, one in the ECD vestibule, and one at the ECD-TMD junction. Agonists, partial agonists, and antagonists were all found to yield similar changes in fluorescence, a proxy for conformational change, near the neurotransmitter site. The strength of the agonist correlated to a degree with propagation of this fluorescence change beyond the local site of neurotransmitter binding. Antagonists failed to elicit a change in fluorescence at the vestibular and the ECD-TMD junction sites. The VCF results further turned up evidence supporting intermediate (likely pre-active) states.

      Strengths:

      The experiments appear rigorous, the problem the team tackles is timely and important, the writing and the figures are for the most part very clear. We sorely need approaches orthogonal to structural biology to annotate conformational states and observe conformational transitions in real membranes; this approach, and this study, get right to the heart of what is missing.

      Weaknesses:

      The weaknesses in the study itself are overall minor, I only suggest improvements geared toward clarity. What we are still missing is application of an approach like this to annotate the conformation of the part of the receptor buried in the membrane; there is important debate about which structure represents which state, and that is not addressed in the current study.

      Reviewer #3 (Public Review):

      Summary:

      The authors have examined the 5-HT3 receptor using voltage clamp fluorometry, which enables them to detect structural changes at the same time as the state of receptor activation. These are ensemble measurements, but they enable a picture of the action of different agonists and antagonists to be built up.

      Strengths:

      The combination of rigorously tested fluorescence reporters with oocyte electrophysiology is a solid development for this receptor class.

      Weaknesses:

      The interpretation of the data is solid but relevant foundational work is ignored. Although the data represent a new way of examining the 5-HT3 receptor, nothing that is found is original in the context of the superfamily. Quantitative information is discussed but not presented.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Here are some suggestions that may help to improve the manuscript:

      • Page 6, point 2), typo: "L131W is positioned more profound in each ECD, its side chain (...)"

      “profound” has been corrected to “profoundly”.

      • Fig 1C: Why not compare 5-HT responses for the four sensors studied? If the reason is the low currents elicited by 5-HT on I160C/Y207W sensor, could you comment on this effect that is not observed for the other full agonist tested (mCPBG)?

      The point of this figure (Fig 1G) is to show desensitizing currents so as to follow the evolution of the fluorescence signal during desensitization. For the I160C/Y207W sensor, where 5-HT becomes a partial agonist, we therefore judged it more appropriate to use mCPBG, which acts as a more potent agonist, to elicit currents with a clear desensitization component. We have added a sentence in the legend of the figure to explain this choice more clearly.

      • Page 9, paragraph 2: "However, concentration-response curves on V106C/L131W show a small yet visible decorrelation of fluorescence and current (...)" Statistical analysis on EC50c and EC50f will help to see this decorrelation.

      Statistical analysis (unpaired t test) has been added to figure 3 panel A.

      • Page 10, paragraph 1: the authors describe how "different antagonists promote different degrees of local conformational changes". Does it have any relation to the efficacy or potency of these antagonists? Is there any interpretation for this result?

      Since setrons are competitive antagonists, the concept of efficacy of these molecules is unclear. Concerning potency, no correlation between affinity and fluorescence variation is observed. For instance, ondansetron and alosetron bind with similar nanomolar affinity to the 5-HT3R (Thompson & Lummis Curr Pharm Des. 2006;12(28):3615-30) but elicit different fluorescence variations on both S204C and I160C/Y207W sensors.

      • Fig. 1 panel A, graph to far right: axis label is cut ("current (uA)/..."). Colors of graph A - right are not clearly distinguishable e.g. cyan from green.

      The fluorescent green color that describes the mutant has been changed into limon color which is more clearly distinguishable from cyan.

      • Why is R219C/F142W not selected in the study? Are the signals comparable to the chosen R219C/Y140W?

      We have chosen not to select R219C/F142W because the current elicited by this construct was lower than the current elicited by the construct R219C/Y140W. Moreover, the residue F142 belongs to the FPF motif from Cys-loop that is essential for gating (Polovinkin et al, 2018, Nature).

      • Fig. 1 legend typo: "mutated in tryptophan”

      “in” has been changed to “into”.

      • Fig. 2: yellow color (graphs in panel B) is very hard to read.

      Yellow color has been darkened to yellow/brown to allow easy reading.

      • Fig. 4 is too descriptive and undermines the information of the study. It could be improved e.g. by representing specific structures or partial structures involved. As an additional minor comment, some colors in the figure are hard to differentiate, e.g. magenta and purple.

      We have added relevant specific structures involved, namely loop C, the Cys-loop and pre-M1 loop to clarify. The intensity of magenta and purple has been increased to help differentiate the two sensor positions.

      • Fig S1C: it is confusing to see the same color pattern for the single mutants without the W. I would recommend labelling each trace to make it clearer.

      Labelling of the traces corresponding to the single mutants has been added.

      • Fig S2: Indicating the statistical significance in the graph for the mutants with different desensitization properties compared to the WT receptor will help its interpretation.

      The statistical significance of the difference in the desensitization properties has been added to Figure S2.

      Reviewer #2 (Recommendations For The Authors):

      Overall comments for the authors:

      Selection of cysteine mutants and engineered Trp sites is clear and logical. VCF approach with controls for comparing the functionality of WT vs. mutants, and labeled with unlabeled receptor, is well explained and satisfying. The finding that desensitization involves little change in ECD conformation makes sense. It is somewhat surprising, at least superficially, to find that competitive antagonists promote changes in fluorescence in the same 'direction' and amplitude as strong agonists, however, this is indeed consistent with the structural biology, and with findings from other groups testing different labeling sites. Importantly, the team finds that antagonist-binding changes in deltaF do not spread beyond the region near the neurotransmitter site. The finding that most labeling sites in the ECD, in particular those not in/near the neurotransmitter site, fail to report measurable fluorescence changes, is noteworthy. It contrasts with findings in GlyR, as noted by the authors, and supports a mechanism where most of each subunit's ECD behaves as a rigid body.

      Specific questions/comments:

      I am confused about the sensor current kinetics. Results section 2) states that all sensors share the same current desensitization kinetics, while Results section 5) states that the ECD-TMD site and the vestibule site sensors exhibit faster desensitization. SF1C, right-most panel of R219C suggests the mutation and/or labeling here dramatically changes apparent activation and deactivation rates measured by TEVC. Both activation and deactivation upon washout appear faster in this one example. Data for desensitization are not shown here but are shown in aggregate in earlier panels. It is a bit surprising that activation and deactivation would both change but no effect on desensitization. Indeed, it looks like, in Fig. 1G, that desensitization rate is not consistent across all constructs. Can you please confirm/clarify?

      TEVC and VCF recordings in this study show significant variability in both the apparent desensitization and deactivation kinetics. For desensitization, this is illustrated by the TEVC experiments in Figure S2, where the remaining currents after 45 seconds of 5-HT perfusion and the rate constants of desensitization are measured on different oocytes from different batches. Therefore, the differences in desensitization kinetics shown in Fig. 1G are not significant, the aim of the figure being solely to illustrate that no variation of fluorescence is observed during the desensitization phase. A sentence has been added to the legend of Fig. 1G to clarify this point. We also revised the first paragraph of Results section 5, clearly stating that the slight tendency toward faster desensitization of the V106C/L131W and R219C/Y140W sensors is not significant.

      An alternative to the conclusion-like title of Results section 2) is that the ECD (and its labels) does not undergo notable conformational changes between activated and desensitized states.

      This is a good point and we have added a sentence at the end of results section 2 to present this idea.

      I find the discussion paragraph on partial agonist mechanisms, starting with "However," to be particularly important but at times hard to follow. Please try to revise for clarity. I am particularly excited to understand how we can understand/improve assignments of cryo-EM structures using the VCF (or other) approaches. As examples of where I struggled, near the top of p. 11, related to the partial agonist discussion, there is an assumption about the pore being either activated, or resting. Is it not also possible that partial agonists could stabilize a desensitized state of the pore? Strictly speaking, the labeling sites and current measurements do not distinguish between pre-active resting and desensitized channel conformations/states. However, the cryo-EM structures can likely help fill in the missing information there- with all the normal caveats. Please try to explicitly distinguish between a closed pore and a resting or desensitized state of the pore, to help in clarity.

      We have revised the section, and hope it is clearer now. Notably, we state more explicitly the argument for annotating partial agonist-bound closed structures as pre-active, which rests mainly on kinetic considerations from the VCF experiments. We also mention and cite a paper by the Chakrapani group published on the 4th of January 2024 (Felt et al, Nature Communications), where they present the structures of the m5HT3AR bound to partial agonists, with a set of conformations fully consistent with our VCF data.

      This statement likely needs references: "...indirect experiments of substituted cysteine accessibility method (SCAM) and VCF experiments suggested that desensitization involves weak reorganizations of the upper part of the channel that holds the activation gate, arguing for the former hypothesis."

      Reference Polovinkin et al, Nature, 2018, has been added.

      I respectfully suggest toning down this language a little bit: "VCF allowed to characterize at an unprecedented resolution the mechanisms of action of allosteric effectors and allosteric mutations, to identify new intermediate conformations and to propose a structure-based functional annotation of known high-resolution structures." This VCF stands strongly without unclear claims about unprecedented resolution. What impresses me most are the findings distinguishing how agonists/partial agonists/antagonists share a conserved action in one area and not in another, the observations consistent with intermediate states, and the efforts to integrate these simultaneous current and conformation measurements with the intimidating array of EM structures.

      We thank the referee for his positive comments. We have removed “unprecedented resolution” and revised the sentences.

      It is beyond the scope of the current study, but I am curious what the authors think the hurdles will be to tracking conformation of the pore domain- an area where non-cryo-EM based conformational measurements are sorely needed to help annotate the EM structures.

      We fully agree with the referee that structures of the TMD are very divergent between the various conditions depending on the membrane surrogate. We are at the moment working on this region by VCF, incorporating the fluorescent unnatural amino acid ANAP.

      Minor:

      (1) P. 5, m5-HT3R: Please clarify that this refers to the mouse receptor, if that is correct.

      OK, “mouse” has been added.

      (2) Fig. 1D, I suggest moving the 180-degree arrow to the right so it is below but between the two exterior and vestibular views.

      Ok, it has been done.

      (3) Please add a standard 2D chemical structure of MTS-TAMRA, and TAMRA attached to a cysteine, to Fig 1.

      A standard chemical structure has been added for the two isomers of MTS-TAMRA.

      (4) Please label subpanels in Fig. 1G with the identity of the label site.

      The subpanels have been labelled.

      Reviewer #3 (Recommendations For The Authors):

      This is solid work but I mainly have suggestions about placing it in context.

      (1) Abstract "Data show that strong agonists promote a concerted motion of all sensors during activation, "

      The concept of sensors here is the fluorescent labels? I did not find this meaningful until I read the significance statement.

      We have specified “fluorescently-labelled” before sensors in the abstract.

      (2) p4 "each subunit in the 5-HT3A pentamer...." this description would be identical for any pentameric LGIC so the authors should beware of a misleading specificity. This goes for other phrases in this paragraph. However, the summary of the 5HT specific results is very good.

      About the description of the structure, we added “The 5-HT3AR displays a typical pLGIC structure, where….”.

      (3) This paper is very nicely put together and generally explains itself well. The work is rigorous and comprehensive. The meaning of quenching (by local Trp) seems straightforward, but it is not made explicit in the paper. Why doesn't simple labelling (single Cys) at this site work? And can we have a more direct demonstration of the advantage of including the Trp (not in the supplementary figure)? All this information is condensed into the first part of figure 1 (the graph in Figure 1A). Figure 1 could be split and the principle of the introduced quenching could be more clearly shown.

      We have detailed in a few more sentences the principle of the TrIQ approach. In addition, to be more explicit, the significant differences in fluorescence between sensors with and without tryptophan have been indicated in the screening panel of Figure 1, and a sentence has been added to the legend of this figure.

      (4) p10 "VCF measurements are also remarkably coherent with the atomic structures showing an open pore (so called F, State 2 and 5-HT asymmetric states), "

      This statement is intriguing. What do these names or concepts represent? Are they all the same thing? Where do the names come from? What is meant here? Three different concepts, all consistent? Or three names for the same concept?

      We have tried to clarify the statement by making reference to the PDB of the structures.

      (5) "Fluorescence and VCF studies identified similar intermediate conformations for nAChRs, ⍺1-GlyRs and the bacterial homolog GLIC(21,32-35). "

      Whilst this is true, the motivation for such ideas came from earlier work identifying intermediates from electrophysiology alone (such as the flip state (Burzomato et al 2004), the priming state (Mukhtasimova et al 2009) and the conformational wave in ACh channels (Grosman et al 2000)). It would be appropriate to mention some of this earlier work.

      We have incorporated and described these references in the discussion. Of note, we fully quoted these references in our previous papers on the subject (Menny 2017, Lefebvre 2021, Shi 2023), but the referee is right in asking to quote them again.

      (6) "A key finding of the study is the identification of pre-active intermediates that are favored upon binding of partial agonists and/or in the presence of loss-of-function mutations. "

      Even more fundamental, the idea of a two-state equilibrium for neurotransmitter receptors was discarded in 1957 according to the action of partial agonists.

      DEL CASTILLO J, KATZ B (1957) Interaction at end-plate receptors between different choline derivatives. Proc R Soc Lond B Biol Sci

      So to discover this "intermediate" - that is, bound but minimal activity - in the present context seems a bit much. It is a big positive of this paper that the results are congruent with our expectations, but I cannot see value in posing the results as an extension of the 2-state equilibrium (for which there are anyway other objections).

      As for intermediates being favoured by loss of function mutations, this concept is already well established in glycine receptors (Plested et al 2007, Lape et al 2012) and doubtless in other cases too.

      I do get the point that the authors want to establish a basis in 5-HT3 receptors, but these previous works suggest the results are somewhat expected. This should be commented on.

      We also agree. We have replaced “key finding” with “key observation”, cited most of the references proposed, and explicitly concluded that “The present work thus extends this idea to the 5HT3AR, together with providing structural blueprints for cryo-EM structure annotation”.

      (7) "In addition, VCF data allow a quantitative estimate of the complex allosteric action of partial agonists, that do not exclusively stabilize the active state and document the detailed phenotypes of various allosteric mutations."

      Where is this provided? If the authors are not motivated to do this, I have some doubts that others will step in. If it is not worth doing, it's probably not worth mentioning either.

      The language has been toned down to: “In addition, VCF data give insights in the action of partial agonists, that do not exclusively stabilize the active state and document the phenotypes of various allosteric mutations.”

      (8) Figure 1G please mark which construct is which.

      This has been added to Figure 1G.

    1. Author Response

      Provisional response

      We would like to thank the reviewers for taking the time to review our manuscript, for providing useful suggestions for improvement, and for highlighting the significance of our approach.

      Reviewer #1 (Public Review):

      Summary:

      The authors demonstrate that it is possible to carry out eQTL experiments for the model eukaryote S. cerevisiae, in "one pot" preparations, by using single-cell sequencing technologies to simultaneously genotype and measure expression. This is a very appealing approach for investigators studying genetic variation in single-celled and other microbial systems, and will likely inspire similar approaches in non-microbial systems where comparable cell mixtures of genetically heterogeneous individuals could be achieved.

      Strengths:

      While eQTL experiments have been done for nearly two decades (the corresponding author's lab are pioneers in this field), this single-cell approach creates the possibility for new insights about cell biology that would be extremely challenging to infer using bulk sequencing approaches. The major motivating application shown here is to discover cell occupancy QTL, i.e. loci where genetic variation contributes to differences in the relative occupancy of different cell cycle stages. The authors dissect and validate one such cell cycle occupancy QTL, involving the gene GPA1, a G-protein subunit that plays a role in regulating the mating response MAPK pathway. They show that variation at GPA1 is associated with proportional differences in the fraction of cells in the G1 stage of the cell cycle. Furthermore, they show that this bias is associated with differences in mating efficiency.

      We thank the reviewer for recognizing the strengths of our overall approach and our dissection of the functional consequences of the W82R variant of GPA1.

      Weaknesses:

      While the experimental validation of the role of GPA1 variation is well done, the novel cell cycle occupancy QTL aspect of the study is somewhat underexploited. The cell occupancy QTLs that are mentioned all involve loci that the authors have identified in prior studies that involved the same yeast crosses used here. It would be interesting to know what new insights, besides the "usual suspects", the analysis reveals. For example, in Cross B there is another large effect cell occupancy QTL on Chr XI that affects the G1/S stage. What candidate genes and alleles are at this locus?

      We thank the reviewer for this suggestion. We plan to expand the section on cell cycle occupancy QTL in our revision.

      And since cell cycle stages are not biologically independent (a delay in G1, could have a knock-on effect on the frequency of cells with that genotype in G1/S), it would seem important to consider the set of QTLs in concert.

      We thank the reviewer for this suggested clarification. In our revision, we will clarify that the cell cycle occupancy phenotype represents the proportion of cells assigned to a given stage. As the reviewer correctly notes, a change in the proportion of cells in one stage may alter the proportion of cells in other stages, and this could result in cell cycle occupancy QTL for multiple stages. We will make efforts to consider the cell cycle occupancy QTLs in concert in the revised manuscript.
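
      The dependence between stages described above follows directly from occupancy being a proportion. The following is a minimal sketch, not the authors' analysis: the stage labels, allele names, and cell counts are all hypothetical. It shows that the per-stage occupancy differences between two alleles must sum to zero, so a gain in one stage is necessarily offset elsewhere.

```python
from collections import Counter

def occupancy(stages):
    """Return the proportion of cells assigned to each cell cycle stage."""
    counts = Counter(stages)
    total = sum(counts.values())
    return {stage: n / total for stage, n in counts.items()}

# Hypothetical stage assignments for cells carrying each allele at one locus.
cells_allele_a = ["G1"] * 30 + ["G1/S"] * 20 + ["S"] * 20 + ["G2/M"] * 30
cells_allele_b = ["G1"] * 50 + ["G1/S"] * 10 + ["S"] * 15 + ["G2/M"] * 25

occ_a = occupancy(cells_allele_a)
occ_b = occupancy(cells_allele_b)

# Per-stage difference in occupancy between the two alleles.
# Because each set of proportions sums to 1, these differences sum to 0.
differences = {stage: occ_b[stage] - occ_a[stage] for stage in occ_a}
```

      In this toy example the G1 gain of 0.2 for allele B is balanced by losses spread across the other three stages, which is why a single locus could appear as an occupancy QTL for several stages at once.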

      Reviewer #2 (Public Review):

      Boocock and colleagues present an approach whereby eQTL analysis can be carried out by scRNA-Seq alone, in a one-pot-shot experiment, due to genotypes being able to be inferred from SNPs identified in RNA-Seq reads. This approach obviates the need to isolate individual spores, genotype them separately by low-coverage sequencing, and then perform RNA-Seq on each spore separately. This is a substantial advance and opens up the possibility to straightforwardly identify eQTLs over many conditions in a cost-efficient manner. Overall, I found the paper to be well-written and well-motivated, and have no issues with either the methodological/analytical approach (though eQTL analysis is not my expertise), or with the manuscript's conclusions.

      We thank the reviewer for recognizing the significant contributions our work makes to the field.
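
      The idea of inferring genotypes from SNPs seen in RNA-Seq reads can be illustrated with a deliberately simplified sketch. It is not the authors' pipeline: the function names, the majority-vote rule, and the neighbor-based fill are assumptions made for illustration, and a fuller treatment would likely use a probabilistic model that accounts for sequencing error and recombination.

```python
def call_genotypes(read_counts, min_reads=1):
    """Call each marker as parent 'A', parent 'B', or None (no or tied evidence).

    read_counts: list of (reads_supporting_A, reads_supporting_B), one per marker,
    ordered along a chromosome of a haploid segregant.
    """
    calls = []
    for a, b in read_counts:
        if a + b < min_reads or a == b:
            calls.append(None)
        elif a > b:
            calls.append("A")
        else:
            calls.append("B")
    return calls

def fill_from_neighbors(calls):
    """Fill a missing call when the nearest called markers on both sides agree."""
    filled = list(calls)
    for i, call in enumerate(calls):
        if call is not None:
            continue
        left = next((x for x in reversed(calls[:i]) if x is not None), None)
        right = next((x for x in calls[i + 1:] if x is not None), None)
        if left is not None and left == right:
            filled[i] = left
    return filled

# Hypothetical allele-specific read counts for one cell at seven linked markers.
cell = [(3, 0), (0, 0), (2, 0), (0, 0), (0, 4), (1, 1), (0, 2)]
raw = call_genotypes(cell)
filled = fill_from_neighbors(raw)
```

      Markers without reads are filled only when both flanking calls agree. The marker sitting between an 'A' call and a 'B' call stays unresolved, since a crossover may lie on either side of it.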

      393 segregant experiment:

      For the experiment with the 393 previously genotyped segregants, did the authors examine whether averaging the expression by genotype for single cells gave expression profiles similar to the bulk RNA-Seq data generated from those genotypes? Also, is it possible (and maybe not, due to the asynchronous nature of the cell culture) to use the expression data to aid in genotyping for those cells whose genotypes are ambiguous? I presume it might be if one has a sufficient number of cells for each genotype, though, for the subsequent one-pot experiments, this is a moot point.

      We thank the reviewer for this comment. While we could expand the analysis along these lines, this is not relevant for the subsequent one-pot eQTL experiments, as the reviewer notes, and is therefore beyond the scope of the manuscript. We will make the data available so that anyone interested can try these analyses.

      Figure 1B:

      Is UMAP necessary to observe an ellipse/circle - I wouldn't be surprised if a simple PCA would have sufficed, and given the current discussion about whether UMAP is ever appropriate for interpreting scRNA-Seq (or ancestry) data, it seems the PCA would be a preferable approach. I would expect that the periodic elements are contained in 2 of the first 3 principal components. Also, it would be nice if there were a supplementary figure similar to Figure 4 of Macosko et al (PMID 26000488) to indeed show the cell cycle dependent expression.

      We thank the reviewer for this comment. We too have been following the debate on the utility of UMAP for scRNA-seq, and in our revision we will provide an alternative visualization of the cell cycle. We will also generate a supplementary figure similar to Figure 4 of Macosko et al. to visualize cell-cycle-dependent gene expression.

      Aging, growth rate, and bet-hedging:

      The mention of bet-hedging reminded me of Levy et al (PMID 22589700), where they saw that Tsl1 expression changed as cells aged and that this impacted a cell's ability to survive heat stress. This bet-hedging strategy meant that the older, slower-growing cells were more likely to survive, so I wondered a couple of things. Is it possible from single-cell data to identify either an aging or a growth rate signature? A number of papers from David Botstein's group culminated in a paper that showed that they could use a gene expression signature to predict instantaneous growth rate (PMID 19119411) and I wondered a) if this is possible from single-cell data, and b) whether in the slower growing cells, they see markers of aging, whether these two signatures might impact the ability to detect eQTLs, and if they are detected, whether they could in some way be accounted for to improve detection.

      We thank the reviewer for this comment and suggested analyses. We are not sure whether one can see gene expression signatures of aging in yeast scRNA-seq data. We believe that such analyses are beyond the scope of this work, but we will make the data available so that anyone interested can try them.

      AIL vs. F2 segregants:

      I'm curious if the authors have given thought to the trade-offs of developing advanced intercross lines for scRNA-Seq eQTL analysis. My impression is that AIL provides better mapping resolution, but at the expense of having to generate the lines. It might be useful to see some discussion on that.

      We thank the reviewer for their comment. We will include some discussion of the trade-offs of different experimental designs in our revision.

      10x vs. SPLiT-Seq:

      10x is a well established, but fairly expensive approach for scRNA-Seq - I wondered how the cost of the 10x approach compares to the previously used approach of genotyping segregants and performing bulk RNA-Seq, and how those costs would change if one used SPLiT-Seq (see PMID 38282330).

      We will provide some ballpark estimates of the costs, and we will discuss the trade-offs of different scRNA-seq technologies in our revision.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for their insightful comments and recommendations. We have extensively revised the manuscript in response to the valuable feedback. We believe the result is a more rigorous and thoughtful analysis of the data. Furthermore, our interpretation and discussion of the findings are more focused and highlight the importance of the circuit and its role in the response to stress. Thank you for helping to improve the presented science.

      Key changes made in response to the reviewers' comments include:

      • Revision of statistical analyses for nearly all figures, with the addition of a new table of summary statistics to include F and/or t values alongside p-values.

      • Addition of statistical analyses for all fiber photometry data.

      • Examination of data for possible sex-dependent effects.

      • Clarification of breeding strategies and genotype differences, with added details to methods to improve clarity.

      • Addressing concerns about the specificity of virus injections and the spread, with additional details added to methods.

      • Modification of terminology related to goal-directed behavior based on reviewer feedback, including removal of the term from the manuscript.

      • Clarification and additional data on the use of photostimulation and its effects, including efforts to inactivate neurons for further insight, despite technical challenges.

      • Correction of grammatical errors throughout the manuscript.

      Reviewer 1:

      Despite the manuscript being generally well-written and easy to follow, there are several grammatical errors throughout that need to be addressed.

      Thank you for highlighting this issue. Grammatical errors have been fixed in the revised version of the manuscript.

      Only p values are given in the text to support statistical differences. This is not sufficient. F and/or t values should be given as well.

      In response to this critique and similar comments from Reviewer 2, we re-evaluated our approach to statistical analyses and extensively revised analyses for nearly all figures. We also added a new table of summary statistics (Supplemental Table 1) containing the type of analysis, statistic, comparison, multiple comparisons, and p value(s). For Figures 4C-E, 5C, 6C-E, 7H-I, and 8H we analyzed the data using two-way repeated measures (RM) ANOVA, which examined the main effect of time (either number of sessions or stimulation period) within the same animal, the main effect of genotype (Cre+ vs Cre-), and whether there was an interaction. For Supplemental Figure 7A we also conducted a two-way RM ANOVA in Cre+ mice, with time as one factor and activity state (number of port activations in the active vs inactive nose port) as the other. For Figures 5D-E we conducted a two-way mixed model ANOVA that accounted and corrected for missing data. In figures that only compared two groups of data (Figures 5F-L, 6F, 8C-D, 8I, and Supp 6F-G) we used a two-tailed t-test for the analysis. If our question and/or hypothesis required us to conduct multiple comparisons between or within treatments, we conducted Bonferroni’s multiple comparisons test for post hoc analysis (we note which groups we compared in Supplemental Table 1). For figures that did or did not show a change in calcium activity (Figure 3G, 3I-K, 7B, 7D-E, 8E-F), we compared waveform confidence intervals (Jean-Richard-Dit-Bressel, Clifford, McNally, 2020). The time windows we used for comparison are noted in Supplemental Table 1, along with whether the comparisons were significant at the 95%, 99%, and 99.9% thresholds.

      None of the comparisons that were significant in the prior analyses fell below the thresholds for significance. Of those previously found not to be significantly different, only one changed. In Figure 6E there is now a significant baseline difference between Cre+ and Cre- mice, with Cre- mice taking longer to first engage the port than Cre+ mice (p=0.045). Although the more rigorous approach to the statistical analyses did not change our interpretations, we feel it enhanced the paper, and we thank the reviewer for pushing this improvement.
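
      For readers unfamiliar with the Bonferroni adjustment mentioned in this response, the following is a minimal sketch of the correction step only. The p-values are invented for illustration, and the sketch does not reproduce the post hoc test as implemented in any particular statistics package.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw p-values from three pairwise comparisons.
raw_p = [0.004, 0.020, 0.300]
adjusted = bonferroni(raw_p)

# A comparison is called significant only if its adjusted p-value is below 0.05.
significant = [p < 0.05 for p in adjusted]
```

      Here the second comparison has a raw p-value below 0.05 but is no longer significant after adjustment, which is the kind of change a stricter re-analysis can produce.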

      Moreover, the fibre photometry data does not appear to have any statistical analyses reported - only confidence intervals represented in the figures without any mention of whether the null hypothesis that the elevations in activity observed are different from the baseline.

      This is particularly important where there is ambiguity, such as in Figure 3K, where the spontaneous activity of the animal appears to correlate with a spike in activity but the text mentions that there is no such difference. Without statistics, this is difficult to judge.

      Thank you for highlighting this critical point and providing an opportunity to strengthen our manuscript. We added statistical analyses of all fiber photometry data using a recently described approach based on waveform confidence intervals (Jean-Richard-Dit-Bressel, Clifford, McNally, 2020). In the statistical summary (Supplemental Table 1) we note the time window that we used for comparison in each analysis and whether the comparisons were significant at the 95%, 99%, and 99.9% thresholds. Thank you for highlighting this and helping make the manuscript stronger.
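
      The general logic of a waveform confidence interval can be sketched as follows. This is an illustrative percentile bootstrap under stated assumptions, not the authors' code or the exact procedure of the cited paper: the function names, the simulated trials, and the baseline of zero are hypothetical, and additional safeguards such as requiring several consecutive significant samples are omitted.

```python
import random

def bootstrap_ci(values, n_boot=1000, alpha=0.05, rng=None):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = rng or random.Random(0)
    n = len(values)
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n for _ in range(n_boot)
    )
    lo_idx = round((alpha / 2) * n_boot)
    hi_idx = round((1 - alpha / 2) * n_boot) - 1
    return means[lo_idx], means[hi_idx]

def significant_samples(trials, baseline=0.0, **kwargs):
    """Flag time points whose across-trial CI excludes the baseline."""
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    flags = []
    for t in range(len(trials[0])):
        lo, hi = bootstrap_ci([trial[t] for trial in trials], rng=rng, **kwargs)
        flags.append(lo > baseline or hi < baseline)
    return flags

# Hypothetical peri-event signal: 10 trials, 6 time points each.
# The first three samples scatter around zero; the last three are elevated.
trials = [
    [0.5 if i % 2 == 0 else -0.5] * 3 + [2.0 + 0.02 * i] * 3
    for i in range(10)
]
flags = significant_samples(trials, alpha=0.05)
```

      Passing `alpha=0.01` or `alpha=0.001` would correspond to the 99% and 99.9% thresholds, although the stricter levels need a larger `n_boot` to resolve the tails of the bootstrap distribution.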

      With respect to Figure 3K, we are not certain we understood the spike in activity the reviewer referred to. Figure 3J and K include both velocity data (gold) and Ca2+ dependent signal (blue). We used episodes of velocity that were comparable to the avoidance response during the ambush test and found no significant differences in the Ca2+ signal when gating around changes in velocity in the absence of a stressor (Supplemental Table 1). This is in contrast to the significant change in Ca2+ signal following a mock predator ambush (Figure 3J). We interpret these data together to indicate that locomotion does not correlate with an increase in calcium activity in SuMVGLUT2+::POA neurons, but that coping with a stressor does. This conclusion is further examined in Supplemental Figure 5, including cross-correlation analysis to test for a temporally offset relationship between velocity and Ca2+ signal in SuMVGLUT2+::POA neurons.

      The use of photostimulation only is unfortunate, it would have been really nice to see some inactivation of these neurons as well. This is because of the well-documented issues with being able to determine whether photostimulation is occurring in a physiological manner, and therefore makes certain data difficult to interpret. For instance, with regards to the 'active coping' behaviours - is this really the correct characterisation of what's going on? I wonder if the mice simply had developed immobile responding as a coping strategy but when they experience stimulation of these neurons that they find aversive, immobility is not sufficient to deal with the summative effects of the aversion from the swimming task as well as from the neuronal activation? An inactivation study would be more convincing.

      We agree with the point of the reviewer: experiments demonstrating necessity of SuMVGLUT2+::POA neurons would have added to the story here. We carried out multiple experiments aimed at addressing questions about necessity of SuMVGLUT2+::POA neurons in stress coping behaviors, specifically the forced swim assay. Efforts included employing chemogenetic, optogenetic, and tetanus toxin-based methods. We observed no effects on locomotor activity or stress coping. These experiments are both technically difficult and challenging to interpret. Interpretation of negative results, as we obtained, is particularly difficult because of potential technical confounds. Selective targeting of SuMVGLUT2+::POA neurons for inhibition requires three viral injections and two recombination steps, increasing variability and reducing the number of neurons impacted. Alternatively, photoinhibition targeting SuMVGLUT2+::POA cells can be done using Retro-AAV injected into POA and a fiber implant over SuM. We tried both approaches. The data obtained were difficult to interpret because questions arose about adequate coverage of the SuMVGLUT2+::POA population by virally expressed constructs and/or light spread. The challenge of adequate coverage to effectively prevent output from the targeted population is further confounded by challenges inherent in neural inhibition, specifically determining whether the inhibition created at the cellular level is adequate to block output in the context of excitatory inputs, or whether neurons must first be engaged in a particular manner for inhibition to be effective. Baseline neural activity, release probability, and post-synaptic effects could all be relevant, which photo-inhibition will potentially not resolve. So, while the trend is to always show “necessary and sufficient” effects, we’ve tried nearly everything, and we simply cannot conclude much from our mixed results.

      There are also well-established problems with existing photo-inhibition methods which, while people use and tout these methods, are often ignored. We have a lot of expertise in photo-inhibition optogenetics, and indeed have used it with some success and developed new methods, yet in this particular case we are unable to draw conclusions related to inhibition. People have experienced similar challenges in locus coeruleus neurons, which have very low basal activity: inhibition with chemogenetics is very hard, as it is with optogenetic pump-based approaches, because the neurons fire robust rebound APs. We have spent almost 2.5 years trying to get this to work in this circuit because reviewers have been insistent on this result for the paper to be conclusive. Unfortunately, it simply isn’t possible in our view until we know more about the cell types involved. This is all in spite of experience using the approach in many other publications.

      We also employed less selective approaches, such as injecting AAV-DIO-tetanus toxin light chain (Tettox) constructs directly into SuM VGLUT2-Cre mice, but found off-target effects impacting animal wellbeing and impeding behavioral testing due to viral spread to surrounding areas.

      While we are disappointed to be unable to directly address questions about necessity of SuMVGLUT2+::POA neurons in active coping with experimental data, we were unable to obtain results allowing for clear interpretation across numerous other domains the reviewers requested. We also feel strongly that until we have a clear picture of the molecular cell type architecture in the SuM, and Cre-drivers to target subsets of neurons, this question will be difficult to resolve for any group. We are working now on RNAseq and related spatial transcriptomics efforts in the SuM and examining additional behavioral paradigms to resolve these issues, so stay tuned for future publications.

      Accordingly, we avoid making statements relating to necessity in the manuscript, in spite of having several lines of physiological data showing strong, robust correlations with behavior related to the SuMVGLUT2+::POA circuit.

      Nose poke is only nominally instrumental as it cannot be shown to have a unique relationship with the outcome that is independent of the stimuli-outcome relationships (in the same way that a lever press can, for example). Moreover, there is nothing here to show that the behaviours are goal-directed.

      Thank you for highlighting this point. We removed the goal-directed terminology from the manuscript. Since the mice perform highly selective (active vs inactive) port activation robustly across multiple days of training, the behavior likely transitions to habitual behavior. We only tested the valuation of stimulus termination on the final day of training, with a time-limited progressive ratio test. With respect to lever press versus active port activation, we are unclear how using a lever in this context would offer a different interpretation. Lever pressing may be more sensitive to changes in valuation when compared to nose poke port activation (Atalayer and Rowland 2008); however, in this study the focus of the operant behavior is separating innate behaviors from learned action–outcome instrumental behaviors for threat response (LeDoux and Daw 2018). The robust, highly selective activation of the active port illustrated in Figure 6 fits as an action–outcome instrumental behavior wherein mice learn to engage the active but not the inactive port to terminate photostimulation. The first activation of the port occurs through exploration of the arena but, as demonstrated by the number of active port activations and the decline in time to the first active port engagement, mice expressing ChR2eYFP learn to engage the port to terminate the stimulation. To aid in illustrating this point we have added Supplemental Figure 7 showing active and inactive port activations for both Cre+ and Cre- mice. This adds clarity to the high rate of selective port activation driven by stimulation of SuMVGLUT2+::POA neurons compared to controls. Eliminating the term goal-directed and providing additional data narrow and support one of the key points of the operant experiment.

      With regards to Figure 1: This is a nice figure, but I wonder if some quantification of the pathways and their density might be helpful, perhaps by measuring the intensity of fluorescence in image J (as these are processes, not cell bodies that can be counted)? Mind you, they all look pretty dense so perhaps this is not necessary! However, because the authors are looking at projections in so-called 'stress-engaged regions', the amygdala seems conspicuous by its absence. Did the authors look in the amygdala and find no projections? If so it seems that this would be worth noting.

      This is an interesting question, but it has proven very technically challenging. We consulted with several leaders in the field who routinely use complementary viral tracing methods. We were unable to devise a method that provides a satisfactorily meaningful quantitative (as opposed to qualitative) comparison of SuMVGLUT2+::POA and SuMVGLUT2+ projections. Several limitations hinder a meaningful quantitative approach. One limitation was the need for different viral strategies to label the two populations. Labeling SuMVGLUT2+::POA neurons requires using VGLUT2-Flp mice with two injections into the POA and one into SuM. Two recombinase steps were required, reducing efficiency of overlap. This combination of viral injections, particularly the injections of RetroAAVs in the POA, can induce significant quantitative variability due to tropism, efficacy, and variability of retro-viral methods, and viral infection generally. These issues are often totally ignored in similar studies across the “neural circuit” landscape, but that doesn’t make them less relevant here.

      Although people do this in the field, and show quantification, we actually believe that it can be a quite misleading read-out of functionally relevant circuitry, given that neurotransmitter release ultimately is amplified by receptors post-synaptically, and many examples of robust behavioral effects have been observed with low fiber tracing complementary methods (McCall, Siuda et al. 2017). In contrast, the broader SuMVGLUT2+ population was labeled using a single injection into the SuM. This means there is likely more efficient expression of the fluorophore. Additionally, in areas that contain terminals and passing fibers, understanding and interpreting fluorescent signal is challenging. Together, these factors limit a meaningful quantitative comparison and make interpretation difficult. In this context, we focused on a conservative qualitative presentation to demonstrate two central points: that 1) SuMVGLUT2+::POA neurons are a subset of SuMVGLUT2+ neurons that project to specific areas and that exclude dentate gyrus, and that 2) they arborize extensively to multiple areas which have been linked to threat responses. We agree that there is much to be learned about how different populations in SuM connect to targets in different regions of the brain, and that this question should continue to be examined with different techniques. A meaningful quantitative study comparing projections is technically complex and, we feel, beyond our ability for this study.

      Also, for the reasons above we do not believe that quantification provides exceptional clarity with respect to the putative function of the circuit, glutamate released, or other cotransmitters given known amplification at the post-synaptic side of the circuit.

      With regard to the amygdala, other studies on SuM projections have found efferent projections to amygdala (Ottersen, 1980; Vertes, 1992). In our study we were unable to definitively determine projections from SuMVGLUT2+::POA neurons to amygdala, which if present are not particularly dense. For this reason we were conservative and do not comment on this particular structure.

      I would suggest removing the term goal-directed from the manuscript and just focusing on the active vs. passive distinction.

      We removed the use of goal-directed. Thank you for helping us clarify our terminology.

      The effect observed in Figure 7I is interesting, and I'm wondering if a rebound effect is the most likely explanation for this. Did the authors inhibit the VGAT neurons in this region at any other times and observe a similar rebound? If such a rebound was not observed it would suggest that it is something specific about this task that is producing the behaviour. I would like it if the authors could comment on this.

      We agree that results showing the change in coping strategy (passive to active) in forced swim after but not during stimulation of SuMVGAT+ neurons is quite interesting (Figure 7I). This experiment activated SuMVGAT+ neurons during a section of the forced swim assay and mice showed a robust shift to mobility after the stimulation of SuMVGAT+ neurons stopped. We did not carry out inhibition of SuMVGAT+ neurons in this manuscript. As the reviewer suggested, strong inhibition of local SuM neurons, including SUMVGLUT2+::POA neurons, could lead to rebound activity that may shift coping behaviors in confusing ways. We agree this is an interesting idea but do not have data to support the hypothesis further at this time.

      Reviewer 2

      (1) These are very difficult, small brain regions to hit, and it is commendable to take on the circuit under investigation here. However, there is no evidence throughout the manuscript that the authors are reliably hitting the targets and the spread is comparable across experiments, groups, etc., decreasing the significance of the current findings. There are no hit/virus spread maps presented for any data, and the representative images are cropped to avoid showing the brain regions lateral and dorsal to the target regions. In images where you can see the adjacent regions, there appears expression of cell bodies (such as Supp 6B), suggesting a lack of SuM specificity to the injections.

      We agree with the reviewer that the areas studied are small and technically challenging to hit. This was one of driving motivations for using multiple tools in tandem to restrict the area targeted for stimulation. Approaches included using a retrograde AAVs to express ChR2eFYP in SUMVGLUT2+::POA neurons; thereby, restricting expression to VGLUT2+ neurons that project to the POA. Targeting was further limited by placement of the optic fiber over cell bodies on SuM. Thus, only neurons that are VGLUT2+, project to the POA, and were close enough to the fiber were active by photostimulation. Regrettably, we were not able to compile images from mice where the fiber was misplaced leading to loss of behavioral effects. We would have liked to provide that here to address this comment. Unfortunately, generating heat maps for injections is not possible for anatomic studies that use unlabeled recombinase as part of an intersectional approach. Also determining the point of injection of a retroAAV can be difficult to accurately determine its location because neurons remote to injection site and their processes are labeled.

      Experiments described in Supplemental Figure 6B on VGAT neurons in SuM were designed and interpreted to support the point that SUMVGLUT2+::POA neurons are a distinct population that does not overlap with GABAergic neurons. For this point it is important that we targeted SuM, but highly confined targeting is not needed to support the central interpretation of the data. We do see labeling in SuM in VGAT-Cre mice, but photostimulation of SuMVGAT+ neurons does not generate the behavioral changes seen with activation of SUMVGLUT2+::POA neurons. As the reviewer points out, SuM is a small target and viral injection is likely to spread beyond the anatomic boundaries to other VGAT+ neurons in the region, which are not the focus here. The activation would be restricted by the spread of light from the fiber over SuM (estimated to be about a 200 µm sphere in all directions). We did not further examine projections or localization of VGAT+ neurons in this study but focused on the differential behavioral effects of SUMVGLUT2+::POA neurons.

      (2) In addition, the whole brain tracing is very valuable, but there is very little quantification of the tracing. As the tracing is the first several figures and supp figure and the basis for the interpretation of the behavior results, it is important to understand things including how robust the POA projection is compared to the collateral regions, etc. Just a rep image for each of the first two figures is insufficient, especially given the above issue raised. The combination of validation of the restricted expression of viruses, rep images, and quantified tracing would add rigor that made the behavioral effects have more significance.

      For example, in Fig 2, how can one be sure that the nature of the difference between the nonspecific anterograde glutamate neuron tracing and the Sum-POA glutamate neuron tracing is real when there is no quantification or validation of the hits and expression, nor any quantification showing the effects replicate across mice? It could be due to many factors, such as the spread up the tract of the injection in the nonspecific experiment resulting in the labeling of additional regions, etc.

      Relatedly, in Supp 4, why isn’t C normalized to DAPI, which they show, or area? Similar for G what is the mcherry coverage/expression, and why isn’t Fos normalized to that?

      Thank you for highlighting the importance and value of the anatomy. Two points based on the anatomic studies are central to our interpretation of the experimental data. First, SUMVGLUT2+::POA neurons are a distinct population within the SuM. We show this by demonstrating they are not GABAergic and that they do not project to dentate gyrus. Projections from SuM to dentate gyrus have been described in multiple studies (Boulland et al., 2009; Haglund et al., 1987; Hashimotodani et al., 2018; Vertes, 1992) and we demonstrate them here for SuMVGLUT2+ cells. Using an intersectional approach in VGLUT2-Flp mice we show SUMVGLUT2+::POA neurons do not project to dentate gyrus. We show cell bodies of SUMVGLUT2+::POA neurons located in SuM across multiple figures including clear brain images. Thus, SUMVGLUT2+::POA neurons are SuM neurons that are not GABAergic and that send projections to a distinct subset of targets, most notably excluding dentate gyrus. Second, SUMVGLUT2+::POA neurons arborize, sending projections to multiple regions. We show this using a combinatorial genetic and viral approach to restrict expression of eYFP to only neurons that are in SuM (based on viral injection), project to the POA (based on retrograde AAV injection in POA), and are VGLUT2+ (VGLUT2-Flp mice). Thus, any eYFP labeled projection comes from SUMVGLUT2+::POA neurons. We further confirmed projections using retroAAV injection into areas identified using anterograde approaches (Supplemental Figure 2). As discussed above in replies to Reviewer 1, we feel limitations are present that preclude meaningful quantitative analysis. We thus opted for a conservative interpretation as outlined.

      Prior studies have shown efferent projections from SuM to many areas, and projections to dentate gyrus have received substantial attention (Boulland et al., 2009; Haglund, Swanson, and Kohler, 1984; Hashimotodani et al., 2018; Soussi et al., 2010; Vertes, 1992; Pan and McNaughton, 2004). We saw many of the same projections from SuMVGLUT2+ neurons. We found no projections from SUMVGLUT2+::POA neurons to dentate gyrus (Figure 2). Our description of the SuM projection to dentate gyrus is not new, but finding a population of neurons in SuM that does not project to dentate gyrus but does project to other regions in hippocampus is new. This finding cannot be explained by spread of the virus in the tract or non-selective labeling.

      (3) The authors state that they use male and female mice, but they do not describe the n’s for each experiment or address sex as a biological variable in the design here. As there are baseline sex differences in locomotion, stress responses, etc., these could easily factor into behavioral effects observed here.

      Sex-specific effects are possible; however, the studies presented here were not designed or powered to directly examine them. A point about experimental design that helps mitigate against strong sex-dependent effects is that the paradigm we used often examined baseline (pre-stimulation) behavior, how behavior changed during stimulation, and how behavior returned (or not) to baseline after stimulation. Thus, we test changes in individual behaviors. Although we had limited statistical power, we conducted analyses to examine the effects of sex as a variable in the experiments and found no differences between males and females.

      (4) In a similar vein as the above, the authors appear to use mice of different genotypes (however the exact genotypes and breeding strategy are not described) for their circuit manipulation studies without first validating that baseline behavioral expression, habituation, stress responses are not different. Therefore, it is unclear how to interpret the behavioral effects of circuit manipulation. For example in 7H, what would the VGLUT2-Cre mouse with control virus look like over time? Time is a confound for these behaviors, as mice often habituate to the task, and this varies from genotype to genotype. In Fig 8H, it looks like there may be some baseline differences between genotypes- what is normal food consumption like in these mice compared to each other? Do Cre+ mice just locomote and/or eat less? This issue exists across the figures and is related to issues of statistics, potential genotype differences, and other experimental design issues as described, as well as the question about the possibility of a general locomotor difference (vs only stress-induced). In addition, the authors use a control virus for the control groups in VGAT-Cre manipulation studies but do not explain the reasoning for the difference in approach.

      Thank you for highlighting the need for greater clarity about the breeding strategies used and for these related questions. We address the breeding strategy and then move to address the additional concerns raised. We have added details to the methods section to address this point. For VGLUT2-Cre mice we use littermate controls from a Cre/WT x WT/WT cross. The VGLUT2-Cre line (RRID:IMSR_JAX:028863) (Vong L, et al. 2011) used here has been used in many other reports. We are not aware of any reports indicating a phenotype associated with the addition of the IRES-Cre to the Slc17a6 locus and there is no expected impact on expression of VGLUT2. Also, we see in many of the experiments here that the baseline behaviors (Figures 4, 5, and 7) are not different between the Cre+ and Cre- mice. For VGAT-Cre mice we used a different breeding strategy that allowed us to achieve greater control of the composition of litters and more efficient cohorts. A Cre/Cre x WT/WT cross yielded all Cre/WT litters. The AAV injected, ChR2eYFP or eYFP, allowed us to balance the cohort.

      Regarding Figure 7H, which shows time immobile on the second day of a swim test, data from the Cre- mice demonstrate the natural course of progression during the second day of the test. The control mice in the VGAT-Cre cohort (Figure 7I) show a similar trend. The change in behavior during the stimulation period in the Cre+ mice is caused by the activation of SUMVGLUT2+::POA neurons. The behavioral shift largely, but not completely, returns to baseline when the photostimulation stops. We have no reason to believe a VGLUT2-Cre+ mouse injected with control AAV to express eYFP would be different from a WT littermate injected with AAV expressing ChR2eYFP in a Cre-dependent manner.

      Turning to concerns related to 8H, which shows data from fasted mice quantifying time spent interacting with a chow pellet immediately after its presentation, we found no significant difference between the control and Cre+ mice. We are unaware of any evidence indicating that the two groups should have a different baseline, since the Cre insertion is not expected to alter gene expression and we are unaware of reports of a phenotype relating to feeding and the presence of the transgene in this mouse line. Even if there were a small baseline shift, this would not explain the large abrupt shift induced by the photostimulation. As noted above, we saw shifts in behavior abruptly induced by the initiation of photostimulation when compared to baseline in multiple experiments. This shift would not be explained by a hypothetical difference in the baseline behaviors of littermates.

      (5) The statistics used throughout are inappropriate. The authors use serial Mann-Whitney U tests without a description of data distributions within and across groups. Further, they do not use any overall F tests even though most of the data are presented with more than two bars on the same graph. Stats should be employed according to how the data are presented together on a graph. For example, stats for pre-stim, stim, and post-stim behavior X between Cre+ and Cre- groups should employ something like a two-way repeated measures ANOVA, with post-hoc comparisons following up on those effects and interactions. There are many instances in which one group changes over time or there could be overall main effects of genotype. Not only is serially using Mann-Whitney tests within the same panel misleading and statistically inaccurate, but it cherry-picks the comparisons to be made to avoid more complex results. It is difficult to comprehend the effects of the manipulations presented without more careful consideration of the appropriate options for statistical analysis.

      We thank the reviewer for pointing this out and suggesting alternative analyses; we agree with the assessment on this topic. Therefore, we have extensively revised the statistical approach to our data using the suggested approach. Reviewer 1 also made a similar comment, and we would like to point to our reply to Reviewer 1’s second point regarding what we changed and added in the new statistical analyses. Further, we have added a full table detailing the statistical values for each figure to the paper.

      Conceptual:

      (6) What does the signal look like at the terminals in the POA? Any suggestion from the data that the projection to the POA is important?

      This is an interesting question that we will pursue in future investigations into the roles of the POA. We used the projection to the POA from SuM to identify a subpopulation in SuM and we were surprised to find the extensive arborization of these neurons to many areas associated with threat responses. We focused on the cell bodies as “hubs” with many “spokes”. Extensive studies are needed to understand the roles of individual projections and their targets. There is also the hypothetical technical challenge of manipulating one projection without activating retrograde propagation of action potentials to the soma. At the current time we have no specific insights into the roles of the isolated projection to POA. Interpretation of experiments activating only one “spoke” of the hub would be challenging. Simple terminal stimulation experiments are challenged by the need to separate POA projections from activation of passing fibers targeting more anterior structures of the accumbens and septum.

      (7) Is this distinguishing active coping behavior without a locomotor phenotype? For example, Fig. 5I and other figure panels show a distance effect of stimulation (but see issues raised about the genotype of comparison groups). In addition, locomotor behavior is not included for many behaviors, so it is hard to completely buy the interpretation presented.

      We agree with the reviewer and thank them for highlighting this fundamental challenge in studies examining active coping behaviors in rodents, which require movement. Additionally, actively responding to threatening stressors would include increased locomotor activity. Separation of movement alone from active coping can be challenging. Because of these concerns we undertook experiments using diverse behavioral paradigms to examine the elicited behaviors and the recruitment of SuMVGLUT2+::POA neurons to stressors. We conducted experiments to directly examine behaviors evoked by photoactivation of SuMVGLUT2+::POA neurons. In these experiments we observed a diversity of behaviors including increased locomotion and jumping but also treading/digging (Figure 4). These are behaviors elicited in mice by threatening and noxious stimuli. An increase in running or only jumping could signify a specific locomotor effect, but this is not what was observed. Based on these behaviors, we expected to find evidence of increased movement in open field (Figure 5G-I) and light dark choice (Figure 5J-L) assays. For many of the assays, reporting distance traveled is not practical. An important set of experiments that argues against a generic increase in locomotion is the operant behavior experiments, which require the animal to engage in a learned behavior while receiving photostimulation of SuMVGLUT2+::POA neurons (Figure 6). This is particularly true for testing using a progressive ratio, when the time of ongoing photostimulation is longer, yet animals actively and selectively engage the active port (Figure 6G-H). Further, we saw a shift in behavioral strategy induced by photoactivation in the forced swim test (Figure 7H). Thus, activation of SUMVGLUT2+::POA neurons elicited a range of behaviors that included swimming, jumping, treading, and learned responses, not just increased movement. Together these data strongly argue that SuMVGLUT2+::POA neurons do not only promote increased locomotor behavior. We interpret these data together with the data from fiber photometry studies to show SuMVGLUT2+::POA neurons are recruited during acute stressors, contribute to the aversive affective component of stress, and promote active behaviors without constraining the behavioral pattern.

      Regarding genotype, we address this in comments above as well but believe that clarifying the use of litter mates, the extensive use of the VGLUT2-Cre line by multiple groups, and experimental design allowing for comparison to baseline, stimulation evoked, and post stimulation behaviors within and across genotypes mitigate possible concerns relating to the genotype.

      (8) What is the role of GABA neurons in the SuM and how does this relate to their function and interaction with glutamate neurons? In Supp 8, GABA neuron activation also modulates locomotion and in Fig 7 there is an effect on immobility, so this seems pretty important for the overall interpretation and should probably be mentioned in the abstract.

      Thank you for noting these interesting findings. We added text to the abstract to highlight these findings. Possible roles of GABAergic neurons in SuM extend beyond the scope of the current study, particularly since SuM neurons have been shown to release both GABA and glutamate (Li Y, Bao H, Luo Y, et al. 2020; Root DH, Zhang S, Barker DJ et al. 2018). GABAergic neurons regulate dentate gyrus (Ajibola MI, Wu JW, Abdulmajeed WI, Lien CC 2021), REM sleep (Billwiller F, Renouard L, Clement O, Fort P, Luppi PH 2017), and novelty processing (Chen S, He L, Huang AJY, Boehringer R et al. 2020). The population of exclusively GABAergic vs dual neurotransmitter neurons in SuM requires further dissection to be understood. How they may relate to SUMVGLUT2+::POA neurons requires further investigation.

      Questions about figure presentation:

      (9) In Fig 3, why are heat maps shown as a single animal for the first couple and a group average for the others?

      Thank you for highlighting this point for further clarification. We modified the labels in the figure to help make clear which panels are from one animal across multiple trials and which are from multiple animals. In the ambush assay each animal had only one trial, to avoid habituation to the mock predator. Accordingly, we do not have multiple trials for each animal in this test. In contrast, the dunk assay (10 trials/animal) and the shock assay (5 trials/animal) had multiple trials for each animal. When there are multiple trials per animal, we present data from a representative animal together with the aggregate data.

      Why is the temporal resolution for J and K different even though the time scale shown is the same?

      Thank you for noticing this error carried forward from a prior draft of the figure so we could correct it. We replaced the image in 3J with a more correctly scaled heatmap.

      What is the evidence that these signal changes are not due to movement per se?

      Thank you for the question. There are two points of evidence. First, all the 465 nm excitation (Ca2+ dependent) data were collected in interleaved fashion with 415 nm (isosbestic) excitation data. The isosbestic signal is derived from GCaMP emission but is independent of Ca2+ binding (Martianova E, Aronson S, Proulx CD. 2019). This approach, time-division multiplexing, can correct the calcium-dependent signal for changes that are most often due to mechanical artifacts. The second piece of evidence is experimental. Using multiple cohorts of mice, we examined whether the change in Ca2+ signal was correlated with movement. We used the threshold of velocity of movement seen following the ambush. We found no correlation between high velocity movements and Ca2+ signal (Figure 3K), including in cross-correlation analysis (Supplemental Figure 5). Based on these points together, we conclude the change in the Ca2+ signal in SUMVGLUT2+::POA neurons is not due to movement-induced mechanical changes, and we find no correlation to movement unless a stressor is present, i.e. mock predator ambush or forced swim. Further, the stressors evoke very different locomotor responses: fleeing, jumping, or swimming.
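
      As an illustration of how an isosbestic reference can remove movement-related fluctuations, the following is a minimal sketch of one common correction: the 415 nm trace is scaled to the 465 nm trace by a least-squares fit, and dF/F is computed relative to that fitted trace. This is not taken from the manuscript's analysis pipeline; the function names and all numbers in the synthetic example are assumptions made for illustration only.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def isosbestic_dff(sig_465, sig_415):
    """dF/F of the 465 nm trace relative to the scaled 415 nm trace."""
    slope, intercept = fit_line(sig_415, sig_465)
    fitted = [slope * v + intercept for v in sig_415]
    return [(s - f) / f for s, f in zip(sig_465, fitted)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic example (made-up numbers): a shared motion artifact scales both
# channels, while a calcium transient affects only the 465 nm channel.
samples = range(1000)
motion = [1 + 0.2 * math.sin(2 * math.pi * t / 50) for t in samples]
calcium = [0.1 * math.exp(-((t - 500) / 40) ** 2) for t in samples]
sig_415 = [100 * m for m in motion]
sig_465 = [120 * m * (1 + c) for m, c in zip(motion, calcium)]

dff = isosbestic_dff(sig_465, sig_415)
# Compare how well the raw and corrected traces track the true transient.
print(round(pearson(sig_465, calcium), 2), round(pearson(dff, calcium), 2))
```

      In this toy case the raw 465 nm trace is dominated by the shared artifact, whereas the corrected trace follows the simulated transient closely. Real recordings would also need steps such as bleaching correction, which are omitted here.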

      (10) In Fig 4, the authors carefully code various behaviors in mice. While they pick a few and show them as bars, they do not show the distribution of behaviors in Cre- vs Cre+ mice before manipulation (to show they have similar behaviors) or how these behaviors shift categories in each group with stimulation. Which behaviors in each group are shifting to others across the stim and post-stim periods compared to pre-stim?

      This is an important point. We selected behaviors to highlight in Figure 4C-E because these behaviors are exhibited in response to stress (De Boer & Koolhaas, 2003; van Erp et al., 1994). For the highlighted behaviors (jumping, treading/digging, and grooming), we show baseline (pre-photostimulation), stimulation, and post-stimulation periods for Cre+ and Cre- mice with the values for each animal plotted. We show all nine behaviors as a heat map in Figure 4B. The panels show changes that may occur as a function of time and show changes induced by photostimulation.

      The heatmaps demonstrate that photostimulation of SUMVGLUT2+::POA neurons causes a suppression of walking, grooming, and immobile behaviors with an increase in jumping, digging/treading, and rapid locomotion. After stimulation stops, there is an increase in grooming and time immobile. The control mice show a range of behaviors with no shifts noted with the onset or termination of photostimulation.

      Of note, issues of statistics, genotype, and SABV are important here. For example, the hint that treading/digging may have a slightly different pre-stim basal expression, it seems important to first evaluate strain and sex differences before interpreting these data.

      We examined the effects of sex as a biological variable in the experiments reported in the manuscript and found no differences between males and females in any of the experiments where we had enough animals of each sex (minimum of 5 mice) for meaningful comparisons. We did this by comparing means and SEM of males and females within each group (e.g., Cre+ males vs Cre+ females, Cre- males vs Cre- females) and then conducted a t-test to see if there was a difference. For figures that show time as a variable (e.g., Figure 6C-E), we compared males and females with time x sex as main factors (including multiple comparisons if needed). We found no significant main effects or interactions between males and females. Because of this, and to maximize statistical power, we decided to keep males and females together in all the analyses presented in the manuscript. It is also worth noting that the core of the experimental design employed is a change in behavior caused by photostimulation. The mice are also the same strain, with the only difference being the modification to add an IRES and sequence for Cre behind the coding sequence of the Slc17a6 (VGLUT2) gene.

      (11) Why do the authors use 10 Hz stimulation primarily? is this a physiologically relevant stim frequency? They show that they get effects with 1 Hz, which can be quite different in terms of plasticity compared to 10 Hz.

      Thank you for raising this important question. Because tests like open field and forced swim are subject to habituation and cannot be run multiple times per animal, a single test frequency was needed for consistency across multiple experiments. The frequency of 10 Hz was selected because it falls within the range of reported firing rates for SuM neurons (Farrel et al., 2021; Pedersen et al., 2017) and based on the robust but submaximal effects seen in the real-time place preference assays. Identification of the native firing rates during stress responses would be ideal, but gathering these data for the identified population remains a daunting task.

      (12) In Fig 5A-F, it is unclear whether locomotion differences are playing a role. Entrances (which are low for both groups) are shown but distance traveled or velocity are not.

      In B, there is no color in the lower left panel. where are these mice spending their time? How is the entirety of the upper left panel brighter than the lower left? If the heat map is based on time distribution during the session, there should be more color in between blue and red in the lower left when you start to lose the red hot spots in the upper left, for example. That is, the mice have to be somewhere in apparatus. If the heat map is based on distance, it would seem the Cre- mice move less during the stim.

      We appreciate the opportunity to address this question, and the attention to detail the reviewer applied to our paper. In the real-time place preference test (RTPP), stimulation was only provided while the animal was on the stimulation side. Mice quickly leave the stimulation side of the arena, as seen in the supplemental video, particularly at the higher frequencies. Thus, the time stimulation is applied is quite low. The mice often retreat to a corner after entering the stimulation side during trials using higher frequency stimulation. Changing locomotor activity alone could drive changes in the number of entrances, but we did not find this. In regard to the heat map, the color scale is dynamically set for each of the paired examples that are pulled from a single trial. To maximize the visibility between the paired examples, the color scale does not transfer between the trials. As a result, in the example for 10 Hz the mouse spent a larger amount of time in the area corresponding to the lower right corner of the image, and the maximum value of the color scale is assigned to that region. As seen in the supplemental video, mice often retreated to the corner of the non-stimulation side after entering the stimulation side. The control animal did not spend a concentrated amount of time in any one region, thus there is a lack of warmer colors. In contrast, in the baseline condition both Cre+ and Cre- mice spent time in areas distributed on both sides of the arena, as expected. As a result, the maximum value in the heat map is lower and more areas are coded in warmer colors, allowing for easier visual comparison between the pair. Using the scale for the 10 Hz pair across all conditions leads to mostly dark images. We considered ways to optimize visualization across and within pairs and focused on the within-pair comparison for visualization.
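
      The effect of setting the color scale within each pair can be illustrated with a small sketch. It assumes occupancy is stored as time per spatial bin and that both heat maps in a pair are divided by the pair's shared maximum; the grids, values, and function names are hypothetical and are not taken from the figure.

```python
def grid_max(grid):
    """Largest value in a 2D grid stored as a list of rows."""
    return max(max(row) for row in grid)


def scale_grid(grid, vmax):
    """Divide every bin by vmax so values fall between 0 and 1."""
    return [[v / vmax for v in row] for row in grid]


def scale_within_pair(grid_a, grid_b):
    """Scale two occupancy grids by the maximum found across the pair."""
    vmax = max(grid_max(grid_a), grid_max(grid_b))
    return scale_grid(grid_a, vmax), scale_grid(grid_b, vmax)


# Made-up occupancy (seconds per bin) on a 2 x 2 grid.
concentrated = [[2, 3], [5, 90]]   # animal stays mostly in one corner
diffuse = [[25, 25], [25, 25]]     # animal spreads its time evenly

scaled_conc, scaled_diff = scale_within_pair(concentrated, diffuse)
# The concentrated map reaches the top of the scale; the diffuse map
# stays in the lower part of the scale even though total time is equal.
print(grid_max(scaled_conc), grid_max(scaled_diff))
```

      Under these assumptions, an animal that spreads its time evenly appears uniformly cool when paired with one that sits in a single corner, which is the pattern described for the 10 Hz example.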

      (13) By starting with 1 hz, are the experimenters inducing LTD in the circuit? what would happen if you stop stimming after the first epoch? Would the behavioral effect continue? What does the heat map for the 1 hz stim look like?

      Relatedly, it is a lot of consistent stimulation over time and you likely would get glutamate depletion without a break in the stim for that long.

      Thank you for the opportunity to add clarity around this point regarding the trials in RTPP testing. Importantly, the trials were not carried out in order of increasing frequency of stimulation, as plotted. Rather, the order of trials was, to the extent possible with the number of mice, counterbalanced across the five conditions. Thus, possible contribution of effects of one trial on the next were minimized by altering the order of the trials.
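
      One simple way to generate counterbalanced trial orders for five conditions is a cyclic Latin square, in which each condition appears once in every serial position across five orderings. The sketch below is illustrative only: the condition labels and mouse identifiers are placeholders, and it is not a description of the exact assignment procedure used in the study.

```python
def latin_square_orders(conditions):
    """Cyclic Latin square: row r is the condition list rotated by r."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)]
            for row in range(n)]


def assign_orders(mouse_ids, conditions):
    """Give each mouse one row of the square, cycling through the rows."""
    orders = latin_square_orders(conditions)
    return {mouse: orders[i % len(orders)]
            for i, mouse in enumerate(mouse_ids)}


# Placeholder labels for five stimulation conditions and seven mice.
conditions = ["cond_1", "cond_2", "cond_3", "cond_4", "cond_5"]
mice = ["m1", "m2", "m3", "m4", "m5", "m6", "m7"]

for mouse, order in assign_orders(mice, conditions).items():
    print(mouse, order)
```

      When the number of mice is not a multiple of five, as in this example, some orderings are used more often than others, so the balance is only approximate. A cyclic square also does not balance which condition immediately precedes which.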

      We have added a heat map for the 1 Hz condition to figure 5B.

      For experiments on RTPP the average stimulation time at 10Hz was less than 10 seconds per event. As a result, the data are unlikely to be affected by possible depletion of synaptic glutamate. For experiments using sustained stimulation (open field or light dark choice assays) we have no clear data to address if this might be a factor where 10Hz stimulation was applied for the entire trial.

      (14) In Fig 6, the authors show that the Cre- mice just don't do the task, so it is unclear what the utility of the rest of the figure is (such as the PR part). Relatedly, the pause is dependent on the activation, so isn't C just the same as D? In G and H, why is a subset of Cre+ mice shown?

      Why not all mice, including Cre- mice?

      Thank you for the opportunity to improve the clarity of this section. A central aspect of the experiments in Figure 6 is the aversiveness of SUMVGLUT2+::POA neuron photostimulation, as shown in Figure 5B-F. The aversion to photostimulation drives task performance in the negative reinforcer paradigm. The mice perform a task (active port activation) to terminate the negative reinforcer (photostimulation of SuMVGLUT2+::POA neurons). Accordingly, control mice are not expected to perform the task because SuMVGLUT2+::POA neurons are not activated and, thus the mice are not motivated to perform the task.

      A central point we aim to convey in this figure is that while SuMVGLUT2+::POA neurons are being stimulated, mice perform the operant task. They selectively activated the active port (Supplemental Figure 7). As expected, control mice activate the active port at a low level in the process of exploring the arena. This diminishes on subsequent trials as mice habituate to the arena (Figure 6D). The data in Figures 6C and D are related but can be divergent. Each pause in stimulation requires a port activation in an FR1 test, but the number of port activations can exceed the number of pauses, which are 10 seconds long, if the animal continues to activate the port. Comparing data in Figures 6C and D reveals that mice generally activated the port two to three times for each pause earned, with a trend towards greater efficiency on day 4 with more rewards and fewer activations.
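
      The relationship between port activations and pauses earned can be sketched as follows. The sketch assumes that an activation made while a 10-second pause is already running neither earns a new pause nor extends the current one; that rule, the function name, and the timestamps are assumptions for illustration, not details confirmed by the methods.

```python
def count_pauses(activation_times, pause_s=10.0):
    """Count pauses earned on an FR1 schedule.

    Assumes activations that occur during an ongoing pause are recorded
    but do not earn or extend a pause.
    """
    pauses = 0
    pause_ends = float("-inf")
    for t in sorted(activation_times):
        if t >= pause_ends:
            pauses += 1
            pause_ends = t + pause_s
    return pauses


# Made-up activation timestamps in seconds.
activations = [5, 6, 9, 20, 21, 40]
pauses = count_pauses(activations)
print(len(activations), pauses, len(activations) / pauses)
```

      With these made-up timestamps, six activations earn three pauses, so the activation count and the pause count diverge in the way described for Figures 6C and D.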

      The purpose of the progressive ratio test is to examine whether photostimulation of SuMVGLUT2+::POA neurons continues to drive behavior as the effort required to terminate the negative stimulus increases. As seen in Figures 6G and H, the stimulation of SuMVGLUT2+::POA neurons remains highly motivating. In the 20-minute trial we did not find a break point even as the number of port activations required to pause the stimulation exceeded 50. We do not show the Cre- mice in Figures 6G and H because they did not perform the task, as seen in Figure 6F. For technical reasons in early trials, we have fully time-stamped data for rewards and port activations from only a subset of the Cre+ mice. Of note, this subset contains both the highest and lowest performing mice from the entire data set.

      Taken together, we interpret the results of the operant behavioral testing as demonstrating that SuMVGLUT2+::POA neuron activation is aversive, can drive performance of an operant task (as opposed to fixed escape behaviors), and is highly motivating.

      (15) In Fig 7, what does the GCaMP signal look like if aligned to the onset of immobility? It looks like since the hindpaw swimming is short and seems to precede immobility, and the increase in the signal is ramping up at the onset of hindpaw swimming, it may be that the calcium signal is aligned with the onset of immobility.

      What does it look like for swimming onset?

      In I, what is the temporal resolution for the decrease in immobility? Does it start prior to the termination of the stim, or does it require some elapsed time after the termination, etc?

      Thank you for the opportunity to address these points and improve the clarity of our interpretation of the data. Regarding aligning the Ca2+ signal from fiber photometry recordings to swimming onset and offset, it is important to note that the swimming bouts are not the same length. As a result, in the time prior to alignment to the offset of behaviors, animals will have been swimming for different lengths of time. In Figure 7C, we use the behavioral heat map to convey the behavioral average. Below we show the Ca2+ dependent signal aligned at the offset of hindpaw swim for an individual mouse (A) and for the total cohort (B). This alignment shows that the Ca2+ dependent signal declines corresponding to the termination of hindpaw swimming. Because these bouts last less than the total window shown, the data are largely included in Figures 7C and D, which are aligned to onset. Due to the nuance of the difference in the alignment and the partial redundancy, we elected to include the requested alignment to swimming offset in the reply rather than in the primary figure.

      Author response image 1.

      Turning to the question regarding swimming onset, the animals started swimming immediately when placed in the water and maintained swimming and climbing behaviors until shifting behaviors as illustrated in Figure 7A and B. During this time the Ca2+-dependent signal was elevated but there is only one trial per animal. This question can perhaps be better addressed in the dunk assay presented in Figure 3C, F and G and Supplemental Figure 4 H and I. Here swimming started with each dunk and the Ca2+ signal increased.

      Regarding the question about Figure 7I, we scored entire periods (2 min) in aggregate. We noted in videos of the behavior test that there was an abrupt decrease in immobility tightly corresponding to the end of stimulation. In a few animals this shift occurred approximately 15-20 s before the end of stimulation. This may relate to the depletion of neurotransmitter as suggested by the reviewer.

      Reviewer 3

      Major points

      (1) Results in Figure 1 suggested that SuM-Vglu2::POA neurons projected not only to the POA but also to diverse other brain regions. We can think of two models that account for this. One is that a homogeneous population of SuM-Vglu2::POA neurons has collaterals that innervate all the efferent targets shown in Figure 1. The other is that distinct subpopulations of neurons project to subsets of the efferent targets shown in Figure 1 as well as to the POA. It is suggested to address this by combining the approaches taken in the experiments for Figure 1 and Supplemental Figure 2.

      Thank you for raising this interesting point. We have attempted combining retroAAV injections into multiple areas that receive projections from SUMVGLUT2+::POA neurons. However, we have found the results unsatisfactory for separating the two models proposed. Using eYFP and tdTomato expression, we saw some overlapping expression in SuM. We are not able to conclude whether this indicates separate populations or partial labeling of a homogeneous population. A third option seems possible as well: there could be a mix of neurons projecting to different combinations of downstream targets. This seems particularly difficult to address using fluorophores. We are preparing to apply additional methodologies to this question, but it extends beyond the scope of this manuscript.

      (2) Since the authors drew a hypothetical model in which diverse brain regions mediate, at least in part, the effect of SuM-Vglu2::POA activation on behavioral alterations, examination of the concurrent activation of those brain regions upon photoactivation of SuM-Vglu2::POA is suggested. This would help readers understand which neural circuits act upon the induction of active coping behavior under stress.

      Thank you for raising this important point. We agree that activating glutamatergic neurons should lead to activation of postsynaptic neurons in the target regions. Delineating this in vivo is less straightforward. Doing so requires much greater knowledge of the postsynaptic partners of SUMVGLUT2+::POA neurons, and there are a number of issues that would need to be accounted for. Undertaking two-color photostimulation plus fiber photometry is possible but technically nontrivial. Further, it is possible that we would measure Ca2+ signals in neurons that have no relevant input, or that local circuits in a region may shape the signal. We would also lack the temporal resolution to distinguish monosynaptic from polysynaptic connections. Thus, we would struggle to know whether a change in signal was due to the excitatory input from SuM or from a second region. At present, we remain unclear on how to pursue this question experimentally in a manner that is likely to generate clearly interpretable results.

      (3) In Figure 4, "active coping behaviors" must be called "behaviors relevant to the active behaviors" or "active coping-like behaviors", since those behaviors were in the absence of stressors to cope with.

      Thank you for the suggestion on how to clarify our terminology. We have adopted the active coping-like term.

      (4) For the Dunk test, it is suggested to describe the results and methods more in detail, since the readers would be new to it. In particular, the mice could change their behavior between dunks under this test, although they still showed immobility across trials as in Supplemental Figure 4I. Since neural activity during the test was summarized across trials as in Figure 3, it is critical to examine whether the behavior changes according to time.

      Thank you for identifying this opportunity to improve our manuscript. We have expanded and added a detailed description of the dunk test in the methods section.

      As for Supplemental Figure 4I, we apologize for the confusion. The purpose of this figure is to show that mice remained mobile for the entire 30-second dunk trial, and that this did not appreciably change over the 10 trials. We have revised this figure to plot both immobile and mobile time to achieve greater clarity on this point.

      Minor points

      Typos

      In Figure 1, please add a serotype of AAVs to make it compatible with other figures and their legends.

      In the main text and Figure 2K, the authors used MHb/LHb and mHb/lHb in a mixed fashion. Please make them unified.

      In the figure legend of Figure 6, change "SuMVGLUT2+::POA neurons drive" to "SuMVGLUT2+::POA neurons " in the title.

      In line 86, please change "Retro-AAV2-Nuc-flox(mCherry)-eGFP" to "AAV5-Nuc-flox(mCherry)eGFP".

      In line 80, please change "Positive controls" to "As positive controls, ".

      Thank you for taking the time and making the effort to identify and call these out. We have corrected them.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are very grateful to both Reviewers, the Reviewing Editor and the Senior Editor for carefully reviewing our manuscript and for providing useful comments and suggestions that further improved the quality of our work. We appreciate that our work is perceived to substantially advance the understanding of osteoblast migration and that the experiments are found to be rigorous and to provide conclusive evidence. We also look forward to reaching a broad audience in the field. Below we provide a point-by-point response to each suggestion made by the reviewers and explain how we included their recommendations in the revised manuscript.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      The authors aimed to show that Tgif1 expression is regulated by ERK1/2 and PTH in a time-dependent manner, and to establish its role in suppressing Pak3 to facilitate osteoblast adhesion. The authors further tried to show that Tgif1-Pak3 signaling plays a significant role in osteoblast migration to the site of bone repair and bone remodeling.

      Strengths:

      • In a previous study, it was demonstrated that Tgif1 is a target gene of PTH, and the absence of Tgif1 failed to increase bone mass by PTH treatment (Saito et al., Nat Commun., 2019). In this study, the authors found that Tgif1-Pak3 signaling prompts osteoblast migration through osteoblast adhesion to prompt bone regeneration. This novel finding provides a better understanding of how Tgif1 expression in osteoblasts regulates adherence, spreading, and migration during bone healing and bone remodeling.

      • The authors demonstrated that ERK1/2 and PTH regulate Tgif1 expression in a time-dependent manner and its role in suppressing Pak3 through various experimental approaches such as luciferase assay, ChIP assay, and gene silencing. These results contribute to the overall strength of the article.

      We thank the reviewer for acknowledging the novelty of our findings as well as the strength of the manuscript.

      Weaknesses:

      • The authors need to further justify why they focused on Pak3 in the introduction by mentioning its known function for cell adhesion.

      We thank the reviewer for this suggestion. We mention in the introduction that we further investigated Pak3 due to its implication in cell adhesion (page 6, lines 7-8).

      • Some results indicated statistically significant but small changes. The authors need to explain in the discussion part why they believe this is the major mechanism or why there may be some other possible mechanisms.

      We agree with this comment. We are confident that our work identified an important mechanism by which Tgif1 regulates cellular features of osteoblasts. However, it is certainly possible that other mechanisms may exist as well. We discuss this point in the revised manuscript (page 18, lines 16-17).

      • The study does not include enough in vivo data to claim that this mechanism is crucial for bone healing and bone remodeling in vivo.

      Re: We agree with this point and have modified the abstract accordingly by replacing “crucial” with “implicated in” as well as the text by changing “crucial” to “important” (page 2, line 9). Furthermore, we discuss this limitation in the revised manuscript (page 18, lines 9-14).

      Reviewer #2 (Public Review):

      Summary:

      Bolamperti S. et al. 2023 investigate whether the expression of TG-interacting factor (Tgif1) is essential for osteoblastic cellular activity regarding morphology, adherence, migration/recruitment, and repair. Towards this end, germ-line Tgif1 deletion (Tgif1-/-) mice or male mice lacking expression of Tgif1 in mature osteoblastic and osteocytic cells (Dmp1-Cre+; Tgif1fl/fl) and corresponding controls were studied in physiological, bone anabolic, and bone fracture-repair conditions. Both Tgif1-/- and Dmp1-Cre+; Tgif1fl/fl exhibited decreased osteoblasts on cancellous bone surfaces and adherent to collagen I-coated plates. Tgif1-/- mice exhibit impaired healing in the tibial midshaft fracture model, as indicated by decreased bone volume (BV/Cal.V), osteoid (OS/BS), and low osteoblasts (number and surface). Likewise, both Tgif1-/- and Dmp1-Cre+; Tgif1fl/fl show impaired PTH 1-34 (100 µg/kg, 5x/wk for 3 wks) osteoblast activation in vivo, as detected by increases in quiescent bone surfaces. Mechanistic in vitro studies then utilized primary osteoblasts isolated from Tgif1-/- mice and siRNA Tgif1 knockdown OCY454 cells to further investigate and identify the downstream Tgif1 target driving these osteoblastic impairments. In vitro, Tgif1-/- osteoblastic and Tgif1 knockdown OCY454 cells exhibit decreased migration, abnormal morphology, and decreased focal adhesions/cells. Unexpectedly, though, localization assays revealed Tgif1 to primarily concentrate in the nucleus and not to co-localize with focal adhesions (paxillin, talin). Also, the expression of major focal adhesion components (paxillin, talin, FAK, Src, etc.) or the Cdc42 family was not altered by loss of Tgif1 expression. In contrast, PAK3 expression is markedly upregulated by loss of Tgif1. In silico analysis followed by mechanistic molecular assays involving ChIP, siRNA (Tgif1, PAK3), and transfection (rat PAK3 promoter) techniques show that Tgif1 physically binds to a specific site in the PAK3 promoter region. Further, the knockdown of PAK3 rescues the Tgif1-deficient abnormal morphology in OCY454 cells. This is the first study to identify the novel transcriptional repression of PAK3 by Tgif1 as well as the specific Tgif1 binding site within the PAK3 promoter.

      Strengths:

      This work has a plethora of strengths. The co-authors achieved their aim of elucidating the role of Tgif1 expression in osteoblastic cellular functions (morphology, spreading/attachment, migration).

      Further, this work is the first to depict the novel mechanism of Tgif1 transcriptional repression of PAK3 by a thorough usage of mechanistic molecular assays (in silico analysis, ChIP, siRNA, transfection etc.). The conclusions are well supported and justified by these findings, as the appropriate controls, sample sizes (statistical power), statistics, and assays were fully utilized. The claims and conclusions are justified by the data.

      Re: We are grateful to this reviewer for recognizing the novelty, strengths, and rigor of our study and for acknowledging that the data convincingly support the conclusions drawn.

      Weaknesses:

      The discussion section could be expanded with a few sentences regarding limitations to the current study and potential future directions.

      Re: In the revised manuscript, we discuss limitations of the work and describe possible future directions (page 18, lines 9-14).

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The cell spreading and migration assay is quite artificial. Trypsinized osteoblasts and quiescent osteoblasts are totally different. The authors need to cite papers from other groups to justify whether the cell spreading and migration assay is appropriate to achieve the goals of this study.

      Re: The reviewer is right that in vitro assays are often artificial and do not necessarily fully reflect in vivo situations. We have taken this aspect into account and discuss it in the revised manuscript (page 18, lines 9-10). In addition, we have included references from other groups who have used similar assays to study cell spreading and migration (Dejaeger M et al., 2017 and Dang et al., 2018).

      (2) Page 13 Line 15: The statement "Osteoblasts are greatly impaired in the ability to migrate into the repair zone" is an overstatement. The experiments in Figure 5 do not necessarily reflect osteoblast migration activities. The authors need to rephrase the sentence or need to show observation of earlier time points (e.g., 1 week after fracture) in their bone healing experiments. The number of osteoblasts/surface in Tgif1+/+ and Tgif1-/- mice at different time points during bone healing should be a good indicator for the migration of osteoblasts to the repair site.

      Re: We understand the critique that a time course or lineage tracing experiments would provide better evidence for the statement of osteoblast migration into the repair zone. To avoid overinterpretations we have removed the sentence from the revised manuscript.

      (3) Page 14, Line 24: Regarding the sentence "The observation that Tgif1 is crucial for osteoblast adherence, spreading, and migration", the authors need to clearly mention this statement is based on the in vitro experiments. The animal studies are not enough to claim that the mechanism is crucial for adherence, spreading, and migration.

      Re: We thank the reviewer for pointing out this limitation. We have clarified that the finding that Tgif1 is crucial for osteoblast adherence, spreading and migration was made in vitro (page 14, line 22).

      (4) The authors need to demonstrate the suppression of Pak3 expression in PTH-treated mice in vivo, in addition to the in vitro culture system (Fig. 7C and 7D).

      Re: We agree with the reviewer that this experiment would be very insightful. However, this is beyond the scope of the current work. Nevertheless, to take this valid point into consideration, we mention it in the discussion as potential future direction (page 18, lines 11-14).

      (5) The authors need to demonstrate that the pharmacologic suppression of Pak3 in Tgif1-/- mice reduces the % of quiescent surface/BS in vivo.

      Re: This point is also well taken, and we agree that a suppression of Pak3 in Tgif1-deficient mice would be very informative to support our in vitro findings. However, this may also be part of future investigations. This is emphasized in the discussion of the revised manuscript (page 18, lines 11-14).

      Figures (Minor)

      Fig. 1:

      Fig. 1A

      Arrows need to indicate a more precise position.

      Re: The position of the arrows has been optimized.

      Fig. 1DE

      What are blue/red bars (genotypes)?

      Re: The colors indicate the genotypes. A legend has been added to the revised figure.

      Fig. 1K

      Quantification data is needed.

      Re: Thank you for this suggestion. We added a quantification of the data (Fig. 1L, M; page 8, lines 3-4; page 21, lines 5-6)

      Fig. 2A

      Show the representative high-magnification image of round (non-spread) cells.

      Re: Representative high-magnification images (insets) are provided in the revised figure 2A.

      Fig. 5

      Red arrows need to indicate a more precise position.

      Re: The arrows have been repositioned.

      Fig. 6A, C

      Red arrows need to indicate a more precise position.

      Re: The arrows have been repositioned.

      Reviewer #2 (Recommendations For The Authors):

      (1) The microscopy images and analyses are excellent.

      Re: We thank the reviewer for acknowledging the quality of our microscopy studies.

      (2) Since the Tgif1-/- mouse has low osteoclast numbers, is it possible that this is a contributing factor to the delays/impairment in bone healing, given that resorption also has a role in fracture repair? Since the focus of these studies is on osteoblastic cells, this point is a little out of scope. However, would the authors consider exploring this further in the discussion section?

      Re: This point is well taken, and we agree that osteoclasts could certainly play a role in the impaired fracture healing. Following the recommendation, we now discuss this aspect in the revised manuscript (page 16, lines 22-24).

      Revisions

      Would the authors consider slightly re-wording the title? Tgif1 suppresses PAK3 expression; however, Tgif1-deficiency leads to the unregulated elevation of PAK3 expression.

      Re: Thank you for pointing this out. We agree with the reviewer and adapted the title accordingly.

      Suggestions

      (1) Is it possible that apoptosis and/or anoikis is being induced by Tgif1 deficiency in osteoblastic cells?

      Re: We do not have data addressing this question. Although Tgif1-deficient osteoblasts are overall viable and expand well, we cannot fully exclude this possibility.

      (2) For the fracture study, any differences in overall callus size? Would it be possible to perform micro-CT imaging with some of these samples?

      Re: There is no difference in non-mineralized callus size between Tgif1+/+ and Tgif1-/- mice. However, there is less mineralized bone per callus area in Tgif1-/- mice, confirming an impaired osteoblast phenotype. As suggested by the reviewer, we added representative micro-CT images and the respective information to the revised manuscript (Fig 5F; pages 19-20).

      (3) Fracture repair experiment-is PAK3 expression downregulated with fracture injury; and/or, is PAK3 upregulated by loss of Tgif1 expression?

      Re: Unfortunately, we do not have data to answer this very interesting question and it would need to be addressed in future studies. This is mentioned in the revised discussion (page 18, lines 12-14).

      (4) Fig 7F. within PTH treated cells, is the light blue SCR sphericity statistically different than the light green siTgif1 + siPAK3 ? While the statement of the "lack of both, Tgif1 and PAK3 prevented PTH-induced decrease in cell sphericity" is supported by the lack of differences between dark green vs. light green; is it also possible that this is due to the siPAK3 returning sphericity to control (scr) levels? (i.e. hitting a floor limit of detection).

      Re: We thank the reviewer for this thoughtful question. There is no statistically significant difference between light blue and light green. Silencing PAK3 restores the impaired capacity to spread that occurs in the absence of Tgif1 to the level of scr controls (significant difference between dark and light red vs. dark and light green and no difference between either dark or light blue vs. dark or light green). However, unlike in the (scr) controls, in the absence of both Tgif1 and PAK3, the cells do not respond to PTH (statistically significant difference between dark and light blue, no difference between dark and light green). Based on the data, cells can reach sphericity of less than 0.2 and thus it is unlikely that sphericity is “hitting the floor level of detection” in these groups.

    1. Author Response

      We would like to thank the reviewers for their positive comments and valuable suggestions for improvements to the manuscript. We intend to revisit the discussion to clarify our interpretation of how azithromycin resistance mutations impact the transmission potential of P. falciparum and expand on the differences between mouse and human malaria. Additionally, we intend to adjust the title to better align with the revised interpretation of the main findings. These changes will be reflected in the revised manuscript to be submitted as the eLife Version of Record.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The current study aims to quantify associations between the regular use of proton-pump inhibitors (PPI) - defined as using PPI most days of the week during the last 4 weeks at one cross-section in time - and several respiratory outcomes up to several years later. Six respiratory outcomes are included: risk of influenza, pneumonia, COVID-19, and other respiratory tract infections, as well as COVID-19 severity and mortality.

      Strengths:

      Several sensitivity analyses were performed, including i) estimation of the e-value to assess how strong unmeasured confounders should be to explain observed effects, ii) comparison with another drug with a similar indication to potentially reduce (but not eliminate) confounding by indication.

      Thank you for pointing out the strengths of our article. We also sincerely thank the reviewer for raising several concerns and providing significant suggestions to improve our manuscript. We will revise our manuscript according to our provisional responses.

      Weaknesses:

      (1) The main exposure of interest seems to be only measured at one time-point in time (at study enrollment) while patients are considered many years at risk afterwards without knowing their exposure status at the time of experiencing the outcome. As indicated by the authors, PPI are sometimes used for only short amounts of time. It seems biologically implausible that an infection was caused by using PPI for a few weeks many years ago.

      We agree with the reviewer, and this is one of the limitations of the UK Biobank data. We might identify potential long-term PPI users by defining the users that have certain indications, since they tend to regularly take PPI for a long period rather than only short amounts of time. We will evaluate the effect modification for the subgroup of potential long-term PPI users.

      (2) Previous studies have shown that by focusing on prevalent users of drugs, one often induces several biases such as collider stratification bias, selection bias through depletion of susceptibles, etc.

      Due to the limitations of the data from the UK Biobank, including the lack of information on the initiation of medications and close follow-up, we can only use a prevalent-user design to evaluate the associations between PPI use and respiratory outcomes. We will discuss this further in the limitations section.

      (3) It seems Kaplan Meier curves are not adjusted for confounding through e.g. inverse probability weighting. As such the KM curves are currently not informative (or the authors need to make clearer that curves are actually adjusted for measured confounding).

      We will provide Kaplan Meier curves adjusted for confounding by inverse probability weighting according to the reviewer’s suggestion.
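
      For readers unfamiliar with the approach, a minimal sketch of an inverse-probability-weighted Kaplan-Meier estimate follows. It assumes that each participant already has an estimated propensity score (the probability of being a PPI user given measured confounders); the function names and toy values are hypothetical and are not taken from the study. The curve would be computed separately within each exposure group using that group's weights.

```python
def ipw_weight(treated, propensity):
    """Inverse probability of the exposure actually received."""
    return 1.0 / propensity if treated else 1.0 / (1.0 - propensity)


def weighted_km(times, events, weights):
    """Weighted Kaplan-Meier estimate.

    times   : follow-up time per participant
    events  : 1 if the outcome occurred at that time, 0 if censored
    weights : non-negative weight per participant (e.g. from ipw_weight)

    Returns a list of (time, survival) pairs at each distinct event time.
    """
    data = sorted(zip(times, events, weights))
    at_risk = float(sum(weights))  # weighted size of the current risk set
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        weighted_events = 0.0
        leaving = 0.0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            _, event, w = data[i]
            if event:
                weighted_events += w
            leaving += w
            i += 1
        if weighted_events > 0:
            survival *= 1.0 - weighted_events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# With unit weights the estimator reduces to the ordinary Kaplan-Meier curve.
unweighted = weighted_km([1, 2, 3, 4], [1, 0, 1, 1], [1.0, 1.0, 1.0, 1.0])
```

      In practice the weights are often stabilized or truncated to limit the influence of extreme propensity scores; that refinement is omitted here.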

      (4) Throughout the manuscript the authors seem to misuse the term multivariate (using one model with e.g. correlated error terms to assess multiple outcomes at once) when they seem to mean multivariable.

      We will correct the misused terms throughout the manuscript according to the reviewer’s suggestions.

      (5) Given multiple outcomes are assessed there is a clear argument for accounting for multiple testing, which following the logic of the authors used in terms of claiming there is no association when results are not significant may change their conclusions. More high-level, the authors should avoid the pitfall of stating there is evidence of absence if there is only an absence of evidence in a better way (no statistically significant association doesn't mean no relationship exists).

      We will revise our interpretation of the results, especially for those without statistically significant associations based on the reviewer’s advice.
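
      As an illustration of the multiple-testing concern, the sketch below applies the Holm step-down correction to six p-values, one per outcome. The p-values are invented for the example and are not results from the study; Holm is only one of several reasonable corrections.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical p-values for six outcomes (illustrative only).
raw = [0.012, 0.04, 0.03, 0.20, 0.008, 0.5]
adjusted = holm_adjust(raw)

n_raw = sum(p < 0.05 for p in raw)
n_adjusted = sum(p < 0.05 for p in adjusted)
```

      In this toy example, several associations that pass an unadjusted 0.05 threshold no longer do so after correction, which is why the interpretation of borderline results may change.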

      (6) While the authors claim that the quantitative bias analysis does show results are robust to unmeasured confounding, I would disagree with this. The e-values are around 2 and it is clearly not implausible that there are one or more unmeasured risk factors that together or alone would have such an effect size. Furthermore, if one would use the same (significance) criteria as used by the authors for determining whether an association exists, the required effect size for an unmeasured confounder to render effects 'statistically non-significant' would be even smaller.

      We agree with the reviewer that there might still exist one or more unmeasured risk factors that have effect sizes larger than 2. Therefore, we could not state that the results are robust to unmeasured confounding based on the current analysis, and this would be a limitation of our study. We will add the above information to the discussion section.
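
      To make the scale of this concern concrete, the sketch below computes the E-value using the commonly used closed-form expression for a risk ratio, E = RR + sqrt(RR x (RR - 1)). Applying it directly to a hazard ratio is an approximation that is usually reserved for rare outcomes; the input values are illustrative and are not estimates from the study.

```python
import math


def e_value(risk_ratio):
    """E-value for a point estimate on the risk-ratio scale.

    Protective estimates (RR < 1) are inverted first, as is conventional.
    """
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    rr = risk_ratio if risk_ratio >= 1 else 1.0 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1.0))


# An E-value of 2 corresponds to an observed risk ratio of only about 1.33.
example = e_value(4.0 / 3.0)
```

      An unmeasured confounder associated with both exposure and outcome by a risk ratio of about 2 would therefore be enough to fully explain an observed association of about 1.33, and a weaker confounder would suffice to move the confidence limit to the null.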

      (7) Some patients are excluded due to the absence of follow-up, but it is unclear how that is determined. Is there potentially some selection bias underlying this where those who are less healthy stop participating in the UK biobank?

      We will provide the details for the determination of absence of follow-up in the UK Biobank and illustrate whether it potentially induced selection bias.

      (8) Given that the exposure is based on self-report how certain can we be that patients e.g. do know that their branded over-the-counter drugs are PPI (e.g. guardium tablets)? Some discussion around this potential issue is lacking.

      In the data collection of the UK Biobank, the participants can enter the generic or trade name of the treatment on the touchscreen to match the medications they used. We will discuss this important issue in the discussion section.

      (9) Details about the deprivation index are needed in the main text as this is a UK-specific variable that will be unfamiliar to most readers.

      We will provide details about the deprivation index in the manuscript.

      (10) It is unclear how variables were coded/incorporated from the main text. More details are required, e.g. was age included as a continuous variable and if so was non-linearity considered and how?

      Age was included as a continuous variable. We will provide information on whether non-linearity was considered in our manuscript.

      (11) The authors state that Schoenfeld residuals were tested, but don't report the test statistics. Could they please provide these, e.g. it would already be informative if they report that all p-values are above a certain value.

      We will provide the test statistics for the Schoenfeld residuals.

      (12) The authors would ideally extend their discussion around unmeasured confounding, e.g. using the DAGs provided in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7832226/, in particular (but not limited to) around severity and not just presence/absence of comorbidities.

      We will use the DAGs provided by the article (PMC7832226) to extend our discussion around unmeasured confounding, especially the severity of comorbidities.

      (13) The UK biobank is known to be highly selected for a range of genetic, behavioural, cardiovascular, demographic, and anthropometric traits. The potential problems this might create in terms of collider stratification bias - as highlighted here for example: https://www.nature.com/articles/s41467-020-19478-2 - should be discussed in greater detail and also appreciated more when providing conclusions.

      We agree with the reviewer that the highly selective nature of the UK Biobank might create collider stratification bias for the evaluation of COVID-19-related outcomes. We will further discuss this in detail and be cautious when generating conclusions.  

      Reviewer #2 (Public Review):

      Summary:

      Zeng et al investigate in an observational population-based cohort study whether the use of proton pump inhibitors (PPIs) is associated with an increased risk of several respiratory infections among which are influenza, pneumonia, and COVID-19. They conclude that compared to non-users, people regularly taking PPIs have increased susceptibility to influenza, pneumonia, as well as COVID-19 severity and mortality. By performing several different statistical analyses, they try to reduce bias as much as possible, to end up with robust estimates of the association.

      Strengths:

      The study comprehensively adjusts for a variety of critical covariates and by using different statistical analyses, including propensity-score-matched analyses and quantitative bias analysis, the estimates of the associations can be considered robust.

      We thank the reviewer for highlighting the strengths of our article. We will further revise our manuscript according to the reviewer’s suggestions.

      Weaknesses:

      As it is an observational cohort study there still might be bias. Information on the dose or duration of acid suppressant use was not available, but might be of influence on the results. The outcome of interest was obtained from primary care data, suggesting that only infections as diagnosed by a physician are taken into account. Due to the self-limiting nature of the outcome, differences in health-seeking behavior might affect the results.

      We will try to adjust or provide discussions about the above factors, including the dose/duration of PPI use, outcome assessment, and health-seeking behavior.

    1. Author Response

      The following is the authors’ response to the original reviews.

      General remarks for the Editor and the Reviewers

      We would like to thank the Editor and the Reviewers for their feedback. Below we address their comments and present our point-by-point responses as well as the related changes in the manuscript.

      In addition to these changes, in a few cases we have found it necessary to move some texts and provide some additional explanations within the manuscript. We emphasize that these amendments have been made for only technical reasons, and do not alter the results and conclusions of the paper, but may help to render the text more coherent and understandable to readers with little knowledge of the subject.

      These minor corrections are:

      • We extended the Introduction section by a sentence (lines 40-42) that is intended to fit the proposed template-directed, non-enzymatic replication mechanism into a more general prebiotic evolutionary context, thus emphasizing its biological relevance. This sentence includes an additional reference (Rosenberger et al., 2021).

      • Two very methodologically oriented and repeated descriptions of random sequence generation have been moved to the Methods section (lines 178-185) from the Results section (lines 336-339 and lines 351-354).

      • We complemented the Data availability statement with licensing information (lines 684-685).

      • Further minor changes (also indicated by red texts) have been implemented to remedy logical and grammatical glitches.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Szathmary and colleagues explore the parabolic growth regime of replicator evolution. Parabolic growth occurs when nucleic acid strand separation is the rate-limiting step of the replication process, which would have been the case for non-enzymatic replication of short oligonucleotides that could precede the emergence of ribozyme polymerases and helicases. The key result is that parabolic replication is conducive to the maintenance of genetic diversity, that is, the coexistence of numerous master sequences (the Gause principle does not apply). Another important finding is that there is no error threshold for parabolic replication except for the extreme case of zero fidelity.
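
      The contrast between exponential and parabolic growth can be sketched with a deliberately simplified deterministic model. This is an illustration under stated assumptions, not the model analyzed in the manuscript: it ignores finite population size, mutation, and sequence composition, and assumes each replicator type i grows as k_i * x_i**p under a dilution flux that keeps the total concentration constant, with p = 1 for exponential and p = 0.5 for parabolic growth. The rate constants are arbitrary.

```python
def simulate(rates, p, dt=0.01, steps=20000):
    """Euler integration of dx_i/dt = k_i * x_i**p - x_i * phi.

    phi is chosen at every step so that the total concentration stays constant.
    All types start at equal concentration; the final concentrations are returned.
    """
    n = len(rates)
    x = [1.0 / n] * n
    for _ in range(steps):
        growth = [k * xi ** p for k, xi in zip(rates, x)]
        phi = sum(growth) / sum(x)  # dilution flux balancing total growth
        x = [xi + dt * (g - xi * phi) for xi, g in zip(x, growth)]
    return x


rates = [1.0, 1.5, 2.0]  # hypothetical replication rate constants

exponential = simulate(rates, p=1.0)  # survival of the fittest
parabolic = simulate(rates, p=0.5)    # stable coexistence

# For p = 0.5 the equilibrium of this model is x_i proportional to k_i**2.
total_sq = sum(k * k for k in rates)
predicted = [k * k / total_sq for k in rates]
```

      Under exponential growth the fastest replicator takes over essentially the whole population, whereas under parabolic growth all three types persist at frequencies set by their rate constants, which is the coexistence property referred to above.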

      Strengths:

      I find both the analytic and the numerical results to be quite convincing and well-described. The results of this work are potentially important because they reveal aspects of a realistic evolutionary scenario for the origin of replicators.

      Weaknesses:

      There are no obvious technical weaknesses. It can be argued that the results represent an incremental advance because many aspects of parabolic replication have been explored previously (the relevant publications are properly cited). Obviously, the work is purely theoretical; an experimental study of parabolic replication is due. In the opinion of this reviewer, though, these are understandable limitations that do not actually detract from the value of this work.

      We are grateful that this Reviewer appreciates our work. We completely agree that the ultimate validation must come from experiments. It is important to stress that in this field theory often preceded experimental work by decades, and the former often guided the latter. We hope that for the topic of the present paper experiments will follow considerably faster.

      Reviewer #2 (Public Review):

      Summary:

      A dominant hypothesis concerning the origin of life is that, before the appearance of the first enzymes, RNA replicated non-enzymatically by templating. However, this replication was probably not very efficient, due to the propensity of single strands to bind to each other, thus inhibiting template replication. This phenomenon, known as product inhibition, has been shown to lead to parabolic growth instead of exponential growth. Previous works have shown that this situation limits competition between alternative replicators and therefore promotes RNA population diversity. The present work examines this scenario in a model of RNA replication, taking into account finite population size, mutations, and differences in GC content. The main results are (1) confirmation that parabolic growth promotes diversity, but that when the population size is small enough, sequences least efficient at replicating may nevertheless go extinct; (2) the observation that fitness is not only controlled by the replicability of sequences, but also by their GC content; (3) the observation that parabolic growth attenuates the impact of mutations and, in particular, that the error threshold to which exponentially growing sequences are subject can be exceeded, enabling sequence identity to be maintained at higher mutation rates.

      Strengths:

      The analyses are sound and the observations are intriguing. Although it has been noted previously that parabolic growth promotes coexistence, its role in mitigating the error threshold catastrophe - which is often presented as a major obstacle to our understanding of the origin of life - had not been examined before.

      Weaknesses:

      Although all the conclusions are interesting, most are not very surprising for people familiar with the literature. As the authors point out, parabolic growth is well known to promote diversity (Szathmary-Gladkih 1989) and it has also been noted previously that a form of Darwinian selection can be found at small population sizes (Davis 2000).

      Given that under parabolic growth, no sequence is ever excluded for infinite populations, it is also not surprising to find that mutations have a less dramatic exclusionary impact.

      In the two articles cited (Szathmary-Gladkih 1989 and Davis 2000) the subexponentiality of the system was implemented in a mechanistic way, by introducing the exponent 0 < 𝑝 < 1. Although the behaviour of these models is more or less consistent with experimental findings (von Kiedrowski, 1986; Zielinski and Orgel, 1987), the divergence of per capita growth rates (𝑥̇/𝑥) at very low concentrations–which guarantees the ability to maintain unlimited diversity in the case of infinite population sizes–makes this formal approach partly unrealistic.
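
      The divergence mentioned above can be made explicit with the simplest single-species form of subexponential growth. The rate constant 𝑘 and the omission of any competition or outflow term are simplifying assumptions made here for illustration; this is a sketch, not the full model of either cited paper:

```latex
\[
\dot{x} = k\,x^{p}, \quad 0 < p < 1
\qquad\Longrightarrow\qquad
\frac{\dot{x}}{x} = k\,x^{p-1} \;\longrightarrow\; \infty
\quad \text{as } x \to 0^{+}.
\]
```

      For 𝑝 = 1/2 (parabolic growth) the per capita rate scales as 𝑥^(-1/2), so at equal 𝑘 a rare type grows faster per capita than a common one, which is why no type is excluded in the infinite-population limit.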

      To avoid the possible artefacts of this mechanistic approach, and as there are no previous studies analysing the diversity maintaining ability of finite populations of parabolic replicators in an individual-based model context, we implemented a simplified template replication mechanism leading to parabolic growth and analysed the dynamics in an individual-based stochastic model context. The key point of our investigation is that considerable diversity can be maintained in the system even when the population size is quite small.

      Regarding the Reviewer’s comment on selection: Darwinian selection can only occur in a simple subexponential dynamics if the ratio of replicabilities diverges, cf. Eq. (8) and the preceding paragraph in Davis, 2000.

      Our results also show (Figs. 4B and 4C) that high mutation rates and the error threshold problem can still be considered as a major limiting factor for parabolically replicating systems in terms of their diversity-maintaining ability. In the light of the above, potential mechanisms to relax the error threshold in such systems, one of which is demonstrated in the present study, seem to be important steps to account for the sequence diversification and increase in molecular complexity during the early evolution of RNA replicators.

      A general weakness is the presentation of models and parameters, whose choices often appear arbitrary. Modeling choices that would deserve to be further discussed include the association of the monomers with the strands and the ensuing polymerization, which are combined into a single association/polymerization reaction (see also below), or the choice to restrict to oligomers of length L = 10. Other models, similar to the one employed here, have been proposed that do not make these assumptions, e.g. Rosenberger et al. Self-Assembly of Informational Polymers by Templated Ligation, PRX 2021. To understand how such assumptions affect the results, it would be helpful to present the model from the perspective of existing models.

      The assumption of one-step polymerization reactions that we used here is a common technique for modelling template replication of sequence-represented replicators [see, e.g., Fontana and Schuster, 1998 (10.1126/science.280.5368.1451), Könnyű et al., 2008 (10.1186/1471-2148-8-267), Vig-Milkovics et al., 2019 (10.1016/j.jtbi.2018.11.020) or Szilágyi et al., 2020 (10.1371/journal.pgen.1009155)]. This is because assuming base-to-base polymerisation of the copy would lead to a very large number of different types of intermediates, which a Gillespie-type stochastic simulation algorithm could not handle in reasonable computation times, even if the sequences were relatively short. For comparison, in our model, where polymerization is one-step, the characteristic time of a simulation for 𝐿 = 10, 𝑁 = 10^5 and 𝛿 = 0.01 was 552 hours.

      Note that in Rosenberger et al. (PRX 2021), in contrast to a pioneering work [Fernando et al., 2007 (10.1007/s00239-006-0218-4)], sequences of replicators are not represented, which makes this approach completely inapplicable to our case, in which sequence defines the fitness. In sum, we suggest that this valid criticism points to possible future work.

      The values of the (many) parameters, often very specific, also very often lack justifications. For example, why is the "predefined error factor" ε = 0.2 and not lower or higher? How would that affect the results?

      A general remark. For the more important parameters, several values were used to test the behaviour of the model (see Table 1), but due to the considerable number of parameters, it is impossible to examine all possible combinations. 𝑐+ = 1 fixes the timescale, and 𝐿 is set to 10 to obtain reasonable running times (see above).

      𝜀 characterizes how replicability decreases as the number of mutations increases. In the manuscript we used the following default vector: 𝜀 = (0.05, 0.2, 1), in which the third element corresponds to the mutation-free sequence, so it must be 1. The first element determines the baseline replicability (see Methods), which we preferred not to change because it would fundamentally alter the ratio of replication propensities to association and dissociation propensities (as a substantial share of the complementary sequences of the master sequences are of baseline replicability) and thus would alter the reaction kinetics to an extent that it is not comparable with the original results. Therefore, only the second element can be adjusted. Accordingly, we have analysed the behaviour of the model in the cases of a steeper and a more gradual loss of replicability using the following two vectors, respectively: 𝜀′ = (0.05, 0.05, 1) and 𝜀″ = (0.05, 0.5, 1). The choice of 𝜀′ is chemically more plausible, since for very short oligomers the loss of chemical activity and replicability as a function of the number of mutations can be very sharp. We performed a series of simulations with all possible combinations of 𝛿 = 0.001, 0.005, 0.1 and 𝑁 = 10^3, 10^4, 10^5 for 𝜀′ and 𝜀″ in the constant population and chemostat model context (36 different runs). For other parameters, we took the default values, see Table 1. These values also correspond to the parameters we used in Figures 2 and 6. The results show that the steeper loss of replicability (𝜀′) slightly increases the diversity-maintaining ability of the system, whereas the more gradual loss of replicability (𝜀″) moderately decreases the diversity-maintaining ability of the system, and that these shifts are more pronounced in the constant population size model (Author response image 1) than in the chemostat model (Author response image 2).

      Altogether, these results confirm that the qualitative outcome of the model is robust in a wide range of loss of replicability (𝜀 vector) values.

      Author response image 1.

      Replicator coexistence in the constant population model with different loss of replicability (𝜀 vector) values. Within a given combination of 𝛿 and 𝑁 parameter values, the upper panel corresponds to the steeper loss of replicability (𝜀′), the middle panel to the default 𝜀 vector (Figure 2A), and the bottom panel to the more gradual loss of replicability vector (𝜀″). Within each 𝛿; 𝑁 parameter combination, the same master sequence set was used with the three different 𝜀 vectors for comparability.

      Author response image 2.

      Replicator coexistence in the chemostat model with different loss of replicability (𝜀 vector) values. Within a given combination of 𝛿 and 𝑁 parameter values, the upper panel corresponds to the steeper loss of replicability (𝜀′), the middle panel to the default 𝜀 vector (Figure 6A), and the bottom panel to the more gradual loss of replicability vector (𝜀″). Within each 𝛿; 𝑁 parameter combination, the same master sequence set was used with the three different 𝜀 vectors for comparability.

      Similarly, in equation (11), where does the factor 0.8 come from?

      This factor scales the decay rate of duplex sequences as a function of the binding energy (𝐸b). The value of 0.8 is an arbitrary choice; the value should be in the interval (0,1) and is only relevant in the chemostat model. It is expected to have a similar effect on the dynamics as the duplex decay factor parameter 𝑓, which we have investigated in a wide range of different values (cf. Table 1, Fig. 6), although 𝑓 is independent of the binding energy (𝐸b): increasing/decreasing the 0.8 factor is expected to decrease/increase the average total population size. We have investigated the diversity-maintaining ability of the system at smaller (0.6) and larger (0.9) parameter values at different population sizes (𝑁 ≈ 10^3, 10^4 and 10^5) and at different replicability distances (𝛿 = 0.001, 0.005 and 0.01) as shown in Fig. 6. We have found that the number of coexisting master types changes very little in response to changes in this factor. Only two shifts could be detected (underlined): factor 0.9 combined with 𝑁 ≈ 10^4 and 𝛿 = 0.001 caused the number of surviving master types to decrease by one, while factor 0.9 combined with 𝑁 ≈ 10^3 and 𝛿 = 0.01 caused the number of surviving master types to increase by one (Author response table 1). Factor 0.6 produced the same number of surviving types as the default (Author response table 1). In summary, the model shows marked robustness to changes in the values of this parameter.

      Author response table 1.

      Number of coexisting master types in the chemostat model with different binding energy dependent duplex decay rates. Within each 𝛿; 𝑁 parameter combination, the same master sequence set was used with the three different factor values: 0.6, 0.8 (the original) and 0.9 for comparability.

      Why is the kinetic constant for duplex decay reaction 1.15 ⋅ 10^-8?

      Note that this value is the minimum of the duplex decay rate; Table 1 correctly shows the interval of this kinetic constant as: [1.15 ⋅ 10^-8, 6.4 ⋅ 10^-5]. Both values are derived from the basic parameters of the system and can be computed according to Eq. (11). The minimum: as the parameter set corresponding to this value is: . The maximum: with .

      Are those values related to experiments, or are they chosen because specific behaviors can happen only then?

      See above.

      The choice of the model and parameters potentially impact the two main results, the attenuation of the error threshold and the role of GC content:

      Regarding the error threshold, it is also noted (lines 379-385) that it disappears when back mutations are taken into account. This suggests that overcoming the error threshold might not be as difficult as suggested, and can be achieved in several ways, which calls into question the importance of the particular role of parabolic growth. Besides, when the concentration of replicators is low, product inhibition may be negligible, such that a "parabolic replicator" is effectively growing exponentially and an error catastrophe may occur. Do the authors think that this consideration could affect their conclusion? Can simulations be performed?

      The assumption of back mutation only provides a theoretical solution to the error threshold problem: back mutation guarantees a positive (non-zero) concentration of a master type, but, since the probability of back mutation is generally very low, this equilibrium concentration may be extremely low, or negligible for typical system sizes. Consequently, back mutation alone does not solve the problem of the error catastrophe: in our system back mutation is present (the probability that a sequence with 𝑘 errors mutates back to a master sequence is 𝜇^𝑘 (1 − 𝜇)^(𝐿−𝑘)), and the diversity-maintaining ability is limited. The effect of back mutation decreases exponentially with increasing sequence length.
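
      To give a sense of scale, the back-mutation probability quoted above can be evaluated directly. The following is a minimal sketch: the function name and the value 𝜇 = 0.01 are illustrative assumptions, while 𝐿 = 10 is the oligomer length used in the model.

```python
def back_mutation_probability(mu: float, k: int, length: int) -> float:
    """Return mu**k * (1 - mu)**(length - k), the expression quoted in the text.

    mu     : per-base mutation probability
    k      : number of errors separating the sequence from the master
    length : sequence length L
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    if not 0 <= k <= length:
        raise ValueError("k must lie in [0, length]")
    return mu ** k * (1.0 - mu) ** (length - k)


if __name__ == "__main__":
    # mu = 0.01 is a hypothetical value chosen only for illustration.
    for k in range(4):
        print(k, back_mutation_probability(mu=0.01, k=k, length=10))
```

      Each additional error multiplies the probability by 𝜇/(1 − 𝜇), so for small 𝜇 the return path to the master sequence quickly becomes negligible at realistic population sizes.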

      Regarding the role of the GC content, GC-rich oligomers are found to perform the worst but no rationale is provided.

      For GC-rich oligonucleotides the dissociation probability of a template-copy complex is relatively low (cf. Eqs. (9, 10)), thus they have a relatively low number of offspring, cf. lines 557-561: “a relatively high dissociation probability and the consequential higher propensity of being in a simple stranded form provides an advantage for sequences with relatively low GC content in terms of their replication affinity, that is, the expected number of offspring in case of such variants will be relatively high.”. Note that the simulation results shown in Fig. 3A, demonstrate the realization of this effect with prepared sequences (along a GC content gradient).

      One may assume that it happens because GC-rich sequences take comparatively longer to release the product. However, it is also conceivable that higher GC content may help in the polymerization of the monomers as the monomers attach longer on the template (as described in Eq. (9)). This is an instance where the choice to combine the association and polymerization reactions into a single step, independent of GC content, may be critical.

      It would be important to show that the result arises from the actual physics and not from this modeling choice.

      Some more specific points that would deserve to be addressed:

      • Line 53: it is said that p "reflects how easily the template-reaction product complex dissociates". This statement is not correct. A reaction order p<1 reflects product inhibition, the propensity of templates to bind to each other, not slow product release. Product release can be limiting, yet a reaction order of 1 can be achieved if substrate concentrations are sufficiently high relative to oligomer concentrations (von Kiedrowski et al., 1991).

      We think the key reference is Von Kiedrowski (1993) in this case. Other things being equal, his Table 1 on p. 134 shows that a sufficient increase in 𝐾4, i.e., the stability of the duplex (template and copy) (association rate divided by dissociation rate) throws the system into the parabolic regime. This is what we had in mind. In order to clarify this, we modified the quoted sentence thus: “In this kinetics, the growth order is equal or close to 0.5 (i.e., the dynamics is sub-exponential) because increased stability of the template-copy complex (rate of association divided by dissociation) promotes parabolic growth (von Kiedrowski et al., 1991; von Kiedrowski & Szathmáry, 2001).”

      • Population size is a key parameter, and a comparison is made between small (10^3) and large (10^5) populations, but without explaining what determines the scale (small/large relative to what?).

      The “small” value (10^3) corresponds to the smallest meaningful population size; significantly smaller population sizes (e.g. 10^2) cannot maintain the 10 master types (or any subset of them) and are chemically unrealistic. The “large” value (10^5) is the largest population size for which simulation times are still acceptable; in the case of 10^6 the runtimes are in the order of months.

      • In the same vein, we might expect size not to be the only important parameter, but also concentration.

      At constant volume, population size and concentration are strictly coupled.

      • Lines 543-546: if understanding correctly, the quantitative result is that the error threshold rises from 0.1 in the exponential case to 0.196 in the parabolic. Are the authors suggesting that a factor of 2 is a significant difference?

      In this paragraph we compared the empirical error threshold of our system (which is close to 𝑝mut = 0.15) with the error threshold of the well-known single peak fitness landscape (which can be approximated by ) as a reference case. To make the message even clearer we have extended the last sentence (lines 596-597) as follows: “but note that applying this approach to our system is a serious oversimplification”. The 0.196 is simply the probability of error-free replication of a sequence when 𝑝mut = 0.15, but we have removed this sentence (“corresponding to the replication accuracy of a master sequence”) from the manuscript as it seems to be confusing.
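
      The origin of the 0.196 figure can be checked with one line of arithmetic. This sketch assumes that it is the probability of copying all 𝐿 = 10 bases without error at a per-base mutation rate of 0.15, with bases mutating independently; the function name is hypothetical.

```python
def error_free_replication_probability(p_mut: float, length: int) -> float:
    """Probability that none of `length` bases mutates, assuming independence."""
    return (1.0 - p_mut) ** length


if __name__ == "__main__":
    q = error_free_replication_probability(p_mut=0.15, length=10)
    print(f"{q:.4f}")  # 0.85**10, which truncates to the 0.196 quoted above
```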

      • Figure 3C: this figure shows no statistically significant effect?

      Thank you for pointing this out. We statistically tested the hypothesis that the GC content between the survived and the extinct master subsets are different. This analysis revealed that the differences between these two groups are statistically significant, which we have now included in the manuscript at lines 380-390: “A direct investigation of whether the sequence composition of the master types is associated with their survival outcome was conducted using the data from the constant population model simulation results (Figure 2). In these data, the average GC content was measured to be lower in the surviving master subpopulations than in the extinct subpopulations (Figure 3C). To determine whether this difference was statistically significant, nonparametric, two-sample Wilcoxon rank-sum tests (Hollander & Wolfe, 1999) were performed on the GC content of the extinct-surviving master subsets. The GC content was significantly different between these two groups in all nine investigated parameter combinations of population size (N) and replicability distance (δ) at p<0.05 level, indicating a selective advantage for a lower GC content in the constant population model context. The exact p values obtained from this analysis are shown in Figure 3C.”
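
      For readers unfamiliar with the test, a two-sample Wilcoxon rank-sum comparison can be sketched in a few lines of standard-library Python. This is not the analysis code behind Figure 3C: the GC-content values below are invented for illustration, the function names are hypothetical, and the p value uses the large-sample normal approximation without tie or continuity correction.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (z, p). The variance carries no tie correction, so p is only
    approximate for small or heavily tied samples.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    var_w = n1 * n2 * (n1 + n2 + 1) / 12.0
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return z, p


if __name__ == "__main__":
    # Hypothetical GC fractions of surviving and extinct master sequences.
    surviving = [0.3, 0.4, 0.4, 0.5, 0.3]
    extinct = [0.6, 0.7, 0.5, 0.6, 0.8]
    z, p = rank_sum_test(surviving, extinct)
    print(f"z = {z:.3f}, p = {p:.4f}")
```

      A negative z here means the first sample (surviving types) tends to have lower ranks, i.e. lower GC content, than the second. For sample sizes as small as ten master types, an exact test would be preferable to the normal approximation used in this sketch.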

      • line 542: "phase transition-like species extension (Figure 4B)": such a clear threshold is not apparent.

      Thank you for pointing out the incorrect phrasing. As there is no clear threshold in the number of coexisting types as a function of the mutation rate, we removed the “phase transition-like” expression: “However, when finite population sizes and stochastic effects are taken into account, at the largest investigated per-base mutation rate (𝑝mut = 0.15), the summed relative steady-state master frequencies approach zero (Figure 4C) with accelerating species extinction (Figure 4B), indicating that this value is close to the system's empirical error threshold.” (lines 589-594).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      On the whole, the work is well done and presented, there are no major recommendations. It seems a good idea to cite and briefly discuss this recent paper: https://pubmed.ncbi.nlm.nih.gov/36996101/ which develops a symbiotic scenario of the coevolution of primordial replicators and reproducers that appears to be fully compatible with the results of the current work.

      Thank you for bringing this article to our attention. We have inserted the following sentence at lines 621-624: “The demonstrated diversity-maintaining mechanism of finite parabolic populations can be used as a plug-in model to investigate the coevolution of naked and encapsulated molecular replicators (e.g., Babajanyan et al., 2023).”

      The manuscript is well written, but there are some minor glitches that merit attention. For example:

      l. 5 "carriers presents a problem, because product formation and mutual hybridization" - "mutual" is superfluous here, delete

      l. 13 "amplification. In addition, sequence effects (GC content) and the strength of resource" - hardly "effects" - should be 'features' or 'properties'

      l. 41 "If enzyme-free replication of oligomer modules with a high degree of sequence" - "modules" here is only confusing - simply, "oligomers"

      l. 44 "under ecological competition conditions with which distinct replicator types with different" - delete "with" etc, there are many such minor glitches that are best corrected.

      Thank you for pointing these out; we have corrected them. Other drafting errors, glitches and superfluous sentences have also been corrected.

      Reviewer #2 (Recommendations For The Authors):

      None

      Editor (Recommendations For The Authors):

      In the manuscript, it appears that coexistence is assessed at a given point in time, while figures seem to show that it remains time-dependent. It would be great if the authors could clarify this and/or discuss this.

      We appreciate you bringing this to our attention, as we had indeed neglected to elaborate on this important point. The steady state characteristic of the coexistence is assessed in our model in the following way: the relative frequency of each master sequence is tested for the condition of ≥ 100- (cut-off relative frequency for survival) in every 2,000th replication step in the interval between 10,000 replication steps before termination and actual termination (10= replication steps). If the above condition is true more than once, we consider the master type in question as survived (we have included this explanation in the Methods section: lines 258-268). Although this relatively narrow time interval can still be regarded as a snapshot of the state of the system, in our numerical experience the resulting measure is a reliable quantitative indicator of the apparent stability of species coexistence in the parabolic dynamics.
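
      The survival criterion described above can be sketched as follows. This is an illustrative reading of the rule rather than the simulation code: the function name, the data layout (one mapping of master type to relative frequency per sampled step of the final window) and the cut-off value used in the example are all assumptions.

```python
from typing import Dict, List, Sequence


def surviving_masters(
    snapshots: Sequence[Dict[str, float]],
    cutoff: float,
    min_hits: int = 2,
) -> List[str]:
    """Return master types whose relative frequency reaches `cutoff`
    in at least `min_hits` of the sampled snapshots.

    min_hits = 2 encodes "the condition is true more than once".
    """
    hits: Dict[str, int] = {}
    for snapshot in snapshots:
        for master, freq in snapshot.items():
            if freq >= cutoff:
                hits[master] = hits.get(master, 0) + 1
    return sorted(m for m, n in hits.items() if n >= min_hits)


if __name__ == "__main__":
    # Hypothetical frequencies sampled at every 2,000th step of the last
    # 10,000 replication steps; the cut-off 1e-3 is a placeholder value.
    window = [
        {"A": 0.50, "B": 0.0020, "C": 0.0},
        {"A": 0.48, "B": 0.0005, "C": 0.0},
        {"A": 0.51, "B": 0.0, "C": 0.0},
    ]
    print(surviving_masters(window, cutoff=1e-3))
```

      Requiring more than one hit filters out types that exceed the cut-off only transiently at a single sampled step.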

    1. Author Response

      eLife assessment

      In this study, the authors offer a theoretical explanation for the emergence of nematic bundles in the actin cortex, carrying implications for the assembly of actomyosin stress fibers. As such, the study is a valuable contribution to the field of actomyosin organization in the actin cortex. While the theoretical work is solid, experimental evidence in support of the model assumptions remains incomplete. The presentation could be improved to enhance accessibility for readers without a strong background in hydrodynamic and nematic theories.

      To address the weaknesses identified in this assessment, we plan to expand the description of the theoretical model to make it more accessible to a broader spectrum of readers. We will discuss in more detail the relation between the different mathematical terms and physical processes at the molecular scale, as well as the experimental evidence supporting the model assumptions. We will also discuss more explicitly how our results are relevant to different systems exhibiting actomyosin nematic bundles beyond stress fibers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this article, Mirza et al. developed a continuum active gel model of the actomyosin cytoskeleton that accounts for nematic order and density variations in actomyosin. Using this model, they identify the requirements for the formation of dense nematic structures. In particular, they show that self-organization into nematic bundles requires both flow-induced alignment and active tension anisotropy in the system. By varying model parameters that control active tension and nematic alignment, the authors show that their model reproduces a rich variety of actomyosin structures, including tactoids, fibres, asters as well as crystalline networks. Additionally, discrete simulations are employed to calculate the activity parameters in the continuum model, providing a microscopic perspective on the conditions driving the formation of fibrillar patterns.

      Strengths:

      The strength of the work lies in its delineation of the parameter ranges that generate distinct types of nematic organization within actomyosin networks. The authors pinpoint the physical mechanisms behind the formation of fibrillar patterns, which may offer valuable insights into stress fiber assembly. Another strength of the work is connecting activity parameters in the continuum theory with microscopic simulations.

      We thank the referee for these comments.

      Weaknesses:

      This paper is a very difficult read for nonspecialists, especially if you are not well-versed in continuum hydrodynamic theories. Efforts should be made to connect various elements of theory with biological mechanisms, which is mostly lacking in this paper. The comparison with experiments is predominantly qualitative.

      We agree with the referee that the manuscript will benefit from a better description of the theoretical model and the results in relation with specific molecular and cellular mechanisms. We will further emphasize how a number of experimental observations in the literature support our model assumptions and can be explained by our results. A quantitative comparison is difficult for several reasons. First, many of the parameters in our theory have not been measured, and in fact estimates in the literature often rely on comparison with hydrodynamic models such as ours. Second, the effective physical properties of actomyosin gels can vary wildly between cells, which may explain the diversity of forms, dynamics and functions. For these reasons, we chose to delineate regimes leading to qualitatively different emerging architectures and dynamics. In the revised manuscript, we will make this point clearer and will further study the literature to seek quantitative comparison.

      It is unclear if the theory is suited for in vitro or in vivo actomyosin systems. The justification for various model assumptions, especially concerning their applicability to actomyosin networks, requires a more thorough examination.

      We thank the referee for this comment. Our theory is applicable to actomyosin in living cells. To our knowledge, reconstituted actomyosin gels currently lack the ability to sustain the dynamical steady-states involved in the proposed self-organization mechanism, which balance actin flows with turnover. In addition to actomyosin gels in living cells, in vitro systems based on encapsulated cell extracts can also sustain such dynamical steady states [e.g. https://doi.org/10.1038/s41567-018-0413-4], and therefore our theory may be applicable to these systems as well. Of course, with advancements in the field of reconstituted systems, this may change in the near future. We will explicitly discuss this point in the revised manuscript.

      The classification of different structures demands further justification. For example, the rationale behind categorizing structures as sarcomeric remains unclear when nematic order is perpendicular to the axis of the bands. Sarcomeres traditionally exhibit a specific ordering of actin filaments with alternating polarity patterns.

      We agree and will avoid the term “sarcomeric”.

      Similarly, the criteria for distinguishing between contractile and extensile structures need clarification, as one would expect extensile structures to be under tension contrary to the authors' claim.

      We plan to clarify this point by representing in a main figure the stress profiles across dense nematic structures (currently in Supp Fig 2), along with a more detailed description. In short, depending on the parameter regime, the competition between active and viscous stresses in the actin gel determines whether the emergent structures are extensile or contractile. In our system tension is positive in all directions at all times. However, in “contractile” structures, tension is larger along the bundle, whereas in “extensile” structures, tension is larger perpendicular to the bundle. This is consistent with the common expression for active stress of incompressible nematic systems [see e.g. https://doi.org/10.1038/s41467-018-05666-8], that takes the form –zQ, where z is positive for an extensile system, showing that in this case active tension is negative along the nematic direction. This point, also raised by another referee, will be clarified and connected to existing literature.
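
      The sign argument for the –zQ form can be written out in one line. The following sketch assumes a two-dimensional, traceless nematic tensor with scalar order parameter S, director n and unit vector m perpendicular to n; this normalization is an assumption made here for illustration and may differ from the convention of the manuscript:

```latex
\[
\mathbf{Q} = S\left(\mathbf{n}\otimes\mathbf{n} - \tfrac{1}{2}\mathbf{I}\right),
\qquad
\boldsymbol{\sigma}^{\mathrm{act}} = -z\,\mathbf{Q}
\quad\Longrightarrow\quad
\mathbf{n}\cdot\boldsymbol{\sigma}^{\mathrm{act}}\,\mathbf{n} = -\tfrac{1}{2}\,z\,S,
\qquad
\mathbf{m}\cdot\boldsymbol{\sigma}^{\mathrm{act}}\,\mathbf{m} = +\tfrac{1}{2}\,z\,S .
\]
```

      For z > 0 and S > 0 the active contribution is therefore negative along the director and positive across it, which matches the statement that an extensile structure carries the larger tension perpendicular to the bundle.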

      Additionally, it's unclear if the model's predictions for fiber dynamics align with observations in cells, as stress fibers exhibit a high degree of dynamism and tend to coalesce with neighboring fibers during their assembly phase.

      In the present work, we focus on the self-organization of a periodic patch of actomyosin gel. However, in adherent cells boundary conditions play an essential role, e.g. with inflow at the cell edge as a result of polymerization and exclusion at the nucleus. In ongoing work, we are using the present model to study the dynamics of assembly and reconfiguration of dense nematic structures in domains with boundary conditions mimicking those in adherent cells, as suggested by the referee. We would like to note, however, that the prominent stress fibers in cells adhered to stiff substrates, so abundantly reported in the literature, are not the only instance of dense nematic actin bundles, and may not be representative of physiologically relevant situations. In the present manuscript, we emphasize the relation of the predicted organizations with those found in different in vivo contexts not related to stress fibers, such as the aligned patterns of bundles in insects (trachea, scales in butterfly wings), in hydra, or in reproductive organs of C. elegans; the highly dynamical network of bundles observed in C. elegans early embryos; or the labyrinth patterns of micro-ridges in the apical surface of epidermal cells in fish. We will further emphasize these points in the revised manuscript.

      Finally, it seems that the microscopic model is unable to recapitulate the density patterns predicted by the continuum theory, raising questions about the suitability of the simulation model.

      We thank the referee for raising this question, which needs further clarification. The goal of the microscopic model is not to reproduce the self-organized patterns predicted by the active gel theory. The microscopic model lacks essential ingredients, notably a realistic description of hydrodynamics and turnover. Our goal with the agent-based simulations is to extract the relation between nematic order and active stresses for a small homogeneous sample of the network. This small domain is meant to represent the homogeneous active gel prior to pattern formation, and it allows us to substantiate key assumptions of the continuum model leading to pattern formation, notably the dependence of isotropic and deviatoric components of the active stress on density and nematic order (Eq. 7) and the active generalized stress promoting ordering.

      We should mention that reproducing the range of out-of-equilibrium mesoscale architectures predicted by our active gel model with agent-based simulations seems at present not possible, or at least significantly beyond the state-of-the-art. We note for instance that parameter regimes in which agent-based simulations of actin gels display extended contractile steady-states are non-generic, as these simulations often lead to irreversible clumping (as do many reconstituted contractile systems), see e.g. https://doi.org/10.1038/ncomms10323 or https://doi.org/10.1371/journal.pcbi.1005277. Very few references report sustained actin flows or the organization of a few bundles (https://doi.org/10.1371/journal.pcbi.1009506). While agent-based cytoskeletal simulations are very attractive because they directly connect with molecular mechanisms, active gel continuum models are better suited to describe out-of-equilibrium emergent hydrodynamics at a mesoscale. We believe that these two complementary modeling frameworks are rather disconnected in the literature, and for this reason, we have attempted to substantiate our continuum modeling with discrete simulations. In the revised manuscript, we will better frame the relationship between them.

      Reviewer #2 (Public Review):

      Summary:

      The article by Waleed et al discusses the self organization of actin cytoskeleton using the theory of active nematics. Linear stability analysis of the governing equations and computer simulations show that the system is unstable to density fluctuations and self organized structures can emerge. While the context is interesting, I am not sure whether the physics is new. Hence I have reservations about recommending this article.

      We thank the referee for these comments. In the revised manuscript, we will highlight the novelty of the paper in terms of the theoretical model, the mechanism of patterning of dense nematic structures, the nature and dynamics of the resulting architectures, their relation with the experimental record, and the connection with microscopic models.

      We will emphasize the fact that nematic architectures in the actin cytoskeleton are characterized by a co-localization of order and density (and strong variations in each of these fields), that recent work shows that isotropic and nematic organizations coexist and are part of a single heterogeneous network, that the emergence and maintenance of nematic order requires active contraction, and that the assembly and maintenance of dense nematic bundles involves convergent flows. None of these key features can be described by the common incompressible models of active nematics. To address this, we develop here a compressible and density dependent model for an active nematic gel. We will carefully justify that the proposed model is meaningful for actomyosin gels, and we will highlight the commonalities and differences with previous models of active nematics.

      Strengths:

      (i) Analytical calculations complemented with simulations (ii) Theory for cytoskeletal network

      Weaknesses:

      Not placed in the context or literature on active nematics.

      We agree with the referee that the manuscript requires a better contextualization of the work in relation with the very active field of active nematics. In the revised manuscript, we will clearly describe the relation of our model with existing ones.

      Reviewer #3 (Public Review):

      The manuscript "Theory of active self-organization of dense nematic structures in the actin cytoskeleton" analyses self-organized pattern formation within a two-dimensional nematic liquid crystal theory and uses microscopic simulations to test the plausibility of some of the conclusions drawn from that analysis. After performing an analytic linear stability analysis that indicates the possibility of patterning instabilities, the authors perform fully non-linear numerical simulations and identify the emergence of stripe-like patterning when anisotropic active stresses are present. Following a range of qualitative numerical observations on how parameter changes affect these patterns, the authors identify, besides isotropic and nematic stress, also active self-alignment as an important ingredient to form the observed patterns. Finally, microscopic simulations are used to test the plausibility of some of the conclusions drawn from continuum simulations.

      The paper is well written, figures are mostly clear and the theoretical analysis presented in both main text and supplement is rigorous. Mechano-chemical coupling has emerged in recent years as a crucial element of cell cortex and tissue organization and it is plausible to think that both isotropic and anisotropic active stresses are present within such effectively compressible structures. Even though not yet stated this way by the authors, I would argue that combining these two is one of the key ingredients that distinguishes this theoretical paper from similar ones. The diversity of patterning processes experimentally observed is nicely elaborated on in the introduction of the paper, though other closely related previous work could also have been included in these references (see below for examples).

      We thank the referee for these comments and for the suggestion to emphasize the interplay of isotropic and anisotropic active tension, which is possible only in a compressible gel. We also thank the referee for the suggestions to better connect with existing literature.

      To introduce the continuum model, the authors exclusively cite their own, unpublished pre-print, even though the final equations take the same form as previously derived and used by other groups working in the field of active hydrodynamics (a certainly incomplete list: Marenduzzo et al (PRL, 2007), Salbreux et al (PRL, 2009, cited elsewhere in the paper), Jülicher et al (Rep Prog Phys, 2018), Giomi (PRX, 2015),...). To make better contact with the broad active liquid crystal community and to delineate the present work more compellingly from existing results, it would be helpful to include a more comprehensive discussion of the background of the existing theoretical understanding on active nematics. In fact, I found it often agrees nicely with the observations made in the present work, an opportunity to consolidate the results that is sometimes currently missed out on. For example, it is known that self-organised active isotropic fluids form in 2D hexagonal and pulsatory patterns (Kumar et al, PRL, 2014), as well as contractile patches (Mietke et al, PRL 2019), just as shown and discussed in Fig. 2. It is also known that extensile nematics, \kappa<0 here, draw in material laterally of the nematic axis and expel it along the nematic axis (the other way around for \kappa>0, see e.g. Doostmohammadi et al, Nat Comm, 2018 "Active Nematics" for a review that makes this point), consistent with all relative nematic director/flow orientations shown in Figs. 2 and 3 of the present work.

      We thank the referee for these suggestions. Indeed, in the original submission we had outsourced much of the justification of the model and the relevant literature to a related pre-print, but this is not reasonable. In the revised manuscript, we will discuss our model in the context of the state-of-the-art, emphasizing connections with existing results.

      The results of numerical simulations are well-presented. Large parts of the discussion of numerical observations - specifically around Fig. 3 - are qualitative and it is not clear why the analysis is restricted to \kappa<0. Some of the observations resonate with recent discussions in the field, for example the observation of effectively extensile dynamics in a contractile system is interesting and reminiscent of ambiguities about extensile/contractile properties discussed in recent preprints (https://arxiv.org/abs/2309.04224). It is convincingly concluded that, besides nematic stress on top of isotropic one, active self-alignment is a key ingredient to produce the observed patterns.

      We thank the referee for these comments. We will expand the description of the results around Figure 3. We are reluctant to extend the detailed analysis of emergent architectures and dynamics to the case \kappa > 0 as it leads to architectures not observed, to our knowledge, in actin networks. We will expand the characterization of emergent contractile/extensile networks by describing the distribution of the different components of the stress tensor across the bundles and will place our results in the context of related recent work.

      I compliment the authors for trying to gain further mechanistic insights into this conclusion with microscopic filament simulations that are diligently performed. It is rightfully stated that these simulations only provide plausibility tests and, within this scope, I would say the authors are successful. At the same time, it leaves open questions that could have been discussed more carefully. For example, I wonder what can be said about the regime \kappa>0 (which is dropped ad-hoc from Fig. 3 onward) microscopically, in which the continuum theory does also predict the formation of stripe patterns - besides the short comment at the very end? How does the spatial inhomogeneous organization the continuum theory predicts fit in the presented, microscopic picture and vice versa?

      We thank the referee for this compliment. We think that the point raised by the referee is very interesting. It is reasonable to expect that the sign of \kappa will not be a constant but will rather depend on S and \rho. Indeed, for a sparse network with low order, the progressive bundling by crosslinkers acting on nearby filaments is likely to produce a large active stress perpendicular to the nematic direction, whereas in a dense and highly ordered region, myosin motors are more likely to effectively contract along the nematic direction, since there is little room for additional lateral contraction by further bundling. In the revised manuscript, we plan to explore this issue further in two ways. First, we plan to perform additional agent-based simulations in a regime leading to \kappa > 0. Second, we will modify the active gel model such that \kappa < 0 for low density/order, so that a fibrillar pattern is assembled, and \kappa > 0 for high density/order, so that the emergent fibers are highly contractile.

      Overall, the paper represents a valuable contribution to the field of active matter and, if strengthened further, might provide a fruitful basis to develop new hypothesis about the dynamic self-organisation of dense filamentous bundles in biological systems.

    1. Author Response

      We would like to thank the editorial board and the reviewers for their assessment of our manuscript and their constructive feedback that we believe will make our manuscript stronger and clearer. Please find below our provisional response to the public reviews; these responses outline our plan to address the concerns of the reviewers for a planned resubmission. Our responses are written in red.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Misic et al showed that white matter properties can be used to classify subacute back pain patients that will develop persisting pain.

      Strengths:

      Compared to most previous papers studying associations between white matter properties and chronic pain, the strength of the method is to perform a prediction in unseen data. Another strength of the paper is the use of three different cohorts. This is an interesting paper that provides a valuable contribution to the field.

      We thank the reviewer for emphasizing the strength of our paper and the importance of validation on multiple unseen cohorts.

      Weaknesses:

      The authors imply that their biomarker could outperform traditional questionnaires to predict pain: "While these models are of great value showing that few of these variables (e.g. work factors) might have significant prognostic power on the long-term outcome of back pain and provide easy-to-use brief questionnaires-based tools, (21, 25) parameters often explain no more than 30% of the variance (28-30) and their prognostic accuracy is limited.(31)". I don't think this is correct; questionnaire-based tools can achieve far greater prediction than their model in about half a million individuals from the UK Biobank (Tanguay-Sabourin et al., A prognostic risk score for the development and spread of chronic pain, Nature Medicine 2023).

      We agree with the reviewer that we might have under-estimated the prognostic accuracy of questionnaire-based tools, especially the strong predictive accuracy shown by Tanguay-Sabourin 2023. In the revised version, we will change both the introduction and the discussion to reflect the questionnaire-based prognostic accuracy reported in the seminal work by Tanguay-Sabourin. We do note here, however, that the latter paper, while very novel, is unique in showing the power of questionnaires. In addition, the questionnaires we have tested in our cohort did not show any baseline differences suggestive of prognostic accuracy.

      Moreover, the main weakness of this study is the sample size. It remains small despite having 3 cohorts. This is problematic because results are often overfitted in such a small sample size brain imaging study, especially when all the data are available to the authors at the time of training the model (Poldrack et al., Scanning the horizon: towards transparent and reproducible neuroimaging research, Nature Reviews in Neuroscience 2017). Thus, having access to all the data, the authors have a high degree of flexibility in data analysis, as they can retrain their model any number of times until it generalizes across all three cohorts. In this case, the testing set could easily become part of the training making it difficult to assess the real performance, especially for small sample size studies.

      The reviewer raises a very important point regarding the limited sample size and the methodology intrinsic to model development and testing. We acknowledge the small sample size in the “Limitations” section of the discussion. In the resubmission, we will acknowledge the degree of flexibility that is afforded by having access to all the data at once. However, we will also note that our SLF-FA based model is a simple cut-off approach that does not include any learning or hidden layers and that the data obtained from Open Pain were never part of the “training” set at any point at either the New Haven or the Mannheim site. Regarding our SVC approach, we follow standard procedures for machine learning where we never mix the training and testing sets. The models are trained on the training data with parameters selected based on cross-validation within the training data. Therefore, no models have ever seen the test data set. The model performances we reported reflect the prognostic accuracy of our model. Finally, as discussed by Spisak et al. (1), the key determinant of the required sample size in predictive modeling is the “true effect size of the brain-phenotype relationship”, which we think is the determinant of the replication we observe in this study. As such, the effect size in the New Haven and Mannheim data is Cohen’s d > 1.
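
      To make the separation between training and test data concrete, the following is a minimal sketch of a cut-off classifier evaluated on held-out data, together with the two summary statistics mentioned above (Cohen's d and AUC). It is not the analysis code used in the study: the function names, the rule for choosing the cut-off (maximizing training accuracy), and all numerical values are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_cutoff(values, labels):
    """Choose the cut-off from the training values only (maximizes training accuracy)."""
    def accuracy(cutoff):
        return mean([int((v >= cutoff) == bool(y)) for v, y in zip(values, labels)])
    return max(sorted(set(values)), key=accuracy)


# Hypothetical FA-like values; label 1 = recovered, 0 = persistent pain. Not study data.
train_fa = [0.52, 0.55, 0.50, 0.57, 0.44, 0.47, 0.49, 0.42]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]
test_fa = [0.54, 0.46, 0.51, 0.45, 0.48, 0.53]
test_y = [1, 1, 1, 0, 0, 0]

cutoff = fit_cutoff(train_fa, train_y)  # the test data are never used here
test_pred = [int(v >= cutoff) for v in test_fa]
test_acc = mean([int(p == y) for p, y in zip(test_pred, test_y)])
train_d = cohens_d([v for v, y in zip(train_fa, train_y) if y == 1],
                   [v for v, y in zip(train_fa, train_y) if y == 0])
test_auc = auc(test_fa, test_y)
print(cutoff, test_acc, train_d, test_auc)
```

      The point of the sketch is the order of operations: the cut-off is fixed using the training cohort alone, and the test cohort is touched only once, for evaluation.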

      Even if the performance was properly assessed, their models show AUCs between 0.65-0.70, which is usually considered as poor, and most likely without potential clinical use. Despite this, their conclusion was: "This biomarker is easy to obtain (~10 min of scanning time) and opens the door for translation into clinical practice." One may ask who is really willing to use an MRI signature with a relatively poor performance that can be outperformed by self-report questionnaires?

      The reviewer is correct: the model performance is poor to fair, which limits its usefulness for clinical translation. We wanted to emphasize that diffusion images can be obtained in a short period of time and, hence, as the predictive accuracy of such models improves, clinical translation becomes closer to reality. In addition, our findings are based on old diffusion data and a limited sample size coming from different sites and different acquisition sequences. This by itself would limit the accuracy, especially since evidence shows that sample size also affects model performance (i.e. testing AUC) (1). In the revision, we will re-word the sentence mentioned by the reviewer to reflect the points discussed here. This also motivates us to collect a more homogeneous and larger sample.

      Overall, these criticisms are more about the wording sometimes used and the inference they made. I think the strength of the evidence is incomplete to support the main claims of the paper.

      Despite these limitations, I still think this is a very relevant contribution to the field. Showing predictive performance through cross-validation and testing in multiple cohorts is not an easy task and this is a strong effort by the team. I strongly believe this approach is the right one and I believe the authors did a good job.

      We thank the reviewer for acknowledging that our effort and approach were the right ones.

      Minor points:

      Methods:

      I get the voxel-wise analysis, but I don't understand the methods for the structural connectivity analysis between the 88 ROIs. Have the authors run tractography or have they used a predetermined streamlined form of 'population-based connectome'? They report that models of AUC above 0.75 were considered and tested in the Chicago dataset, but we have no information about what the model actually learned (although this can be tricky for decision tree algorithms).

      We apologize for the lack of clarity; we did run tractography and we did not use a predetermined streamlined form of the connectome. We will clarify this point in the methods section.

      Finding which connections are important for the classification of SBPr and SBPp is difficult because of our choices during data preprocessing and SVC model development: (1) preprocessing steps which included TNPCA for dimensionality reduction, and regressing out the confounders (i.e., age, sex, and head motion); (2) the harmonization for effects of sites; and (3) the Support Vector Classifier, which is a hard classification model (2). Such models cannot tell us the features that are important in classifying the groups. Our model is considered a black-box predictive model like neural networks.

      Minor:

      What results are shown in Figure 7? It looks more descriptive than the actual results.

      The reviewer is correct; Figure 7 and supplementary Figure 4 are both qualitatively illustrating the shape of the SLF.

      Reviewer #2 (Public Review):

      The present study aims to investigate brain white matter predictors of back pain chronicity. To this end, a discovery cohort of 28 patients with subacute back pain (SBP) was studied using white matter diffusion imaging. The cohort was investigated at baseline and one-year follow-up when 16 patients had recovered (SBPr) and 12 had persistent back pain (SBPp). A comparison of baseline scans revealed that SBPr patients had higher fractional anisotropy values in the right superior longitudinal fasciculus (SLF) than SBPp patients and that FA values predicted changes in pain severity. Moreover, the FA values of SBPr patients were larger than those of healthy participants, suggesting a role of FA of the SLF in resilience to chronic pain. These findings were replicated in two other independent datasets. The authors conclude that the right SLF might be a robust predictive biomarker of CBP development with the potential for clinical translation.

      Developing predictive biomarkers for pain chronicity is an interesting, timely, and potentially clinically relevant topic. The paradigm and the analysis are sound, the results are convincing, and the interpretation is adequate. A particular strength of the study is the discovery-replication approach with replications of the findings in two independent datasets.

      We thank reviewer 2 for pointing to the strength of our study.

      The following revisions might help to improve the manuscript further.

      Definition of recovery. In the New Haven and Chicago datasets, SBPr and SBPp patients are distinguished by reductions of >30% in pain intensity. In contrast, in the Mannheim dataset, both groups are distinguished by reductions of >20%. This should be harmonized. Moreover, as there is no established definition of recovery (reference 79 does not provide a clear criterion), it would be interesting to know whether the results hold for different definitions of recovery. Control analyses for different thresholds could strengthen the robustness of the findings.

      The reviewer raises an important point regarding the definition of recovery. To address the reviewer's concern, we will add a supplementary figure showing the results in the Mannheim data set if a 30% reduction is used as a recovery criterion. We would like to emphasize here several points that support the use of different recovery thresholds between New Haven and Mannheim. The New Haven primary pain ratings relied on a visual analogue scale (VAS) while the Mannheim data relied on the German version of the West-Haven-Yale Multidimensional Pain Inventory. In addition, the Mannheim data was pre-registered with a definition of recovery at 20% and is part of a larger sub-acute to chronic pain study with prior publications from this cohort using the 20% cut-off (3). Finally, a more recent consensus publication (4) from IMMPACT indicates that a change of at least 30% is needed for a moderate improvement in pain on the 0-10 Numerical Rating Scale but that this percentage depends on baseline pain levels.
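
      The effect of the recovery threshold on group assignment can be illustrated with a short sketch. It assumes that recovery is defined as a percent reduction in pain rating from baseline to follow-up that exceeds a chosen threshold; the function names and the ratings below are hypothetical and are not study data.

```python
def percent_reduction(baseline, follow_up):
    """Percent reduction in pain rating from baseline to follow-up."""
    if baseline <= 0:
        raise ValueError("baseline pain rating must be positive")
    return 100.0 * (baseline - follow_up) / baseline


def classify(baseline, follow_up, threshold_pct):
    """Label a patient 'SBPr' if pain fell by more than threshold_pct percent, else 'SBPp'."""
    reduced = percent_reduction(baseline, follow_up) > threshold_pct
    return "SBPr" if reduced else "SBPp"


# Hypothetical (baseline, follow-up) ratings on an arbitrary scale; not study data.
patients = [(6.0, 2.0), (5.0, 3.8), (7.0, 6.5), (4.0, 3.0)]

for threshold in (20, 30):
    labels = [classify(b, f, threshold) for b, f in patients]
    print(f">{threshold}% reduction:", labels)
```

      Patients whose reduction falls between the two thresholds change group when the criterion moves from 20% to 30%, which is why a control analysis at a common threshold is informative.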

      Analysis of the Chicago dataset. The manuscript includes results on FA values and their association with pain severity for the New Haven and Mannheim datasets but not for the Chicago dataset. It would be straightforward to show figures like Figures 1 - 4 for the Chicago dataset, as well.

      We welcome the reviewer’s suggestion; we will therefore add these analyses to the results section of our manuscript upon resubmission.

      Data sharing. The discovery-replication approach of the present study distinguishes the present from previous approaches. This approach enhances the belief in the robustness of the findings. This belief would be further enhanced by making the data openly available. It would be extremely valuable for the community if other researchers could reproduce and replicate the findings without restrictions. It is not clear why the fact that the studies are ongoing prevents the unrestricted sharing of the data used in the present study.

      Reviewer #3 (Public Review):

      Summary:

      Authors suggest a new biomarker of chronic back pain with the option to predict the result of treatment. The authors found a significant difference in a fractional anisotropy measure in superior longitudinal fasciculus for recovered patients with chronic back pain.

      Strengths:

      The results were reproduced in three different groups at different studies/sites.

      Weaknesses:

      The number of participants is still low.

      We have discussed this point in our replies to reviewer number 1.

      An explanation of microstructure changes was not given.

      The reviewer points to an important gap in our discussion. While we cannot do a direct study of actual tissue micro-structure, we will explore further the changes observed in the SLF by calculating diffusivity measures and discuss possible explanations of these changes.

      Some technical drawbacks are presented.

      We are uncertain if the reviewer is suggesting that we have acknowledged certain technical drawbacks and expects further elaboration on our part. We kindly request that the reviewer specify what particular issues they would like us to address so that we can respond appropriately.

      (1) Spisak T, Bingel U, Wager TD. Multivariate BWAS can be replicable with moderate sample sizes. Nature 2023;615:E4-E7.

      (2) Liu Y, Zhang HH, Wu Y. Hard or Soft Classification? Large-margin Unified Machines. J Am Stat Assoc 2011;106:166-177.

      (3) Loffler M, Levine SM, Usai K, et al. Corticostriatal circuits in the transition to chronic back pain: The predictive role of reward learning. Cell Rep Med 2022;3:100677.

      (4) Smith SM, Dworkin RH, Turk DC, et al. Interpretation of chronic pain clinical trial outcomes: IMMPACT recommended considerations. Pain 2020;161:2446-2461.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses:

      One important question is needed to further clarify the mechanisms of aberrant Ca2+ microwaves as described below.

      Synapsin promoter labels both excitatory pyramidal neurons and inhibitory neurons. To avoid aberrant Ca2+ microwave, a combination of Flex virus and CaMKII-Cre or Thy-1-GCaMP6s and 6f mice were tested. However, all these approaches limit the number of infected pyramidal neurons. While the comprehensive display of these results is appreciated, a crucial question remains unanswered. To distinguish whether the microwave of Ca2+ is caused selectively via the abnormality of interneurons, or just a matter of pyramidal neuron density, testing Flex-GCaMP6 in interneuron specific mouse lines such as PV-Cre and SOM-Cre will be critical.

      We agree that unravelling the role of interneurons is important to the understanding of the cellular mechanisms. However, the primary goal of this preprint was to alert the field and those embarking on in vivo Ca2+ imaging to AAV transduction induced artefacts mediated by one of the most widely used viral constructs for Ca2+ imaging in the field. It was important to us to distribute this finding among the community in a timely manner to avoid the unnecessary waste of resources.

      We consider a thorough understanding of cell-type specific mechanisms interesting. However, the biological relevance of the Ca2+ waves is as yet unclear, and disentangling exactly which cellular and subcellular factors drive the aberrant phenomenon will require a large systematic effort that goes beyond our resources. In particular, it will not be technically trivial to separate biologically relevant contributions from technical differences. For instance, the absence of Ca2+ waves under the principal neuron promoter CaMKII may suggest the involvement of interneurons. However, alternative possibilities are a reduced density of expression across principal neurons or different expression levels between the 2 promoters.

      The important take-home message of the preprint, in our opinion, is that users should carefully check their viral protocols, adjust the protocols for their specific scientific question and report any issues. We now emphasise the fact that although Ca2+ waves were not observed following conditional expression of syn.GCaMP with CaMKII.cre, this may not be due to a requirement for interneuronal expression but may simply reflect differences in final GCaMP expression density and levels between the two transduction procedures (P12, L298-303).

      Reviewer #2 (Public Review):

      Weaknesses:

      Whether micro-waves are associated with the age of mice was not quantified. This would be good to know and the authors do have this data.

      We plotted the animal age at the time of injection for all injections of Syn.GCaMP6 into CA1/CA3 and found no correlation with either the occurrence of Ca2+ waves or the frequency of Ca2+ waves during the age period between 5 – 79 wks (see reviewer Fig1; linear regression fit to the Ca2+ wave frequency against age was not significant: intercept = 1.37, slope = -0.007, p=0.62, n = 14; and generalized linear model relating Ca2+ wave ~ age was not significant: z score = 0.19, deviance above null = 0.04, p = 0.85, n=24). We have now added a statement on this to the revised manuscript (P14 L354-359) and for the reviewers we have added the plots below.

      Author response image 1.

      Plot of Ca2+ micro-wave frequency (left: number of Ca2+ waves/min) or occurrence (right: yes/no) against the animal age at the time of viral injection. Blue line is linear (left) or logistic (right) fit to the data with 95% confidence level.
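
      For readers who wish to run the same kind of check on their own data, the linear fit of wave frequency against age can be sketched with an ordinary least-squares regression. This is not the analysis code behind the figure: the function names are hypothetical and the ages and frequencies below are invented for illustration. The sketch stops at the t statistic of the slope; converting it to a p-value requires the t distribution with n − 2 degrees of freedom, which is not in the Python standard library.

```python
from math import sqrt


def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def slope_t_statistic(x, y):
    """t statistic of the slope (slope divided by its standard error), n - 2 degrees of freedom."""
    n = len(x)
    intercept, slope = linear_fit(x, y)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    standard_error = sqrt(residual_ss / (n - 2) / sxx)
    return slope / standard_error


# Hypothetical values: age at injection (weeks) and wave frequency (waves/min). Not study data.
age_wks = [5, 12, 20, 33, 41, 56, 68, 79]
waves_per_min = [1.2, 1.6, 0.9, 1.4, 1.0, 1.3, 0.8, 1.1]

intercept, slope = linear_fit(age_wks, waves_per_min)
t_stat = slope_t_statistic(age_wks, waves_per_min)
print(intercept, slope, t_stat)
```

      A slope whose t statistic is small in magnitude relative to the critical value for n − 2 degrees of freedom indicates no detectable linear dependence of wave frequency on age in that sample.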

      The effect of micro-waves on single cell function was not analyzed. It would be useful, for example, if we knew the influence of micro-waves on place fields. Can a place cell still express a place field in a hippocampus that produces micro-waves? What effect might a microwave passing over a cell have on its place field? Mice were not trained in these experiments, so the authors do not have the data.

      We agree that these are interesting questions; however, the preprint is focused on describing the GECI expression conditions prone to generating these artefacts. Studying the effects of Ca2+ micro-waves on the circuitry are scientific questions, and would require an experimental framework of testing the aberrant activity on a specific physiological function e.g. place activity or specific oscillations (e.g. sharp-wave activity). Ca2+ microwaves, as the ones described here, have not been reported under physiological conditions or pathophysiological conditions and studying the effects of such artefactual waves on the circuit was not our intention.

      With respect to place cell activity, specifically, it is intuitive that during the Ca2+ micro-wave the participating cell’s place field activity would be obscured by the artefactual activity. Cell activity appears to return immediately following the wave suggesting that the cells could exhibit place activity outside their participation in the Ca2+ micro-waves. However, we do not know if the Ca2+ micro-wave activity disrupts the generation or maintenance of place fields. We have now added a brief reference to possible effects on place coding to the paper (P12, L315-317).

      The CaMKII-Cre approach for flexed-syn-GCaMP expression shows no micro-waves and is convincing, but it is only from 2 animals, even though both had no micro-waves.

      In light of the reviewer’s comment, we have added a further 3 animals with conditional expression of GCaMP6m from the DZNE to complement the current dataset with conditional expression of GCaMP6s from UoB (P10, L236 & 239 and revised table 1). Although Ca2+ waves were not observed in any of the 5 animals in total, we still do not know with certainty whether this approach is completely safe. Time will show if researchers still encounter the phenotype under certain conditions when using this conditional approach.

      The authors state in their Discussion that even without observable microwaves, a syn-Ca2+-indicator transduction strategy could still be problematic. This may be true, but they do not check this in their analysis, so it remains unknown.

      We agree with the reviewer and have now made this point clearer in the revised discussion (P11, L257-258)

      Reviewer #3 (Public Review):

      Weaknesses:

      I believe that the weaknesses of the manuscript are appropriately highlighted by the authors themselves in the discussion. I would, however, like to emphasize several additional points.

      As the authors state, the exact conditions that lead to Ca2+ micro-waves are unclear from this manuscript. It is also unclear if Ca2+ micro-waves are specific to GECI expression or if high-titer viral transduction of other proteins such as genetically encoded voltage indicators, static fluorescent proteins, recombinases, etc could also cause Ca2+ micro-waves.

      The high expression of other proteins has been shown to result in artefactual phenomena such as toxicity or fluorescent puncta (for GFP see Hechler et al. 2006 and Katayama et al. 2008; for GEVIs see Rühl et al. 2021), but we are not aware of reports of micro-waves. Although it is certainly possible that high expression levels of other proteins could lead to waves, we suspect the Ca2+ micro-waves observed in this preprint result from a dysregulation of Ca2+ homeostasis. This is not to suggest that voltage indicators could not result in micro-waves (e.g. Ca2+ homeostasis may be indirectly affected).

      The authors almost exclusively tested high titer (>5x10^12 vg/mL) large volume (500-1000 nL) injections using the synapsin promoter and AAV1 serotypes. It is possible that Ca2+ micro-waves are dramatically less frequent when titers are lowered further but still kept high enough to be useful for in vivo imaging (e.g. 1x10^12 vg/mL) or smaller injection volumes are used. It is also possible that Ca2+ micro-waves occur with high titer injections using other viral promoter sequences such as EF1α or CaMKIIα. There may additionally be effects of viral serotype on micro-wave occurrence.

      We agree with all points raised by the reviewer. Notably, we used viral transduction protocols with titers and volumes within the range of those previously used for viral transduction of GCaMP under the synapsin promoter (see P11 L269-275) and we observed Ca2+ micro-waves. As the reviewer suggested, we did find that lowering the titer is an important factor in reducing these Ca2+ micro-waves, and there is likely a wide range of approaches that avoid the phenomenon. With regard to viral serotype, we show that micro-waves occurred across AAV1 and 9, but it is possible that other serotypes may avoid the phenomenon.

      We reiterate in the abstract of the revised manuscript that expression level is a crucial factor (P2, L40 and P2, L44-45) and now mention that other promoters and induction protocols that result in high Ca2+ indicator expression may result in Ca2+ micro-waves (P12, L291-294).

      The number of animals in any particular condition are fairly low (Table 1) with the exception of V1 imaging and thy1-GCaMP6 imaging. This prohibits rigorous comparison of the frequency of pathological calcium activity across conditions.

      We have now added 3 more animals with conditional GCaMP6 expression. In total, the study contains 34 animals with viral injection into the hippocampus from different laboratories and under different conditions resulting in multiple groups. As such we are cognizant of the resulting limitations for statistical evaluation.

      However, in light of the reviewer’s comment, we have now employed a generalized linear model tested on all the data to examine the relationship between the Ca2+ micro-wave incidence and the different factors. The multivariate GLM did find a significant relationship between Ca2+ micro-wave incidence and both viral dilution and weeks post injection (see below and revised manuscript P8, L189-193).

      For injections into CA1 in the hippocampus (n=28), a GLM found no relationship between Ca2+ micro-waves and any of the individual variables x (Ca-wave ~ x): viral dilution: z score = 1.14, deviance above null = 1.31, p = 0.254; post injection weeks: z score = 1.18, deviance above null = 1.44, p = 0.239; injection volume: z score = -0.76, deviance above null = 0.59, p = 0.45; construct: z score = 1.18, difference in deviance above null = 1.44, p = 0.239.

      However, a multivariable logistic GLM relating dilution and post injection weeks (Ca-wave ~ dilution + p.i_wks) showed that together both variables were significantly related to Ca2+ micro-waves (deviance above null = 7.5; dilution: z score = 2.18, p < 0.05; p.i_wks: z score = 2.22, p < 0.05).
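
      To make the quantity "deviance above null" concrete, the following is a minimal sketch of a two-predictor logistic GLM of the form Ca-wave ~ dilution + p.i_wks, fitted by Newton's method in plain Python. It is not the analysis code used for the preprint: the data are synthetic and invented purely for illustration, and the names `animals`, `fit_logistic` and `deviance_above_null` are assumptions of this sketch.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_lik(beta, X, y):
    """Bernoulli log-likelihood of a logistic model."""
    total = 0.0
    for row, yi in zip(X, y):
        p = sigmoid(sum(b * x for b, x in zip(beta, row)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += math.log(p) if yi else math.log(1.0 - p)
    return total

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def fit_logistic(X, y, max_iter=50):
    """Newton's method with step halving; column 0 of X must be the intercept."""
    n_par = len(X[0])
    ybar = sum(y) / len(y)
    beta = [math.log(ybar / (1.0 - ybar))] + [0.0] * (n_par - 1)  # start at the null model
    ll = log_lik(beta, X, y)
    for _ in range(max_iter):
        p = [sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]
        grad = [sum(r[j] * (yi - pi) for r, yi, pi in zip(X, y, p)) for j in range(n_par)]
        hess = [[sum(r[j] * r[k] * pi * (1.0 - pi) for r, pi in zip(X, p))
                 for k in range(n_par)] for j in range(n_par)]
        step = solve(hess, grad)
        t = 1.0
        new_beta = [b + s for b, s in zip(beta, step)]
        new_ll = log_lik(new_beta, X, y)
        while new_ll < ll and t > 1e-8:  # halve the step until the fit does not get worse
            t /= 2.0
            new_beta = [b + t * s for b, s in zip(beta, step)]
            new_ll = log_lik(new_beta, X, y)
        if new_ll < ll:
            break
        gain = new_ll - ll
        beta, ll = new_beta, new_ll
        if gain < 1e-10:
            break
    return beta, ll

# SYNTHETIC example data (not from the study): (relative titer, weeks post injection, wave seen)
animals = [
    (1.0, 4, 0), (1.0, 8, 1), (1.0, 12, 1), (1.0, 6, 1), (1.0, 5, 0), (1.0, 10, 0),
    (0.5, 4, 0), (0.5, 8, 0), (0.5, 12, 1), (0.5, 10, 0), (0.5, 14, 1), (0.5, 6, 1),
    (0.25, 6, 0), (0.25, 12, 0), (0.25, 16, 1), (0.25, 10, 0),
]
y = [a[2] for a in animals]
X_null = [[1.0] for _ in animals]                  # intercept only
X_full = [[1.0, a[0], a[1]] for a in animals]      # intercept + titer + weeks

_, ll_null = fit_logistic(X_null, y)
beta, ll_full = fit_logistic(X_full, y)

# For binary outcomes the saturated log-likelihood is 0, so deviance = -2 * log-likelihood.
null_deviance = -2.0 * ll_null
model_deviance = -2.0 * ll_full
deviance_above_null = null_deviance - model_deviance
print("coefficients:", beta)
print("deviance above null:", deviance_above_null)
```

      The difference between the null and model deviance is the quantity a likelihood-ratio test compares against a chi-squared distribution, with degrees of freedom equal to the number of added predictors (two in this sketch).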

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Results are straightforward and convincing. While a couple of ways to reduce the aberrant microwaves of calcium responses were demonstrated, delving into the functions of interneurons is crucial for a more comprehensive understanding of cellular causality.

      As mentioned in the public response, disentangling cellular mechanism from technical requirements will need a large and systematic study. To determine the contribution from interneurons, the use of specific interneuron promoters would be required, and viral titers systematically varied to result in similar cellular GCaMP expression levels as seen under the synapsin promoter condition.

      Reviewer #2 (Recommendations For The Authors):

      Do the authors think the cells are firing when they participate in a micro-wave, or do they think the calcium influx is due to something else? A discussion point on this would be good.

      This is an excellent point raised by the reviewer. We do not know if the elevated cellular Ca2+ during the artifactual Ca2+ micro-wave reflects action potential firing or an increase of Ca2+ from intracellular stores. As already described in the text of the preprint, their optical spatiotemporal profile neither fits with known microseizure progression patterns, nor with spreading depolarization/depression. We have adopted the reviewer’s suggestion and added the following point to the discussion section in the revised preprint (P12, L308-315):

      In a limited dataset, we attempted to detect the Ca2+ micro-waves by hippocampal LFP recordings (using a conventional insulated Tungsten wire, diameter ~110µm). We could not identify a specific signature, e.g. ictal activity or LFP depression, which may correspond to these Ca2+ micro-waves. The crucial shortcoming of this experiment, of course, is that with these LFP recordings we could not simultaneously perform hippocampal 2-photon microscopy. Thus, it is uncertain if the Ca2+ micro-waves indeed occurred in proximity to our electrode.

      The results seem to suggest that micro-waves may involve interneurons as their CaMKII-Cre strategy avoids waves - possibly due to a lack of expression of GECIs in interneurons. It would be great to hear the author's thoughts on this and add a brief discussion point.

      As mentioned in the public response to Reviewer 1, it is difficult to disentangle cellular mechanisms from technical requirements, and the exact requirements for the Ca2+ micro-waves to occur are still not fully clear. The absence of Ca2+ micro-waves in our CaMKII-Cre dataset may indeed reflect the requirement of interneurons. However, it could just as well be due to a sparse labelling of principal cells or simply reflect differences in the expression levels of GCaMP under the different promoters.

      All in all, a more complete understanding of the requirements of such Ca2+ micro-waves will require a community effort. Therefore, it is important that each group check the safety profile of their GECI and report problems to the community.

      We have added these points to the revised preprint (P12, L291 and P12, L298).

      Plotting the incidence of micro-waves as a function of the age of mice would be a nice addition (the authors have the data).

      There was no relationship between Ca2+ micro-wave occurrence or frequency and age over the range of 5-79 wks (see public response), and this has been added to the preprint (P14, L354).

      Reviewer #3 (Recommendations For The Authors):

      I appreciate the authors raising the awareness of this issue. I had personally observed micro-waves in my own data as well. In agreement with their findings, I found that the occurrence of micro-waves was dramatically lower when I reduced the viral titer. Anecdotally, I also observed voltage micro-waves when virally transducing genetically encoded voltage indicators at similar titers. For that reason, I am skeptical that this issue is exclusive to GECIs.

      We find it interesting that the reviewer has also seen artefactual micro-waves following viral transduction of genetically encoded voltage indicators. Without seeing the voltage waves the referee is referring to or the conditions, it is of course difficult to compare with the Ca2+ micro-waves we report. However, this comment again raises the question of mechanism. We believe that in the GECI framework, Ca2+ homeostatic aspects are important. Voltage indicators are based on different sensor mechanisms, and expressed in the cell membrane, but it may very well be that there are overlapping factors between Ca2+ and voltage indicators that could trigger a similar, or even the same phenomenon in the end.

      Minor comments:

      (1) Line 131-132: I believe the authors only tested for micro-waves in V1. This should be made clear in the results. It could be that micro-waves could occur in other parts of cortex with the same viral titers.

      Both V1 and somatosensory cortex were tested, as described in the methods (P15, L395-397); we have made this clearer in the revised preprint (P6, L138).

      (2) There are no statistics associated with the data from Fig 1e.

      We have now added statistics (P5, L126).

      (3) The authors may be able to make a stronger claim about the pathological nature of the micro-waves if there are differences in the histology between the injected and non-injected hemispheres. For example, is there evidence of widespread cell death in the injected hemisphere (e.g. lower cell count, smaller hippocampal volume, caspase staining, etc).

      We found no evidence of gross morphological changes to the hippocampus following viral transduction with no changes in CA1 pyramidal cell layer thickness or CA1 thickness (pyramidal cell layer thickness: 49 ± 12.5 µm ipsilateral and 50.3 ± 11.1 µm contralateral, n=4, Student’s t-test p=0.89; CA1 thickness: 553.3 ± 14 µm ipsilateral and 555.8 ± 62 µm contralateral, n = 4, Student’s t-test p=0.94; 48 ± 13 weeks post injection at time of perfusion).

      We have added this to the preprint (P5, L117-122).
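
      As a check on the arithmetic behind these comparisons, the sketch below recomputes a two-sample Student's t-test from the summary statistics quoted above. It rests on two assumptions that the response does not state: that the "±" values are standard deviations and that the test was unpaired with pooled variance. The function names are hypothetical, and the p-value is obtained by numerically integrating the t density rather than by calling a statistics library.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, n_steps=2000):
    """Two-sided p-value via Simpson integration of the density on [0, |t|]."""
    a = abs(t)
    h = a / n_steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, n_steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = s * h / 3.0
    return max(0.0, 1.0 - 2.0 * area)

def pooled_t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Unpaired, equal-variance t-test computed from means, SDs and group sizes."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    t = (m1 - m2) / se
    return t, df, two_sided_p(t, df)

# Values quoted in the response (ipsilateral vs contralateral, n = 4 each, in micrometres).
t_layer, df_layer, p_layer = pooled_t_from_summary(49.0, 12.5, 4, 50.3, 11.1, 4)
t_ca1, df_ca1, p_ca1 = pooled_t_from_summary(553.3, 14.0, 4, 555.8, 62.0, 4)

print(f"pyramidal layer: t = {t_layer:.3f}, df = {df_layer}, p = {p_layer:.2f}")
print(f"CA1 thickness:   t = {t_ca1:.3f}, df = {df_ca1}, p = {p_ca1:.2f}")
```

      Under these assumptions the sketch gives p-values of roughly 0.88 and 0.94, close to the quoted values; small differences are expected because the inputs are rounded.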

      (4) The broader micro-waves in the stratum oriens versus the stratum pyramidale are likely due to the spread of the basal dendrites of pyramidal cells. If the typical size of the basal dendritic arbor of CA1 pyramidal neurons is taken into account, does this explain the wider calcium waves in this layer?

      Absolutely, great point, yes, we completely agree on this. It is likely that the active neuropil (including the dendritic arbour) is contributing to the apparent broader diameter. In addition, as evident in the video 5 cell somata in the stratum oriens (possibly interneurons) are active and their processes also contribute.

      We have now mentioned these points in the revised preprint (P5, L132).

      (5) Lines 179-181: Is the difference in the prevalence of micro-waves between viral titers statistically significant?

      Although we have a large number of animals in total (n=34) with viral injection into the hippocampus, the number of animals in each condition, given the many factors, is low. We therefore used a generalized linear model to test the relationship between the Ca2+ micro-waves and the variables.

      We have now added this analysis to the revised preprint (P8, L189-193).

      (6) Lines 200-203: The CA3 micro-waves were only observed at one institution. The current wording is slightly misleading.

      We agree and have changed this to be clearer (P9 L216).

    1. Author Response

      eLife assessment

      The authors used electrophysiology in brain slices and computer modeling and suggest that layer 2/3 pyramidal neurons of the mouse cortex express functional HCN channels, despite little evidence in the past that they are present. The study is useful at the present time, but results are incomplete because the methods, data, and analyses do not always support the conclusions.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Oleh et al. uses in vitro electrophysiology and compartmental modeling (via NEURON) to investigate the expression and function of HCN channels in mouse L2/3 pyramidal neurons. The authors conclude that L2/3 neurons have developmentally regulated HCN channels, the activation of which can be observed when subjected to large hyperpolarizations. They further conclude via blockade experiments that HCN channels in L2/3 neurons influence cellular excitability and pathway-specific EPSP kinetics, which can be neuromodulated. While the authors perform a wide range of slice physiology experiments, concrete evidence that L2/3 cells express functionally relevant HCN channels is limited. There are serious experimental design caveats and confounds that make drawing strong conclusions from the data difficult. Furthermore, the significance of the findings is generally unclear, given modest effect sizes and a lack of any functional relevance, either directly via in vivo experiments or indirectly via strong HCN-mediated changes in known operations/computations/functions of L2/3 neurons.

      Specific points:

      (1) The interpretability and impact of this manuscript are limited due to numerous methodological issues in experimental design, data collection, and analysis. The authors have not followed best practices in the field, and as such, much of the data is ambiguous and/or weak and does not support their interpretations (detailed below). Additionally, the authors fail to appropriately explain their rationale for many of their choices, making it difficult to understand why they did what they did. Furthermore, many important references appear to be missing, both in terms of contextualizing the work and in terms of approach/method. For example, the authors do not cite Kalmbach et al 2018, which performed a directly comparable set of experiments on HCN channels in L2/3 neurons of both humans and mice. This is an unacceptable omission. Additionally, the authors fail to cite prior literature regarding the specificity or lack thereof of Cs+ in blocking HCN. In describing a result, the authors state "In line with previous reports, we found that L2/3 PCs exhibited an unremarkable amount of sag at 'typical' current commands" but they then fail to cite the previous reports.

      We thank the reviewer for the thorough examination of our manuscript; however, we strongly disagree with many of the raised concerns for several reasons, as detailed in an initial response below:

      To address the lack of certain citations, we would like to emphasize that in the introduction section we focused on a several-decades-long line of investigation into the HCN channel content of layer 2/3 pyramidal cells (L2/3 PCs), where there has undoubtedly been some controversy as to their functional contribution. We did not explicitly cite papers that claimed to find little or no HCN channels/sag, although this would be a significant list of publications from some excellent senior investigators, as we wanted to avoid shining a negative light on otherwise excellent publications. However, we plan to address this more clearly in the upcoming revision.

      Just to take an example: in the publication mentioned by the reviewer (Kalmbach et al 2018), the investigators did not carry out voltage clamp recordings. Furthermore, the reported input resistance values in the aforementioned paper were far above other reports in mice (Routh et al. 2022, Brandalise et al 2022, Hedrick et al 2012; which were similar to our findings here), suggesting that recordings in Kalmbach were carried out at membrane potentials where HCN activation is less available (Routh, Brager and Johnston 2022).

      Another reason for the mixed findings in the field is undoubtedly the small or nonexistent sag in L2/3 current clamp recordings in mice. We also found a small sag, which we have shown can be explained as follows: the ‘sag’ potential is a biphasic voltage response emerging from a relatively fast passive membrane response and a slower Ih activation. In L2/3 PCs, hyperpolarization-activated currents are apparently faster than previously described and are located proximally (our findings here). Therefore, their recruitment in mouse L2/3 PCs is on a similar timescale as the passive membrane response, resulting in a more monophasic response. Again, we plan to include a full set of citations in the updated introduction section, to highlight the importance of HCN channels in L2/3 PCs in mice and other species. The justification for using cesium (i.e., ‘best practices’) is detailed in the next paragraph.

      (2) A critical experimental concern in the manuscript is the reliance on cesium, a nonspecific blocker, to evaluate HCN channel function. Cesium blocks HCN channels but also acts at potassium channels (and possibly other channels as well). The authors do not acknowledge this or attempt to justify their use of Cs+ and do not cite prior work on this subject. They do not show control experiments demonstrating that the application of Cs+ in their preparation only affects Ih. Additionally, the authors write 1 mM cesium in the text but appear to use 2 mM in the figures. In later experiments, the authors switch to ZD7288, a more commonly used and generally accepted more specific blocker of HCN channels. However, they use a very high concentration, which is also known to produce off-target effects (see Chevaleyre and Castillo, 2002). To make robust conclusions, the authors should have used both blockers (at accepted/conservative concentrations) for all (or at least most) experiments. Using one blocker for some experiments and then another for different experiments is fraught with potential confounds.

      To address the concerns regarding the use of cesium to block HCN channels, we would like to state that neither cesium nor ZD-7288 is without off-target effects; however, in our case the potential off-target effects of external cesium were deemed less impactful, especially concerning AP firing output experiments. Extracellular cesium has been widely accepted as a blocker of HCN channels (Lau et al. 2010, Wickenden et al. 2009, Rateau and Ropert 2005, Hemond et al. 2009, Yang et al. 2015, Matt et al. 2010). It is known to act on potassium channels as well, although this has mostly been demonstrated with intracellular application (Puil et al. 1981, Fleidervish et al. 2008, Williams et al. 1991, 2008). Nevertheless, we acknowledge off-target effects and will better cite the appropriate literature in the revision.

      Although we performed internal control experiments during the recordings, these were not included in the manuscript, which we plan to correct in the revision. These are detailed as follows: during our recordings, cesium had no significant effect on action potential halfwidth, ruling out substantial blocking of potassium channels, nor did it affect any other aspects of suprathreshold activity. Furthermore, we observed similar effects on passive properties (resting membrane potential, input resistance) following ZD-7288 as with cesium, which we will also update in our figures. We did acknowledge that ZD-7288 is a widely accepted blocker of HCN, and for this reason we carried out some of our experiments using this pharmacological agent instead of cesium. However, these experiments were always supported by complementary findings using external cesium. For example, the effect of ZD-7288 on EPSPs was confirmed by similar synaptic stimulation experiments using cesium. This is important, as synaptic inputs of L2/3 PCs are modulated by both dendritic sodium (Ferrarese et al. 2018) and calcium channels (Landau 2022), therefore the application of ZD-7288 alone may have been difficult to interpret in isolation.

      On the other hand, ZD-7288 suffers from its own side effects, such as a substantial effect on sodium channels (Wu et al. 2012) and calcium channels (Sánchez-Alonso et al. 2008, Felix et al. 2003). As our aim was to provide functional evidence for the importance of HCN channels, we deemed these effects unacceptable in experiments where AP firing output (e.g., in cell-attached experiments) was measured.

      (3) A stronger case could be made that HCN is expressed in the somatic compartment of L2/3 cells if the authors had directly measured HCN-isolated currents with outside-out or nucleated patch recording (with appropriate leak subtraction and pharmacology). Whole-cell voltage-clamp in neurons with axons and/or dendrites does not work. It has been shown to produce erroneous results over and over again in the field due to well-known space clamp problems (see Rall, Spruston, Williams, etc.). The authors could have also included negative controls, such as recordings in neurons that do not express HCN or in HCN-knockout animals. Without these experiments, the authors draw a false equivalency between the effects of cesium and HCN channels, when the outcomes they describe could be driven simply by multiple other cesium-sensitive currents. Distortions are common in these preparations when attempting to study channels (see Williams and Womzy, J Neuro, 2011). In Fig 2h, cesium-sensitive currents look too large and fast to be from HCN currents alone given what the authors have shown in their earlier current clamp data. Furthermore, serious errors in leak subtraction appear to be visible in Supplementary Figure 1c. To claim that these conductances are solely from HCN may be misleading.

      We disagree with the argument that “Whole-cell voltage-clamp in neurons with axons and/or dendrites does not work”. Although this method is not without its confounds (i.e. space clamp), it is still a useful initial measure, as demonstrated countless times in the literature. However, the reviewer is correct that the best approach to establish the somatodendritic distribution of ion channels is by direct somatic and dendritic outside-out patches. Due to the small diameter of L2/3 PC dendrites, such experiments have, to our knowledge, not yet been reported for any other ion channel either. Mapping this distribution may be outside the scope of the current manuscript, but it was hard for us to ignore the sheer size of the Cs+ sensitive hyperpolarizing currents in whole cell. Thus, we will opt to report this data.

      Also, we should point out that space clamp-related errors manifest in the overestimation of frequency-dependent features, such as activation kinetics, and underestimation of steady-state current amplitudes. The activation time constant of our measured currents is somewhat faster than previously reported, reducing major concerns regarding space clamp errors. Furthermore, we simply do not understand what “too large… to be from HCN currents” means. We would like to ask the reviewer to point out what the “serious errors in leak subtraction” are, as the measured currents are similar in shape and correction artifacts to previously reported HCN currents (Meng et al. 2011, Li 2011, Zhao et al. 2019, Yu et al. 2004, Zhang et al. 2008, Spinelli et al. 2018, Craven et al. 2006, Ying et al. 2012, Biel et al. 2009).

      Furthermore, we would be grateful if the reviewer would mention the other possible ion channels that are activated at hyperpolarized voltages, have the same voltage dependence as HCN currents, do not show inactivation, influence both input resistance and resting membrane potential, and are blocked by low concentration extracellular cesium.

      (4) The authors present current-clamp traces with some sag, a primary indicator of HCN conductance, in Figure 2. However, they do not show example traces with cesium or ZD7288 blockade. Additionally, the normalization of current injected by cellular capacitance and the lack of reporting of input resistance or estimated cellular size makes it difficult to determine how much current is actually needed to observe the sag, which is important for assessing the functional relevance of these channels. The sag ratio in controls also varies significantly without explanation (Figure 6 vs Figure 7). Could this variability be a result of genetically defined subgroups within L2/3? For example, in humans, HCN expression in L2/3 varies from superficial and deep neurons. The authors do not make an effort to investigate this. Regardless of inconsistencies in either current injection or cell type, the sag ratio appears to be rather modest and similar to what has already been reported previously in other papers.

      We thank the reviewer for pointing out that our explanation for the modest sag ratio might not have been sufficient to properly understand why this measurement cannot be applied to layer 2/3 pyramidal cells. We will clarify this section in the results section. Briefly: sag potential emerges from a relatively (compared to Ih) fast passive membrane response and a slower HCN recruitment. The opposing polarity and different timescales of these two mechanisms result in a biphasic response called “sag” potential. However, if the timescale of these two mechanisms is similar, the voltage response is not predicted to be biphasic. We have shown that hyperpolarization activated currents in our preparations are fast and proximal, therefore they are recruited during the passive response (see Figure 2g.). This means that although a substantial amount of HCN currents are activated during hyperpolarization, their activation will not result in substantial sag. Therefore, sag ratio measurement is not necessarily applicable to approximate the HCN content of L2/3 PCs. We would like to emphasize that sag ratio measurements are correct in the case of other cell types, and our aim is not to discredit the method, but rather to show that it cannot be applied in the case of mouse L2/3 PCs.

      Our own measurements, similar to others in the literature, show that L2/3 PCs exhibit modest sag ratios; however, this does not mean that HCN is not relevant. Ih activation in L2/3 PCs does not manifest in large sag potential but rather in a continuous distortion of steady-state responses (Figure 2b.). The reviewer is correct that L2/3 PCs are non-homogeneous, therefore we sampled along the entire L2/3 axis. This yielded some variability in our results (i.e., passive properties); yet we did not observe any cells where hyperpolarizing-activated/Cs+-sensitive currents could not be resolved. As structural variability of L2/3 cells does result in variability in cellular capacitance, we compensated for this variability by injecting cellular capacitance-normalized currents. Our measured cellular capacitances were in accordance with previously published values, in the range of 50-120 pF. Therefore, the injected currents were not outside frequently used values. Together, we would like to state that whether substantial sag potential is present or not, initial estimates of the HCN content for each L2/3 PC should be treated with caution.
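
      The argument that the same HCN conductance can produce a large or a negligible sag depending on its activation kinetics can be illustrated with a toy calculation. The following is a minimal single-compartment sketch, not the NEURON model used in the manuscript: all parameter values (capacitance, conductances, half-activation voltage, the 5 ms and 200 ms activation time constants) are invented for illustration, and the kinetics are simplified to a voltage-independent time constant.

```python
import math

# Hypothetical parameters, chosen only for illustration (units: pF, nS, mV, pA, ms).
C_M = 100.0      # membrane capacitance
G_LEAK = 5.0     # leak conductance -> passive time constant C_M / G_LEAK = 20 ms
E_LEAK = -75.0   # leak reversal potential
G_H = 5.0        # maximal Ih conductance (identical in both simulations)
E_H = -30.0      # Ih reversal potential
V_HALF = -90.0   # Ih half-activation voltage
SLOPE = 8.0      # Ih activation slope factor
I_STEP = -100.0  # hyperpolarizing current step
DT = 0.05        # forward-Euler time step
T_STEP = 1000.0  # step duration

def m_inf(v):
    """Steady-state Ih activation, increasing with hyperpolarization."""
    return 1.0 / (1.0 + math.exp((v - V_HALF) / SLOPE))

def resting_current(v):
    """Net membrane current without injection; zero at the resting potential."""
    return G_LEAK * (v - E_LEAK) + G_H * m_inf(v) * (v - E_H)

def resting_potential(lo=-75.0, hi=-60.0):
    """Find the resting potential by bisection."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if resting_current(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def sag_ratio(tau_h):
    """Simulate a current step and return (V_ss - V_min) / (V_rest - V_min)."""
    v_rest = resting_potential()
    v, m = v_rest, m_inf(v_rest)
    v_min = v
    for _ in range(int(T_STEP / DT)):
        i_ion = G_LEAK * (v - E_LEAK) + G_H * m * (v - E_H)
        dv = (I_STEP - i_ion) / C_M
        dm = (m_inf(v) - m) / tau_h
        v += DT * dv
        m += DT * dm
        v_min = min(v_min, v)
    v_ss = v  # voltage at the end of the step, taken as steady state
    return (v_ss - v_min) / (v_rest - v_min)

sag_slow = sag_ratio(tau_h=200.0)  # Ih much slower than the membrane: biphasic response
sag_fast = sag_ratio(tau_h=5.0)    # Ih faster than the membrane: near-monophasic response
print(f"sag ratio, slow Ih: {sag_slow:.2f}")
print(f"sag ratio, fast Ih: {sag_fast:.2f}")
```

      Because the maximal conductance is identical in the two runs, the steady-state voltage is the same; only the transient undershoot differs. In this toy setting, a small sag ratio therefore does not by itself imply a small HCN conductance.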

      (5) In the later experiments with ZD7288, the authors measured EPSP half-width at greater distances from the soma. However, they use minimal stimulation to evoke EPSPs at increasingly far distances from the soma. Without controlling for amplitude, the authors cannot easily distinguish between attenuation and spread from dendritic filtering and additional activation and spread from HCN blockade. At a minimum, the authors should share the variability of EPSP amplitude versus the change in EPSP half-width and/or stimulation amplitudes by distance. In general, this kind of experiment yields much clearer results if a more precise local activation of synapses is used, such as dendritic current injection, glutamate uncaging, sucrose puff, or glutamate iontophoresis. There are recording quality concerns here as well: the cell pictured in Figure 3a does not have visible dendritic spines, and a substantial amount of membrane is visible in the recording pipette. These concerns also apply to the similar developmental experiment in 6f-h, where EPSP amplitude is not controlled, and therefore, attenuation and spread by distance cannot be effectively measured. The outcome, that L2/3 cells have dendritic properties that violate cable theory, seems implausible and is more likely a result of variable amplitude by proximity.

      To resolve this issue, we will make a supplementary figure showing elicited amplitudes, which showed no significant distance dependence and minimal variability. We thank the reviewer for suggesting an amplitude-halfwidth comparison control. To address the issue of the non-visible spines, we would like to note that these images are of lower magnification. The presence of dendritic spines was confirmed in every recorded pyramidal cell observed using 2P microscopy.

      We would like to emphasize that although our recordings “seemingly” violated the cable theory, this is only true if we assume a completely passive condition. As shown in our manuscript, cable theory was not violated, as the presence of NMDA receptor boosting explained the observed ‘non-Rallian’ phenomenon. We plan to clarify this in the fully revised manuscript.

      (6) Minimal stimulation used for experiments in Figures 3d-i and Figures 4g-h does not resolve the half-width measurement's sensitivity to dendritic filtering, nor does cesium blockade preclude only HCN channel involvement. Example traces should be shown for all conditions in 3h; the example traces shown here do not appear to even be from the same cell. These experiments should be paired (with and without cesium/ZD). The same problem appears in Figure 4, where it is not clear that the authors performed controls and drug conditions on the same cells. 4g also lacks a scale bar, so readers cannot determine how much these measurements are affected by filtering and evoked amplitude variability. Finally, if we are to believe that minimal stimulation is used to evoke responses of single axons with 50% fail rates, NMDA receptor activation should be minimal to begin with. If the authors wish to make this claim, they need to do more precise activation of NMDA-mediated EPSPs and examine the effects of ZD7288 on these responses in the same cell. As the data is presented, it is not possible to draw the conclusion that HCN boosts NMDA-mediated responses in L2/3 neurons.

      As stated in the figure legends, the control and drug application traces are from the same cell, both in figure 3 and figure 4, and the scalebar is not included as the amplitudes were normalized for clarity. We have addressed the effects of dendritic filtering above in answer (5), and cesium blockade above in answer (2). To reiterate, dendritic filtering alone cannot explain our observations, and cesium is often a better choice for blocking HCN channels compared to ZD-7288, which blocks sodium channels as well. When an excitatory synaptic signal arrives onto a pyramidal cell in typical conditions, neurotransmitter sensitive receptors transmit a synaptic current to the dendritic spine. This dendritic spine is electrically isolated by the high resistance of the spine neck and, due to the small membrane surface of the spine, the synaptic current elicits remarkably large voltage changes. These voltage changes can be large enough to depolarize the spine close to zero millivolts upon even single small inputs (Jayant et al. 2016). Therefore, to state that single inputs arriving at dendritic spines cannot be large enough to recruit NMDA receptor activation is incorrect. This is further exemplified by the substantial literature showing ‘miniature’ NMDA recruitment via stochastic vesicle release alone.

      (7) The quality of recordings included in the dataset has concerning variability: for example, resting membrane potentials vary by >15-20 mV and the AP threshold varies by 20 mV in controls. This is indicative of either a very wide range of genetically distinct cell types that the authors are ignoring or the inclusion of cells that are either unhealthy or have bad seals.

      Although we are aware of the diversity of L2/3 PCs, resolving further layer depth differences is outside the scope of our current manuscript. However, as shown in Kalmbech et al, resting membrane potential can greatly vary (>15-20 mV) in L2/3 PCs depending on distance from pia. We acknowledge that the variance in AP threshold is large and could be due to genetically distinct cell types. Therefore, we plan to present AP peak/width information in the revision, which showed a significantly smaller variability, therefore validating our recording conditions.

      (8) The authors make no mention of blocking GABAergic signaling, so it must be assumed that it is intact for all experiments. Electrical stimulation can therefore evoke a mixture of excitatory and inhibitory responses, which may well synapse at very different locations, adding to interpretability and variability concerns.

      We thank the reviewer for pointing out our lack of detail regarding the GABAergic signaling blocker SR 95531. We did include this drug in our recordings of signal summation, so GABAergic responses did not contaminate our recordings. We plan to clarify in the revision.

      (9) The investigation of serotonergic interaction with HCN channels produces modest effect sizes and suffers the same problems as described above.

      We do not agree with the reviewer that 50% drop in neuronal AP firing responses (Figure 7b) was a modest effect size. Thus we plan to keep this data in the manuscript.

      (10) The computational modeling is not well described and is not biologically plausible. Persistent and transient K channels are missing. Values for other parameters are not listed. The model does not seem to follow cable theory, which, as described above, is not only implausible but is also not supported by the experimental findings.

      The model was downloaded from the Cell Type Database from the Allen Institute, with only minor modifications including the addition of dendritic HCN channels and NMDA receptors, which were varied along a wide parameter space to find a ‘best fit’ to our observations. These additions were necessary to recapitulate our experimental findings. We agree the model likely does not fully recapitulate all aspects of the dendrites, which as we hope to convey in this manuscript, are not fully resolved in mouse L2/3 PCs. This is a published neuronal model, and despite its potential shortcomings, is one among a handful of open-source neuronal models of fully reconstructed L2/3 PCs. We are open to improvement suggestions.

      Reviewer #2 (Public Review):

      Summary:

      This paper by Olah et al. uncovers a previously unknown role of HCN channels in shaping synaptic inputs to L2/3 cortical neurons. The authors demonstrate using slice electrophysiology and computational modeling that, unlike layer 5 pyramidal neurons, L2/3 neurons have an enrichment of HCN channels in the proximal dendrites. This location provides a locus of neuromodulation for inputs onto the proximal dendrites from L4 without an influence on distal inputs from L1. The authors use pharmacology to demonstrate the effect of HCN channels on NMDA-mediated synaptic inputs from L4. The authors further demonstrate the developmental time course of HCN function in L2/3 pyramidal neurons. Taken together, this is a well-constructed investigation of HCN channel function and the consequences of these channels on synaptic integration in L2/3 pyramidal neurons.

      Strengths:

      The authors use careful, well-constrained experiments using multiple pharmacological agents to assess HCN channel contributions to synaptic integration. The authors also use a voltage clamp to directly measure the current through HCN channels across developmental ages. The authors also provide supplemental data showing that their observation is consistent across multiple areas of the cerebral cortex.

      Weaknesses:

      The gradient of the HCN channel function is based almost exclusively on changes in EPSP width measured at the soma. While providing strong evidence for the presence of HCN current in L2/3 neurons, there are space clamp issues related to the use of somatic whole-cell voltage clamps that should be considered in the discussion.

      We thank the reviewer for pointing out our careful and well-constrained experiments and for making suggestions. The potential effects of space clamp errors will be detailed in the discussion section (see extended explanations under Reviewer 1).

      Reviewer #3 (Public Review):

      Summary:

      The authors study the function of HCN channels in L2/3 pyramidal neurons, employing somatic whole-cell recordings in acute slices of visual cortex in adult mice and a bevy of technically challenging techniques. Their primary claim is a non-uniform HCN distribution across the dendritic arbor with a greater density closer to the soma (roughly opposite of the gradient found in L5 PT-type neurons). The second major claim is that multiple sources of long-range excitatory input (cortical and thalamic) are differentially affected by the HCN distribution. They further describe an interesting interplay of NMDAR and HCN, serotonergic modulation of HCN, and compare HCN-related properties at 1, 2 and 6 weeks of age. Several results are supported by biophysical simulations.

      Strengths:

      The authors collected data from both male and female mice, at an age (6-10 weeks) that permits comparison with in vivo studies, in sufficient numbers for each condition, and they collected a good number of data points for almost all figure panels. This is all the more positive, considering the demanding nature of multi-electrode recording configurations and pipette-perfusion. The main strength of the study is the question and focus.

      Weaknesses:

      Unfortunately, in its present form, the main claims are not adequately supported by the experimental evidence: primarily because the evidence is indirect and circumstantial, but also because multiple unusual experimental choices (along with poor presentation of results) undermine the reader's confidence. Additionally, the authors overstate the novelty of certain results and fail to cite important related publications. Some of these weaknesses can be addressed by improved analysis and statistics, resolving inconsistent data across figures, reorganizing/improving figure panels, more complete methods, improved citations, and proofreading. In particular, given the emphasis on EPSPs, the primary data (for example EPSPs, overlaid conditions) should be shown much more.

      However, on the experimental side, addressing the reviewer's concerns would require a very substantial additional effort: direct measurement of HCN density at different points in the dendritic arbor and soma; the internal solution chosen here (K-gluconate) is reported to inhibit HCN; bath-applied cesium at the concentrations used blocks multiple potassium channels, i.e. is not selective for HCN (the fact that the more selective blocker ZD7288 was used in a subset of experiments makes the choice of Cs+ as the primary blocker all the more curious); pathway-specific synaptic stimulation, for example via optogenetic activation of specific long-range inputs, to complement / support / verify the layer-specific electrical stimulation.

      We thank the reviewer for their very careful examination of our manuscript and helpful suggestions. We will address the concerns raised in the review and present substantially more raw traces in our figures. Although direct dendritic HCN mapping measurements are likely outside the scope of the current manuscript due to the morphological constraints presented by L2/3 PCs (which explains why no other full dendritic nonlinearity distribution has been described in L2/3 PCs with this method), we will nonetheless supplement our manuscript with additional suggested experiments. For example we plan to include the excellent suggestion of pathway-specific optogenetic stimulation to further validate the disparate effect of HCN channels for distal and proximal inputs. We will also include control measurements using different internal solutions. We agree that ZD-7288 is a widely accepted blocker of HCN channels. However, the off-target effects on sodium channels may have significantly confounded our measurements of AP output using extracellular stimulation. Therefore we chose cesium as the primary blocker for those experiments, but did validate several other Cs+-based results with ZD-7288. These controls will also be represented in a more clear fashion in a new supplementary figure.

    1. Author Response

      We thank all the reviewers for their comments and insight. We plan to address the comments and recommendations in the revised version of the manuscript. Provisional responses on key points are given below.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Chowdhury and co-workers provide interesting data to support the role of G4-structures in promoting chromatin looping and long-range DNA interactions. The authors achieve this by artificially inserting a G4-containing sequence in an isolated region of the genome using CRISPR-Cas9 and comparing it to a control sequence that does not contain G4 structures. Based on the data provided, the authors can conclude that G4-insertion promotes long-range interactions (measured by Hi-C) and affects gene expression (measured by qPCR) as well as chromatin remodelling (measured by ChIP of specific histone markers).

      Whilst the data presented is promising and partially supports the authors' conclusion, this reviewer feels that some key controls are missing to fully support the narrative. Specifically, validation of actual G4-formation in chromatin by ChIP-qPCR (at least) is essential to support the association between G4-formation and looping. Moreover, this study is limited to a genomic location and an individual G4-sequence used, so the findings reported cannot yet be considered to reflect a general mechanism/effect of G4-formation in chromatin looping.

      Strengths:

      This is the first attempt to connect genomics datasets of G4s and HiC with gene expression. The use of Cas9 to artificially insert a G4 is also very elegant.

      Weaknesses:

      Lack of controls, especially to validate G4-formation after insertion with Cas9. The work is limited to a single G4-sequence and a single G4-site, which limits the generalisation of the findings.

      In an earlier study, we reported intracellular G4 formation in the hTERT promoter region in human cells (Sharma et al., Cell Reports, 2021). Exactly the same stretch of DNA was taken for insertion here. This is mentioned in the current manuscript as “The array of G4-forming sequences used for insertion was previously reported to form stable G4s in human cells.” under the paragraph titled “Insertion of an array of G4s in an isolated locus” in the Results section. As the reviewer points out, we understand that intracellular G4 formation needs to be confirmed upon insertion at the non-native location. These experiments/results will be included in the revised version.

      To directly address the second point, we are attempting insertion of the same G4-sequence at another locus. If the insertion is successful, experiments/results showing how it affects chromatin organization and nearby gene expression will be included in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Roy et al. investigated the role of non-canonical DNA structures called G-quadruplexes (G4s) in long-range chromatin interactions and gene regulation. Introducing a G4 array into chromatin significantly increased the number of long-range interactions, both within the same chromosome (cis) and between different chromosomes (trans). G4s functioned as enhancer elements, recruiting p300 and boosting gene expression even 5 megabases away. The study proposes a mechanism where G4s directly influence 3D chromatin organization, facilitating communication between regulatory elements and genes.

      Strength:

      The findings are valuable for understanding the role of G4-DNA in 3D genome organization and gene transcription.

      Weaknesses:

      The study would benefit from more robust and comprehensive data, which would add depth and clarity.

      (1) Lack of G4 Structure Confirmation: The absence of direct evidence for G4 formation within cells undermines the study's foundation. Relying solely on in vitro data and successful gene insertion is insufficient.

      As pointed out in response to the above comment, direct evidence of G4 formation by the stretch of DNA was published by us earlier (Sharma et al., Cell Reports, 2021). We understand here it is important to check/confirm this at the insertion site. These experiments are being initiated.

      (2) Alternative Explanations: The study does not sufficiently address alternative explanations for the observed results. The inserted sequences may not form G4s or other factors like G4-RNA hybrids may be involved.

      G4 formation at the insertion site will be checked to confirm. It has been reported that G4 structures associate with R-loops to strengthen CTCF binding and enhance chromatin looping (Wulfridge et al., 2023). This can be discussed further for readers.

      (3) Limited Data Depth and Clarity: ChIP-qPCR offers limited scope and considerable variation in some data makes conclusions difficult.

      We have noted variation with one of the primers in a few ChIP-qPCR experiments (in Figures 2 and 3D). The change, however, was statistically significant and consistent with the overall trend across experiments (Figures 2, 3 and 4). Enhancer function, in addition to ChIP, was confirmed using other assays like 3C and RNA expression.

      (4) Statistical Significance and Interpretation: The study could be more careful in evaluating the statistical significance and magnitude of the effects to avoid overinterpreting the results.

      As pointed out, the manuscript will be revised to ensure we are not overinterpreting any results.

      Reviewer #3 (Public Review):

      Summary:

      This paper aims to demonstrate the role of G-quadruplex DNA structures in the establishment of chromosome loops. The authors introduced an array of G4s spanning 275 bp, naturally found within a very well-characterized promoter region of the hTERT promoter, in an ectopic region devoid of G-quadruplex and annotated gene. As a negative control, they used a mutant version of the same sequence in which G4 folding is impaired. Due to the complexity of the region, 3 G4s on the same strand and one on the opposite strand, 12 point mutations were made simultaneously (G to T and C to A). Analysis of the 3D genome organization shows that the WT array establishes more contact within the TAD and throughout the genome than the control array. Additionally, a slight enrichment of H3K4me1 and p300, both enhancer markers, was observed locally near the insertion site. The authors tested whether the expression of genes located either nearby or up to 5 Mb away was up-regulated based on this observation. They found that four genes were up-regulated from 1.5 to 3-fold. An increased interaction between the G4 array compared to the mutant was confirmed by the 3C assay. For in-depth analysis of the long-range changes, they also performed Hi-C experiments and showed a genome-wide increase in interactions of the WT array versus the mutated form.

      Strengths:

      The experiments were well-executed and the results indicate a statistical difference between the G4 array inserted cell line and the mutated modified cell line.

      Weaknesses:

      The control non-G4 sequence contains 12 point mutations, making it difficult to draw clear conclusions. These mutations not only alter the formation of G4, but also affect at least three Sp1 binding sites that have been shown to be essential for the function of the hTERT promoter, from which the sequence is derived. The strong intermingling of G4 and Sp1 binding sites makes it impossible to determine whether all the observations made are dependent on G4 or Sp1 binding. As a control, the authors used Locked Nucleic Acid probes to prevent the formation of G4. As for mutations, these probes also interfere with two Sp1 binding sites. Therefore, using this alternative method has the same drawback as point mutations. This major issue should be discussed in the paper. It is also possible that other unidentified transcription factor binding sites are affected in the presented point mutants.

      Since the sequence we used to test the effects of G4 structure formation is highly G-rich, we had to introduce at least 12 mutations to be sure that a stable G4 structure would not form in the mutated control sequence. Sp1 has been reported to bind to G4 structures (Raiber et al., 2012). So, Sp1 binding could also be associated with the G4-dependent enhancer functions observed here. We also appreciate that apart from Sp1, other unidentified transcription factor binding sites might be affected by the mutations we introduced. We will discuss these possibilities in the revised version of the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors establish a recombinant insect cell expression and purification scheme for the antiviral Dicer complex of C. elegans. In addition to Dicer-1, the complex harbors two additional proteins, the RIG-I-like helicase DRH-1, and the dsRNA-binding protein RDE-4. The authors show that the complex prefers blunt-end dsRNA over dsRNAs that contain overhangs. Furthermore, whereas ATP-dependent dsRNA cleavage only exacerbates regular dsRNA cleavage activity, the presence of RDE-4 is essential to ATP-dependent and ATP-independent dsRNA cleavage. Single-particle cryo-EM studies of the ternary C. elegans Dicer complex reveal that the N-terminal domain of DRH-1 interacts with the helicase domain of DCR-1, thereby relieving its autoinhibitory state. Lastly, the authors show that the ternary complex is able to processively cleave long dsRNA, an activity primarily relying on the helicase activity of DRH-1.

      Strengths:

      First thorough biochemical characterization of the antiviral activity of C. elegans Dicer in complex with the RIG-I-like helicase DRH-1 and the dsRNA-binding protein RDE-4.

      Discovery that RDE-4 is essential to dsRNA processing, whereas ATP hydrolysis is not.

      Discovery of an autoinhibitory role of DRH-1's N-terminal domain (in analogy to the CARD domains of RIG-I).

      First structural insights into the ternary complex DCR-1:DRH-1:RDE-4 by cryo-EM to medium resolution.

      Trap experiments reveal that the ternary DCR-1 complex cleaves blunt-ended dsRNA processively. Likely, the helicase domain of DRH-1 is responsible for this processive cleavage.

      We thank the reviewer for this accurate and thoughtful summary of the strengths of our study. We note that although ATP hydrolysis is not essential for dsRNA processing, it is essential for promoting an alternative, and dramatically more efficient, cleavage mechanism that is well-suited for processing viral dsRNA.

      Weaknesses:

      Cryo-EM structure of the ternary Dicer-1:DRH-1:RDE-4 complex to only medium resolution.

      We agree with the reviewer that our structures are only of modest resolution. We continue to work towards a higher resolution structure of this conformationally heterogeneous complex. We do want to emphasize that despite our modest resolution, our structures provide novel insights into how the factors in the antiviral complex interact with each other, and also allow us to compare our findings to other Dicer systems. For example, the dsRNA binding protein RDE-4 binds the Hel2i subdomain, and this is similar to accessory dsRNA binding proteins of other Dicers, including human and Drosophila. Most importantly, for the first time, we uncover the interaction of DRH-1 with C. elegans Dicer; our structures show DRH-1's N-terminal domain interacting with Dicer's helicase domain. This observation spurred our experiments that showed the N-terminal domain of DRH-1, like the analogous domain of RIG-I, enables an autoinhibited conformation. While RIG-I autoinhibition is relieved by dsRNA binding, we do not observe this with C. elegans DRH-1 and speculate that instead it is the interaction with Dicer's helicase domain that relieves autoinhibition.

      High-resolution structure of the C-terminal domain of DRH-1 bound to dsRNA does not reveal the mechanism of how blunt-end dsRNA and overhang-containing one are being discriminated.

      The cryo-EM structure of DCR1:DRH-1:RDE-4 in the presence of ATP only reveals the helicase and CTD domains of DRH-1 bound to dsRNA. No information on dsRNA termini recognition is presented. The paragraph seems detached from the general flow of the manuscript.

      We agree with the reviewer that our paper would be improved with a high-resolution structure of DRH-1 bound to the dsRNA terminus to better understand terminus discrimination. Since we did not obtain a high-resolution structure of DRH-1 bound to the dsRNA terminus, we could not comment on how DRH-1 discriminates termini. However, our structure of DRH-1’s helicase and CTD bound to the middle of the dsRNA does provide important evidence that DRH-1 translocates along dsRNA, which is crucial for our interpretation of DRH-1’s ATPase function in the antiviral complex. Furthermore, our analysis of the DRH-1:dsRNA contacts reveals just how well conserved DRH-1 is with mammalian RLRs.

      The antiviral DCR-1:DRH-1:RDE-4 complex shows largely homologous activities and regulation to Drosophila Dicer-2.

      It is unclear to us why this is a weakness. In our Discussion in the section “Relationship to previously characterized Dicer activities,” we compare and contrast the C. elegans antiviral complex and the most well characterized antiviral Dicer: Drosophila Dcr2. While it might not be surprising that two invertebrate activities that both must target viral dsRNA have similar enzymatic properties, we find this remarkable given that Dcr2 orchestrates cleavage with a single protein, while two helicases and a dsRNA binding protein cooperate in the C. elegans reaction. Our careful biochemical analyses reveal how the three proteins cooperate. In vivo, C. elegans Dicer must function to cleave pre-miRNAs, endogenous siRNAs as well as viral dsRNA, and we speculate that the use of diverse accessory factors allows C. elegans Dicer to carry out these distinct tasks.

      Reviewer #2 (Public Review):

      Summary:

      To investigate the evolutionary relationship between the RNAi pathway and innate immunity, this study uses biochemistry and structural biology to investigate the trimeric complex of Dicer1, DRH-1 (a RIG-I homologue), and RDE-4, which exists in C. elegans. The three subunits were co-expressed to promote stable purification of the complex. This complex promoted ATP-dependent cleavage of blunt-ended dsRNAs. A detailed kinetic analysis was also carried out to determine the role of each subunit of the trimeric complex in both the specificity and efficiency of cleavage. These studies indicate that RDE-4 is critical for cleavage while DCR-1 is primarily involved in the specificity of the reaction, and DRH-1 promotes ATP hydrolysis. Finally, a moderate density (6-7 angstrom) cryo-EM structure is presented with attempts to position each of the components.

      Strengths:

      (1) Newly described methods for studying the C. elegans DICER complex.

      (2) New structure, albeit only moderate resolution.

      (3) Kinetic study of the complex in the presence and absence of individual subunits and mutations, provides detailed insight into the contribution of each subunit.

      Weaknesses:

      (1) Limited insight due to limited structural resolution.

      (2) No attempts to extend findings to other Dicer or RLR systems.

      Overall, we agree with the assessment of this reviewer, and we thank them for their efforts in evaluating our manuscript. Whenever possible we have discussed the similarities and differences of the C. elegans Dicer to other Dicers and RLR systems. We are unsure how we could have expanded upon this further (as suggested in point 2).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor recommendations to the authors:

      Page 10: To assess the role of ATP hydrolysis for dsRNA binding, please refrain from using the term "fuzzy band" as a qualitative measure of RNA binding to the ternary complexes.

      We searched our entire manuscript and did not find the term “fuzzy band.” We did describe some of the bands in the gel shift assays as “diffuse.” This is an accurate description of the bands we see in our gels and distinguishes them from other more well-defined bands.

      Page 13: "positioned internally" - please explain "internally" better here.

      We agree with the reviewer that “positioned internally” is confusing. In our revised manuscript we have changed this sentence to (Page 13, line 1):

      “Under these conditions, we obtained a 2.9 Å reconstruction of the helicase and CTD domains of DRH-1 bound to the middle region of the dsRNA, rather than its terminus (Figures 4C and S9), suggesting that DRH-1 hydrolyzes ATP to translocate along dsRNA.”

      Page 13: Please re-consider the detailed description of the dsRNA:DRH-1 contacts.

      We feel it is very important to illustrate and describe these contacts, which will be of interest to those who study mammalian RLRs.

      Figure 1C/D: Please write "minus/+ ATP" on top of the gels to make this distinction more clearly visible.

      In our original manuscript the gels are labeled with “minus ATP” (panel C) or “5mM ATP” (panel D) on the left to indicate both gels in panel C and both gels in panel D have the same conditions. This is also stated in the figure legend. We have not made revisions in response to this comment because we think it is already clear.

      Figure 2: Please explain R = RDE-4 in a clearly visible legend.

      We agree with the reviewer that the illustration above the gels was not explained clearly. In our revised manuscript we have added the sentence below to the beginning of Figure legend 2A. “Cartoons indicate complexes and variants, with mutations in DCR-1 (green) and DRH-1 (blue) indicated with the amino acid change, and the presence of RDE-4 (R) represented with a purple circle.”

      Figure 4A: Please label the DRH-1 helicase domain and the C-terminal domain.

      We agree with the reviewer that we could more clearly define our labeled domains. In the revised manuscript we have added a sentence to the legend of Figure 4A: “The domains of DCR-1, DRH-1, and RDE-4 are color coded the same as in Fig 1A. For simplicity, only domains discussed in the text are labeled.”

      Reviewer #2 (Recommendations For The Authors):

      This study is complete in that all necessary controls and data are included and the authors are careful in their interpretation so as to not overstate the data or conclusions. The only suggestion is that further extension of the study to address the weaknesses above would increase the breadth of impact of this work.

      We thank the reviewer for their thoughtful comments. Weaknesses are addressed above in public reviews, and we will add again that we agree that a higher resolution structure would provide additional insight. In ongoing research, we are working towards this goal.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their careful comments. We sincerely agree with the comments from both reviewers, and noticed that the term “cell transplantation”, used throughout the manuscript including the title, was confusing. We revised the manuscript to clarify the aim of the study, and to express the conclusion more straightforwardly.

      Response to the reviewers:

      We interpret the data of the present study as showing that the color of each RPE cell is a temporary condition that does not necessarily represent the quality (e.g. for cell transplantation) of the cells. We consider that this may apply not only in vitro but also in vivo, although we do not know whether RPE shows heterogeneous levels of pigmentation in vivo.

      Because our concern with iPSC-RPE is always their quality for cell transplantation, we may not have fairly evaluated the scientific significance of the findings of the present study.

      We also noticed that our use of the term “cell transplantation” to explain what we meant by the “quality” of the cells was confusing. The aim of the study was not to show how the pigmentation level of transplanted RPE affects the result of cell transplantation, but to show the heterogeneous gene expression of iPSC-derived RPE cells and the weak correlation of this heterogeneity with pigmentation level. We went through the manuscript, including the title, to lead more straightforwardly to this conclusion: the degree of pigmentation had some, but weak, correlation with the expression levels of functional genes, and the correlation may be weak because the color is a temporary condition (as we interpreted from the data) that is different from more stable characteristics of the cells.

      We agree that “cell transplantation” in the title (and other parts) was misleading. So, we changed the title, and removed the phrases that suggested the aim of the study was to show something about cell transplantation or in vivo results.

      Also, to give appropriate weight to the scientifically significant results obtained from the present study, we expanded the discussion of the correlation of the pigmentation level with some functional genes, and presented this as one of the conclusions of the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the two reviewers for their very thoughtful suggestions and the editors for writing the eLife assessment. We will submit a revised manuscript that addresses most comments and include a point-by-point response to the reviewers. We will provide evidence that overexpression of the HtrA1 protease and knockdown of its inhibitor SerpinE2 reduce the development of neural crest-derived cartilage elements in the head of Xenopus embryos. This will be done by whole mount in situ hybridization, using a probe for the chondrogenic marker Sox9. We will also provide two time-lapse movies showing (1) collective migration of cranial neural crest cells in culture and (2) failure of these cells to adhere to fibronectin upon SerpinE2 depletion. We will discuss in more depth how the SerpinE2-HtrA1 proteolytic pathway and its target, the heparan sulfate proteoglycan Syndecan-4, might regulate FGF signaling and suggest a model in which serpin secreted by the leader cells and the protease released by the follower cells might establish a chemotactic FGF gradient for the directed migration of the neural crest cohort. The criticisms that other factors such as proliferation and cell survival might contribute to the observed craniofacial phenotypes upon misexpression of SerpinE2 and HtrA1, and that it remains unclear to what extent the mechanism reported here is conserved in the trunk neural crest, are valid. The reason we focused on the more amenable cranial neural crest in the Xenopus embryo and used a multitude of approaches – structure-function studies, biochemical analyses, in vitro explant assays and epistatic experiments in vivo – was to validate a central finding: that an extracellular proteolytic pathway involving a serpin, a protease and a proteoglycan regulates collective cell migration by a double-inhibition mechanism.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) Methods, please state the sex of the mice.

      This has now been added to the methods section:

      “Three to nine month old Thy1-GCaMP6S mice (Strain GP4.3, Jax Labs), N=16 stroke (average age: 5.4 months; 13 male, 3 female), and 5 sham (average age: 6 months; 3 male, 2 female), were used in this study.”

      (2) The analysis in Fig 3B-D, 4B-C, and 6A, B highlights the loss of limb function, firing rate, or connections at 1 week but this phenomenon is clearly persisting longer in some datasets (Fig. 3 and 6). Was there not a statistical difference at weeks 2,3,4,8 relative to "Pre-stroke" or were comparisons only made to equivalent time points in the sham group? Personally, I think it is useful to compare to "pre-stroke" which should be more reflective of that sample of animals than comparing to a different set of animals in the Sham group. A 1 sample t-test could be used in Fig 4 and 6 normalized data.

      On further analysis of our datasets, we found that normalization throughout the manuscript was unnecessary for proper depiction of results, and all normalized datasets have been replaced with non-normalized datasets. All within-group statistics are now indicated within the manuscript.
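
      The within-group comparison to "Pre-stroke" raised in this point can be illustrated with a paired t statistic computed on per-animal values. This is only a minimal sketch: the function name `paired_t` and the example numbers are hypothetical, and it is not necessarily the test used in the manuscript, which reports a mixed-effects analysis.

```python
import math
import statistics


def paired_t(pre, post):
    """Return (t, degrees of freedom) for paired post-vs-pre values, one pair per animal."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples from at least 2 animals")
    diffs = [b - a for a, b in zip(pre, post)]
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    t = statistics.mean(diffs) / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical per-animal firing rates (arbitrary units), NOT data from the study.
pre_stroke = [1.0, 1.2, 0.9, 1.1]
week_1 = [0.6, 0.9, 0.5, 0.8]
t_stat, dof = paired_t(pre_stroke, week_1)
```

      The resulting t statistic would then be compared against a t distribution with `dof` degrees of freedom; with several post-stroke time points, a correction for multiple comparisons would also be needed.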

      (3) Fig 4A shows a very striking change in activity that doesn't seem to be borne out with group comparisons. Since many neurons are quiet or show very little activity, did the authors ever consider subgrouping their analysis based on cells that show high activity levels (top 20 or 30% of cells) vs those that are inactive most of the time? Recent research has shown that the effects of stroke can have a disproportionate impact on these highly active cells versus the minimally active ones.

      A qualitative analysis supports a loss of cells with high activity at the 1-week post-stroke timepoint, and examination of average firing rates at 1-week shows reductions in the animals with the highest average rates. However, we have not tracked responses within individual neurons or quantitatively analyzed the data by subdividing cells into groups based on their pre-stroke activity levels. We have amended the discussion of the manuscript with the following to highlight the previous data as it relates to our study:

      “Recent research also indicates that stroke causes distinct patterns of disruption to the network topology of excitatory and inhibitory cells [73], and that stroke can disproportionately disrupt the function of high activity compared to low activity neurons in specific neuron sub-types [61]. Mouse models with genetically labelled neuronal sub-types (including different classes of inhibitory interneurons) could be used to track the function of those populations over time in awake animals.”
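
      The subgrouping suggested by the reviewer (for example, the top 20% of cells by activity versus the remainder) could be sketched as below. This analysis was not performed in the study; the function name `split_by_activity`, the 20% cut-off, and the example rates are hypothetical.

```python
def split_by_activity(rates, top_fraction=0.2):
    """Split neuron indices into (high, low) activity groups by firing rate."""
    if not 0 < top_fraction < 1:
        raise ValueError("top_fraction must lie strictly between 0 and 1")
    if not rates:
        return [], []
    n_top = max(1, round(len(rates) * top_fraction))
    # Rank neuron indices from most to least active.
    order = sorted(range(len(rates)), key=lambda i: rates[i], reverse=True)
    return sorted(order[:n_top]), sorted(order[n_top:])


# Hypothetical pre-stroke firing rates for 10 neurons, NOT data from the study.
rates = [0.1, 2.0, 0.3, 0.2, 1.5, 0.05, 0.4, 0.1, 0.2, 0.3]
high, low = split_by_activity(rates, top_fraction=0.2)
```

      Such a split is only meaningful if the same neurons can be followed across sessions, which, as stated above, was not done here.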

      (4) Fig 4 shows normalized firing rates when moving and at rest but it would be interesting to know what the true difference in activity was in these 2 states. My assumption is that stroke reduces movement therefore one normalizes the data. The authors could consider putting non-normalized data in a Supp figure, or at least provide a rationale for not showing this, such as stating that movement output was significantly suppressed, hence the need for normalization.

      On further analysis of our datasets, we found that normalization throughout the manuscript was unnecessary for proper depiction of results, and all normalized datasets have been replaced with non-normalized datasets.

      (5) One thought for the discussion. The fact that the authors did not find any changes in "distant" cortex may be specific to the region they chose to sample (caudal FL cortex). It is possible that examining different "distant" regions could yield a different outcome. For example, one could argue that there may have been no reason for this area to "change" since it was responsive to FL stimuli before stroke. Further, since it was posterior to the stroke, thalamocortical projections should have been minimally disturbed.

      We would like to thank the reviewer for this comment. We have amended the discussion with the following:

      “Our results suggest a limited spatial distance over which the peri-infarct somatosensory cortex displays significant network functional deficits during movement and rest. Our results are consistent with a spatial gradient of plasticity mediating factors that are generally enhanced with closer proximity to the infarct core [84,88,90,91]. However, our analysis outside peri-infarct cortex is limited to a single distal area caudal to the pre-stroke cFL representation. Although somatosensory maps in the present study were defined by a statistical criterion for delineating highly responsive cortical regions from those with weak responses, the distal area in this study may have been a site of activity that did not meet the statistical criterion for inclusion in the baseline map. The lack of detectable changes in population correlations, functional connectivity, assembly architecture and assembly activations in the distal region may reflect minimal pressure for plastic change as networks in regions below the threshold for regional map inclusion prior to stroke may still be functional in the distal cortex. Thus, threshold-based assessment of remapping may further overestimate the neuroplasticity underlying functional reorganization suggested by anaesthetized preparations with strong stimulation. Future studies could examine distal areas medial and anterior to the cFL somatosensory area, such as the motor and pre-motor cortex, to further define the effect of FL targeted stroke on neuroplasticity within other functionally relevant regions. Moreover, the restriction of these network changes to peri-infarct cortex could also reflect the small penumbra associated with photothrombotic stroke, and future studies could make use of stroke models with larger penumbral regions, such as the middle cerebral artery occlusion model. Larger injuries induce more sustained sensorimotor impairment, and the relationship between neuronal firing, connectivity, and neuronal assemblies could be further probed relative to recovery or sustained impairment in these models.”

      Minor comments:

      Line 129, I don't necessarily think the infarct shows "hyper-fluorescence", it just absorbs less white light (or reflects more light) than blood-rich neighbouring regions.

      Sentence in the manuscript has been changed to:

      “Resulting infarcts lesioned this region, and borders could be defined by a region of decreased light absorption 1 week post-stroke (Fig 1D, Top).”

      Line 130-132: the authors refer to Fig 1D to show cellular changes but these cannot be seen from the images presented. Perhaps a supplementary zoomed-in image would be helpful.

      As changes to the morphology of neurons are not one of the primary objectives of this study, and sampled resolution was not sufficiently high to clearly delineate the processes of neurons necessary for morphological assessment, we have amended the text as follows:

      “Within the peri-infarct imaging region, cellular dysmorphia and swelling were visually apparent in some cells during two-photon imaging 1 week after stroke, but recovered over the 2-month post-stroke imaging timeframe (data not shown). These gross morphological changes were not visually apparent in the more distal imaging region lateral to the cHL.”

      Lines 541-543, was there a rationale for defining movement as >30mm/s? Based on a statistical estimate of noise?

      Text has been altered as follows:

      “Animal movement within the homecage during each Ca2+ imaging session was tracked to determine animal speed and position. Movement periods were manually annotated on a subset of timeseries by co-recording animal movement using both the Mobile Homecage tracker and a webcam (Logitech C270) with the infrared filter removed. Movement tracking data were low-pass filtered to remove spurious movement artifacts lasting fewer than 6 recording frames (240 ms). Based on the annotated times of animal movement from the webcam recordings and Homecage tracking, a speed threshold of 30 mm/s was applied to the tracking data: frames at or above the threshold were classified as animal movement, whereas frames with speeds below 30 mm/s were classified as periods of rest.”
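
      The movement classification described above can be sketched as follows. The exact form of the low-pass filter is not given, so this sketch swaps in a simple stand-in: it thresholds speed at 30 mm/s and then discards supra-threshold runs shorter than 6 frames. The function name `classify_movement`, the treatment of exactly 30 mm/s as movement, and the example speeds are assumptions for illustration.

```python
def classify_movement(speeds_mm_s, threshold=30.0, min_frames=6):
    """Label each frame True (moving) or False (rest).

    Runs of supra-threshold frames shorter than `min_frames` are treated as
    spurious artifacts and relabelled as rest (a stand-in for the low-pass
    filter, whose exact form is an assumption here).
    """
    moving = [s >= threshold for s in speeds_mm_s]
    n = len(moving)
    i = 0
    while i < n:
        if not moving[i]:
            i += 1
            continue
        j = i
        while j < n and moving[j]:  # find the end of this movement run
            j += 1
        if j - i < min_frames:  # run too short: relabel as rest
            for k in range(i, j):
                moving[k] = False
        i = j
    return moving


# Hypothetical speed trace in mm/s (one value per frame), NOT data from the study.
speeds = [0, 5, 40, 45, 3, 35, 38, 42, 50, 33, 31, 10]
labels = classify_movement(speeds)
```

      In this example the two-frame burst at frames 2-3 is discarded, while the six-frame bout at frames 5-10 is kept as movement.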

      Lines 191-195: Note that although the finding of reduced neural activity is in disagreement with a multi-unit recording study, it is consistent with other very recent single-cell Ca++ imaging data after stroke (PMID: 34172735 , 34671051).

      Text has been altered as follows:

      “These results indicate decreased neuronal spiking 1-week after stroke in regions immediately adjacent to the infarct, but not in distal regions, that is strongly related to sensorimotor impairment. This finding runs contrary to a previous report of increased spontaneous multi-unit activity as early as 3-7 days after focal photothrombotic stroke in the peri-infarct cortex [1], but is in agreement with recent single-cell calcium imaging data demonstrating reduced sensory-evoked activity in neurons within the peri-infarct cortex after stroke [60,61].”

      Fig 7. I don't understand what the color code represents. Are these neurons belonging to the same assembly (or membership?).

      That is correct, neurons with identical color code belong to the same assembly. The legend of Fig 7 has been modified as follows to make this more explicit:

      “Fig 7. Color coded neural assembly plots depict altered neural assembly architecture after stroke in the peri-infarct region. (A) Representative cellular Ca2+ fluorescence images with neural assemblies color coded and overlaid for each timepoint. Neurons belonging to the same assembly have been pseudocolored with identical color. A loss in the number of neural assemblies after stroke in the peri-infarct region is visually apparent, along with a concurrent increase in the number of neurons for each remaining assembly. (B) Representative sham animal displays no visible change in the number of assemblies or number of neurons per assembly.”

      Reviewer #2 (Recommendations For The Authors):

      Materials and methods

      Identification of forelimb and hindlimb somatosensory cortex representations [...] Cortical response areas are calculated using a threshold of 95% peak activity within the trial. The threshold is presumably used to discriminate between the sensory-evoked response and collateral activation / less "relevant" response (noise). Since the peak intensity is lower after stroke, the "response" area is larger - lower main signal results in less noise exclusion. Predictably, areas that show a higher response before stroke than after are excluded from the response area before stroke and included after. While it is expected that the remapped areas will exhibit a lower response than the original and considering the absence of neuronal activity, assembly architecture, or functional connectivity in the "remapped" regions, a minimal criterion for remapping should be to exhibit higher activation than before stroke. Please use a different criterion to map the cortical response area after stroke.

      We would like to thank the reviewer for this comment. We agree with the reviewer’s assessment of 95% of peak as an arbitrary criterion of mapped areas. To exclude noise from the analysis of mapped regions, a new statistical criterion of 5X the standard deviation of the baseline period was used to determine the threshold to use to define each response map. These maps were used to determine the peak intensity of the forelimb response. We also measured a separate ROI specifically overlapping the distal region, lateral to the hindlimb map, to determine specific changes to widefield Ca2+ responses within this distal region. We have amended the text as follows and have altered Figure 2 with new data generated from our new criterion for cortical mapping.

      “The trials for each limb were averaged in ImageJ software (NIH). 10 imaging frames (1s) after stimulus onset were averaged and divided by the 10 baseline frames 1s before stimulus onset to generate a response map for each limb. Response maps were thresholded at 5 times the standard deviation of the baseline period deltaFoF to determine limb associated response maps. These were merged and overlaid on an image of surface vasculature to delineate the cFL and cHL somatosensory representations and were also used to determine peak Ca2+ response amplitude from the timeseries recordings. For cFL stimulation trials, an additional ROI was placed over the region lateral to the cHL representation (denoted as “distal region” in Fig 2E) to measure the distal region cFL evoked Ca2+ response amplitude pre- and post-stroke. The dimensions and position of the distal ROI were held consistent relative to surface vasculature for each animal from pre- to post-stroke.”
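
      The mapping criterion described above can be sketched per pixel as follows. This is a minimal illustration under stated assumptions, not the authors' ImageJ workflow: the function name `response_mask` is hypothetical, frames are represented as flat lists of strictly positive pixel intensities, and the standard deviation is taken per pixel over the baseline-period deltaF/F values (the text does not say whether it was computed per pixel or pooled).

```python
import statistics


def response_mask(frames, stim_frame, n_frames=10, k_sd=5.0):
    """Return (dff_map, mask) with one entry per pixel.

    frames: list of frames, each a flat list of positive pixel intensities
    from a trial-averaged recording. stim_frame: index of the first frame
    after stimulus onset.
    """
    if stim_frame < n_frames or stim_frame + n_frames > len(frames):
        raise ValueError("not enough frames around the stimulus")
    baseline = frames[stim_frame - n_frames:stim_frame]
    evoked = frames[stim_frame:stim_frame + n_frames]
    dff_map, mask = [], []
    for p in range(len(frames[0])):
        base_vals = [f[p] for f in baseline]
        f0 = statistics.mean(base_vals)  # baseline fluorescence
        base_dff = [(v - f0) / f0 for v in base_vals]
        sd = statistics.stdev(base_dff)  # baseline deltaF/F variability
        # Mean evoked signal divided by baseline, expressed as deltaF/F.
        dff = statistics.mean([f[p] for f in evoked]) / f0 - 1.0
        dff_map.append(dff)
        mask.append(dff > k_sd * sd)  # keep pixels above 5 x baseline SD
    return dff_map, mask


# Hypothetical two-pixel recording: 10 baseline frames, then 10 evoked frames.
frames = [[100, 100] if t % 2 == 0 else [102, 102] for t in range(10)]
frames += [[130, 103] for _ in range(10)]
dff_map, mask = response_mask(frames, stim_frame=10)
```

      Unlike a threshold set at a fraction of the peak response, this criterion does not scale with response amplitude, so a weaker post-stroke response does not by itself enlarge the mapped area.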

      Animals

      Mice used have an age that goes from 3 to 9 months. This is a big difference given that literature on healthy aging reports changes in neurovascular coupling starting from 8-9 months old mice. Consider adding age as a covariate in the analysis.

      We do not have sufficient numbers of animals within this study to examine the effect of age on the results observed herein. We have amended the discussion with the following to address this point:

      “A potential limitation of our data is the undefined effect of age and sex on cortical dynamics in this cohort of mice (with ages ranging from 3-9 months) after stroke. Aging can impair neurovascular coupling [102–107] and reduce ischemic tolerance [108–111], and greater investigation of cortical activity changes after stroke in aged animals would more effectively model stroke in humans. Future research could replicate this study with middle-aged and aged mice (e.g. 9 months and 18+ months of age), and with sufficient quantities of both sexes, to better examine age and sex effects on measures of cortical function.”

      Statistics

      Please describe the "normalization" that was applied to the firing rate. Since a mixed-effects model was used, why wasn't baseline simply added as a covariate? With this type of data, normalization is useful for visualization purposes.

      On further analysis of our datasets, we found that normalization throughout the manuscript was unnecessary for the visualization of results, and all normalized datasets have been replaced with non-normalized datasets. All within-group comparisons are now indicated throughout the manuscript and in the figures.

      Introduction

      Line 93 awake, freely behaving but head-fixed. That's not freely. Should just say behaving.

      Sentence has been edited as follows:

      “We used awake, behaving but head-fixed mice in a mobile homecage to longitudinally measure cortical activity, then used computational methods to assess functional connectivity and neural assembly architecture at baseline and each week for 2 months following stroke.”

      110 - 112 The last part of this sentence is unjustified because these areas have been incorrectly identified as locations of representational remapping.

      We agree with the reviewer and have amended the manuscript as follows after re-analyzing the dataset on widefield Ca2+ imaging of sensory-evoked responses: “Surprisingly, we also show that significant alterations in neuronal activity (firing rate), functional connectivity, and neural assembly architecture are absent within more distal regions of cortex as little as 750 µm from the stroke border, even in areas identified by regional functional imaging (under anaesthesia) as ‘remapped’ locations of sensory-evoked FL activity 8-weeks post-stroke.”

      Results

      149-152 There is no observed increase in the evoked response area. There is an observed change in the criteria for what is considered a response.

      We agree with the reviewer. Text has been amended as follows:

      “Fig 2A shows representative montages from a stroke animal illustrating the cortical cFL and cHL Ca2+ responses to 1s, 100Hz limb stimulation of the contralateral limbs at the pre-stroke and 8-week post-stroke timepoints. The location and magnitude of the cortical responses change drastically between timepoints, with substantial loss of supra-threshold activity within the pre-stroke cFL representation located anterior to the cHL map, and an apparent shift of the remapped representation into regions lateral to the cHL representation at 8-weeks post-stroke. A significant decrease in the cFL evoked Ca2+ response amplitude was observed in the stroke group at 8-weeks post-stroke relative to pre-stroke (Fig 2B). This is in agreement with past studies [19–25], and suggests that cFL targeted stroke reduces forelimb evoked activity across the cFL somatosensory cortex in anaesthetized animals even after 2 months of recovery. There was no statistical change in the average size of cFL evoked representation 8-weeks after stroke (Fig 2C), but a significant posterior shift of the supra-threshold cFL map was detected (Fig 2D). Unmasking of previously sub-threshold cFL responsive cortex in areas posterior to the original cFL map at 8-weeks post-stroke could contribute to this apparent remapping. However, the amplitude of the cFL evoked widefield Ca2+ response in this distal region at 8-weeks post-stroke remains reduced relative to pre-stroke activation (Fig 2E). Previous studies suggest strong inhibition of cFL evoked activity during the first weeks after photothrombosis [25]. Without longitudinal measurement in this study to quantify this reduced activation prior to 8-weeks post-stroke, we cannot differentiate potential remapping due to unmasking of the cFL representation that enhances the cFL-evoked widefield Ca2+ response from apparent remapping that simply reflects changes in the signal-to-noise ratio used to define the functional representations. There were no group differences between stroke and sham groups in cHL evoked intensity, area, or map position (data not shown).”

      A lot of the nonsignificant results are reported as "statistical trends towards..." While the term "trend" is problematic, it remains common in its use. However, assigning directionality to the trend, as if it is actively approaching a main effect, should be avoided. The results aren't moving towards or away from significance. Consider rewording the way in which these results are reported.

      We have amended the text to remove directionality from our mention of statistical trends.

      R squared and p values for significant results are reported in the "impaired performance on tapered beam..." and "firing rate of neurons in the peri-infarct cortex..." subsections of the results, but not the other sections. Please report the results in a consistent manner.

      R-squared and p-values have been removed from the results section and are now reported in figure captions consistently.

      Discussion

      288 Remapping is defined as "new sensory-evoked spiking". This should be the main criterion for remapping, but it is not operationalized correctly by the threshold method.

      With our new criterion for determining limb maps using a statistical threshold of 5X the standard deviation of baseline fluorescence, we have edited text throughout the manuscript to better emphasize that we may not be measuring new sensory-evoked spiking with the mesoscale mapping that was done. We have edited the discussion as follows:

      “Here, we used longitudinal two-photon calcium imaging of awake, head-fixed mice in a mobile homecage to examine how focal photothrombotic stroke to the forelimb sensorimotor cortex alters the activity and connectivity of neurons adjacent and distal to the infarct. Consistent with previous studies using intrinsic optical signal imaging, mesoscale imaging of regional calcium responses (reflecting bulk neuronal spiking in that region) showed that targeted stroke to the cFL somatosensory area disrupts the sensory-evoked forelimb representation in the infarcted region. Consistent with previous studies, this functional representation exhibited a posterior shift 8-weeks after injury, with activation in a region lateral to the cHL representation. Notably, sensory-evoked cFL representations exhibited reduced amplitudes of activity relative to pre-stroke activation measured in the cFL representation and in the region lateral to the cHL representation. Longitudinal two-photon calcium imaging in awake animals was used to probe single neuron and local network changes adjacent to the infarct and in a distal region that corresponded to the shifted region of cFL activation. This imaging revealed a decrease in firing rate at 1-week post-stroke in the peri-infarct region that was significantly negatively correlated with the number of errors made with the stroke-affected limbs on the tapered beam task. Peri-infarct cortical networks also exhibited a reduction in the number of functional connections per neuron and a sustained disruption in neural assembly structure, including a reduction in the number of assemblies and an increased recruitment of neurons into functional assemblies. Elevated correlation between assemblies within the peri-infarct region peaked 1-week after stroke and was sustained throughout recovery. Surprisingly, distal networks, even in the region associated with the shifted cFL functional map in anaesthetized preparations, were largely undisturbed.”

      “Cortical plasticity after stroke

      Plasticity within and between cortical regions contributes to partial recovery of function and is proportional to both the extent of damage, as well as the form and quantity of rehabilitative therapy post-stroke [80,81]. A critical period of highest plasticity begins shortly after the onset of stroke, is greatest during the first few weeks, and progressively diminishes over the weeks to months after stroke [19,82–86]. Functional recovery after stroke is thought to depend largely on the adaptive plasticity of surviving neurons that reinforce existing connections and/or replace the function of lost networks [25,52,87–89]. This neuronal plasticity is believed to lead to topographical shifts in somatosensory functional maps to adjacent areas of the cortex. The driver for this process has largely been ascribed to a complex cascade of intra- and extracellular signaling that ultimately leads to plastic re-organization of the microarchitecture and function of surviving peri-infarct tissue [52,80,84,88,90–92]. Likewise, structural and functional remodeling has previously been found to be dependent on the distance from the stroke core, with closer tissue undergoing greater re-organization than more distant tissue (for review, see [52]).”

      “Previous research examining the region at the border between the cFL and cHL somatosensory maps has shown this region to be a primary site for functional remapping after cFL directed photothrombotic stroke, resulting in a region of cFL and cHL map functional overlap [25]. Within this overlapping area, neurons have been shown to lose limb selectivity 1-month post-stroke [25]. This is followed by the acquisition of more selective responses 2-months post-stroke and is associated with reduced regional overlap between cFL and cHL functional maps [25]. Notably, this functional plasticity at the cellular level was assessed using strong vibrotactile stimulation of the limbs in anaesthetized animals. Our findings using longitudinal imaging in awake animals show an initial reduction in firing rate at 1-week post-stroke within the peri-infarct region that was predictive of functional impairment in the tapered beam task. This transient reduction may be associated with reduced or dysfunctional thalamic connectivity [93–95] and reduced transmission of signals from hypo-excitable thalamo-cortical projections [96]. Importantly, the strong negative correlation we observed between firing rate of the neural population within the peri-infarct cortex and the number of errors on the affected side, as well as the rapid recovery of firing rate and tapered beam performance, suggests that neuronal activity within the peri-infarct region contributes to the impairment and recovery. The common timescale of neuronal and functional recovery also coincides with angiogenesis and re-establishment of vascular support for peri-infarct tissue [83,97–100].”

      “Consistent with previous research using mechanical limb stimulation under anaesthesia [25], we show that at the 8-week timepoint after cFL photothrombotic stroke the cFL representation is shifted posterior from its pre-stroke location into the area lateral to the cHL map. Notably, our distal region for awake imaging was directly within this 8-week post-stroke cFL representation. Despite our prediction that this distal area would be a hotspot for plastic changes, there was no detectable alteration to the level of population correlation, functional connectivity, assembly architecture or assembly activations after stroke. Moreover, we found little change in the firing rate in either moving or resting states in this region. Contrary to our results, somatosensory-evoked activity assessed by two photon calcium imaging in anesthetized animals has demonstrated an increase in cFL responsive neurons within a region lateral to the cHL representation 1-2 months after focal cFL stroke [25]. Notably, this previous study measured sensory-evoked single cell activity using strong vibrotactile (1s 100Hz) limb stimulation under anaesthesia [25]. This frequency of limb stimulation has been shown to elicit near maximal neuronal responses within the limb-associated somatosensory cortex under anesthesia [101]. Thus, strong stimulation and anaesthesia may have unmasked non-physiological activity in neurons in the distal region that is not apparent during more naturalistic activation during awake locomotion or rest. Regional mapping defined using strong stimulation in anesthetized animals may therefore overestimate plasticity at the cellular level.”

      “Our results suggest a limited spatial distance over which the peri-infarct somatosensory cortex displays significant network functional deficits during movement and rest. Our results are consistent with a spatial gradient of plasticity mediating factors that are generally enhanced with closer proximity to the infarct core [84,88,90,91]. However, our analysis outside peri-infarct cortex is limited to a single distal area caudal to the pre-stroke cFL representation. Although somatosensory maps in the present study were defined by a statistical criterion for delineating highly responsive cortical regions from those with weak responses, the distal area in this study may have been a site of activity that did not meet the statistical criterion for inclusion in the baseline map. The lack of detectable changes in population correlations, functional connectivity, assembly architecture and assembly activations in the distal region may reflect minimal pressure for plastic change as networks in regions below the threshold for regional map inclusion prior to stroke may still be functional in the distal cortex. Thus, threshold-based assessment of remapping may further overestimate the neuroplasticity underlying functional reorganization suggested by anaesthetized preparations with strong stimulation. Future studies could examine distal areas medial and anterior to the cFL somatosensory area, such as the motor and pre-motor cortex, to further define the effect of FL targeted stroke on neuroplasticity within other functionally relevant regions. Moreover, the restriction of these network changes to peri-infarct cortex could also reflect the small penumbra associated with photothrombotic stroke, and future studies could make use of stroke models with larger penumbral regions, such as the middle cerebral artery occlusion model. Larger injuries induce more sustained sensorimotor impairment, and the relationship between neuronal firing, connectivity, and neuronal assemblies could be further probed relative to recovery or sustained impairment in these models. Recent research also indicates that stroke causes distinct patterns of disruption to the network topology of excitatory and inhibitory cells [73], and that stroke can disproportionately disrupt the function of high activity compared to low activity neurons in specific neuron sub-types [61]. Mouse models with genetically labelled neuronal sub-types (including different classes of inhibitory interneurons) could be used to track the function of those populations over time in awake animals. A potential limitation of our data is the undefined effect of age and sex on cortical dynamics in this cohort of mice (with ages ranging from 3-9 months) after stroke. Aging can impair neurovascular coupling [102–107] and reduce ischemic tolerance [108–111], and greater investigation of cortical activity changes after stroke in aged animals would more effectively model stroke in humans. Future research could replicate this study with middle-aged and aged mice (e.g. 9 months and 18+ months of age), and with sufficient quantities of both sexes, to better examine age and sex effects on measures of cortical function.”

      315 - 317 Remodelling is dependent on the distance from the stroke core, with closer tissue undergoing greater reorganization than more distant tissue. There is no evidence that the more distant tissue undergoes any reorganization at all.

      We agree with the reviewer that no remodelling is apparent in our distal area. We have removed reference to our study showing remodeling in the distal area, and have amended the text as follows:

      “Likewise, structural and functional remodeling has previously been found to be dependent on the distance from the stroke core, with closer tissue undergoing greater re-organization than more distant tissue (for review, see [52]).”

      412-414 The authors speculate that a strong stimulation under anaesthesia may unmask connectivity in distal regions. However, the motivation for this paper is that anaesthesia is a confounding factor. It appears to me that, given the results of this study, the authors should argue that the functional connectivity observed under anaesthesia may be spurious.

      The incorrect word was used here. We have corrected the paragraph of the discussion and amended it as follows:

      “Consistent with previous research using mechanical limb stimulation under anaesthesia [25], we show that at the 8-week timepoint after cFL photothrombotic stroke the cFL representation is shifted posterior from its pre-stroke location into the area lateral to the cHL map. Notably, our distal region for awake imaging was directly within this 8-week post-stroke cFL representation. Despite our prediction that this distal area would be a hotspot for plastic changes, there was no detectable alteration to the level of population correlation, functional connectivity, assembly architecture or assembly activations after stroke. Moreover, we found little change in the firing rate in either moving or resting states in this region. Contrary to our results, somatosensory-evoked activity assessed by two photon calcium imaging in anesthetized animals has demonstrated an increase in cFL responsive neurons within a region lateral to the cHL representation 1-2 months after focal cFL stroke [25]. Notably, this previous study measured sensory-evoked single cell activity using strong vibrotactile (1s 100Hz) limb stimulation under anaesthesia [25]. This frequency of limb stimulation has been shown to elicit near maximal neuronal responses within the limb-associated somatosensory cortex under anesthesia [101]. Thus, strong stimulation and anaesthesia may have unmasked non-physiological activity in neurons in the distal region that is not apparent during more naturalistic activation during awake locomotion or rest. Regional mapping defined using strong stimulation in anesthetized animals may therefore overestimate plasticity at the cellular level.”

      Figures

      Figure 1 and 2: Scale bar missing.

      Scale bars added to both figures.

      Figure 2: The representative image shows a drastic reduction of the forelimb response area, contrary to the general description of the findings. It would also be beneficial to see a graph with lines connecting the pre-stroke and 8-week datapoints.

      The data for Figure 2 have been re-analyzed using a new criterion of 5X the standard deviation of the baseline period for determining the threshold for limb mapping. Figure 2 and the relevant manuscript and figure legend text have been amended. In agreement with the reviewer's observation, there is no increase in forelimb response area, but instead a non-significant decrease in the average forelimb area.

    1. Author Response

      We would like to thank the reviewers for providing constructive feedback on the manuscript. To address the weaknesses identified, we are performing additional experiments and generating additional data, to be added to the updated manuscript.

      (1) The utility of a pipeline depends on the generalization properties.

      While the proposed pipeline seems to work for the data the authors acquired, it is unclear if this pipeline will actually generalize to novel data sets possibly recorded by a different microscope (e.g. different brand), or different imaging conditions (e.g. illumination or different imaging artifacts) or even to different brain regions or animal species, etc.

      The authors provide a 'black-box' approach that might work well for their particular data sets and image acquisition settings but it is left unclear how this pipeline is actually widely applicable to other conditions as such data is not provided.

      In my experience, without well-defined image pre-processing steps and without training on a wide range of image conditions pipelines typically require significant retraining, which in turn requires generating sufficient amounts of training data, partly defying the purpose of the pipeline. It is unclear from the manuscript, how well this pipeline will perform on novel data possibly recorded by a different lab or with a different microscope.

      To address generalizability, we are performing several validation experiments with data from different 1) channels, 2) species (rat), and 3) microscopes, to highlight the robustness of our deep learning (DL) segmentation model to out-of-distribution data with different characteristics and acquisition protocols. We first used our model to segment three images (507x507 x&y, 250-170 um z) from three C57BL/6 mice acquired on the same two-photon fluorescent microscope following the same imaging protocol. The vasculature was labelled with the Texas Red dextran, as in the current experiment. In place of the EYFP signal from pyramidal neurons (2nd channel), Gaussian noise was generated with a mean and standard deviation identical to the acquired vascular channel. We then segmented a second set of two images (507x507 x&y, 300-400 um z) from two Fischer rats with Alexa680-dextran label in the plasma; these rats were imaged on the same two-photon fluorescence microscope, but with galvano scanners (instead of resonant scanners). A second channel of random Gaussian noise was also added here. Finally, an image of vasculature from an ex-vivo cleared mouse brain (1665x1205x780 um) imaged on a light sheet fluorescence microscope (Miltenyi UltraMicroscope Blaze) was also segmented with our model. Lectin-DyLight 649 was used to label the vasculature in this cohort. The Dice Score, Precision, Recall, Hausdorff 95%, and Mean surface distance will be reported for all of these additional image segmentations, upon generation of ground truth images. Finally, examples of the generated segmentation masks are presented in Author response image 1 for visual comparison. Of final note, should the segmentation results on a new data set be unsatisfactory, the methods downstream from segmentation are still applicable and the model can be further fine-tuned on other out-of-distribution data.
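      For readers unfamiliar with the overlap metrics named above, the following is a minimal sketch of how the Dice score, precision, and recall are conventionally computed from a predicted and a ground-truth binary mask. It is an illustration only, not the authors' evaluation code: the function name `overlap_metrics`, the flattened-list input format, and the toy masks are assumptions made for this example, and the Hausdorff 95% and mean surface distance metrics are not covered.

```python
def overlap_metrics(pred, truth):
    """Dice, precision and recall for two flat binary masks of equal length.

    Hypothetical helper for illustration; masks are sequences of 0/1 voxels.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")

    # Count true positives, false positives and false negatives voxel by voxel.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)

    # By convention here, two empty masks count as a perfect match.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return {"dice": dice, "precision": precision, "recall": recall}


# Toy example (made-up masks): 2 true positives, 1 false positive, 1 false negative.
toy_pred = [1, 1, 0, 0, 1, 0]
toy_truth = [1, 0, 0, 1, 1, 0]
toy_metrics = overlap_metrics(toy_pred, toy_truth)
```

      Dice weights false positives and false negatives equally, whereas precision and recall separate over-segmentation from under-segmentation, which is why the three are usually reported together.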

      Author response image 1.

      Examples of the deep learning model output on out of distribution data from a different mouse strain, from a different species (Fischer rat), and on a different microscope using a different imaging modality.

      (2) Some of the chosen analysis results seem to not fully match the shown data, or the visualization of the data is hard to interpret in the current form.

      We are updating the visualizations to make them more accessible and we will ensure matching between tables and figures.

      (3) Additionally, some measures seem not fully adapted to the current situation (e.g. the efficiency measure does not consider possible sources or sinks). Thus, some additional analysis work might be required to account for this.

      Thank you for your comment. The efficiency metric was selected as it does not consider sources or sinks. We do agree that accounting for vessel subtypes in the analysis (thus classifying larger vessels as either supplying or draining) would be uniquely useful: notwithstanding, it is extremely laborious. We are therefore leveraging machine learning in a parallel project to afford vessel classification by subtype. The source/sink analysis is also confounded by the small field-of-view of in situ 2PFM. Future work will investigate network remodelling across the whole brain with ex-vivo light sheet fluorescence microscopy.

      (4) The authors apply their method to in vivo data. However, there are some weaknesses in the design that make it hard to accept many of the conclusions and even to see that the method could yield much useful data with this type of application. Primarily, the acquisition of a large volume of tissue is very slow. In order to obtain a network of vascular activity, large volumes are imaged with high resolution. However, the volumes are scanned once every 42 seconds following stimulation. Most vascular responses to neuronal activation have come and gone in 42 seconds so each vessel segment is only being sampled at a single time point in the vascular response. So all of the data on diameter changes are impossible to compare since some vessels are sampled during the initial phase of the vascular response, some during the decay, and many probably after it has already returned to baseline. The authors attempt to overcome this by alternating the direction of the scan (from surface to deep and vice versa). But this only provides two sample points along the vascular response curve and so the problem still remains.

      We thank the Reviewer for bringing up this important point.

      Although vessels can show relatively rapid responses to perturbation, vascular responses to photostimulation of Channelrhodopsin-2 in neighbouring neurons are typically long lasting: they do not come and go in 42 seconds. To demonstrate this point, we acquired higher temporal-resolution images of smaller volumes of tissue over 5 minutes preceding and following the 5-s photoactivation with the original parameters. The imaging protocol differed in that we utilized a piezoelectric motor, a smaller field of view, and only 3x frame averaging, resulting in a temporal resolution of 1.57-2.63 seconds. This acquisition was repeated at 4 different cortical depths (325 um, 250 um, 150 um, and 40 um) in a single mouse. The vascular radii were estimated using our presented pipeline. Raw data and LOESS fits are shown in Author response image 2 (below). Vessels shorter than 20 um in length were excluded from the analysis. A video of one of the acquisitions is shown along with the timecourses of select vessels’ caliber changes in Author response image 3. The vascular caliber changes following photostimulation persisted for several minutes, consistent with earlier observations by us and others (1–4). These higher temporal-resolution scans of smaller tissue volumes will be repeated in two more mice; we will therein assess the repeatability of individual vessel responses to repeated stimulations.

      Author response image 2.

      A. The vascular radii of multiple vessels were imaged at 4 different cortical depths, each within a 507 x (75-150) x (30-45) um tissue volume. Baseline scanning lasted for 5 minutes, followed by 5 seconds of blue or green light stimulation at 4.3 mW/mm^2, and culminating in 5 minutes of post-stimulation scanning. B. LOESS fits of the vessel radius estimates for each vessel segment identified.

      Author response image 3.

      Estimated vascular radius at each timepoint for select vessels from the imaging stack shown in the following video: https://flip.com/s/kB1eTwYzwMJE

      (5) A second problem is the use of optogenetic stimulation to activate the tissue. First, it has been shown that blue light itself can increase blood flow (Rungta et al 2017). The authors note the concern about temperature increases but that is not the same issue. The discussion mentions that non-transgenic mice were used to control for this with "data not shown". This is very important data given these earlier reports that have found such effects and so should be included.

      We will update the manuscript to incorporate the data on volumetric scanning in nontransgenic C57BL/6 mice undergoing blue light stimulation, with parameters identical to those used in Thy1-ChR2 mice. As before, responders were identified as vessels that, following blue light stimulation, showed a radius change greater than twice their baseline radius standard deviation: their estimated radii changes are shown in Author response image 4 below. There was no statistically significant difference between the radii distributions of any of the photostimulation conditions and the pre-photostimulation baseline. A comparison of this with the transgenic Thy1-ChR2-EYFP mice will be included in manuscript updates.
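      The responder criterion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's implementation: the function name `classify_vessel`, the use of the baseline mean as the reference radius, the sample standard deviation, and the toy radii are all hypothetical choices made for this example.

```python
from statistics import mean, stdev


def classify_vessel(baseline_radii, post_radius, n_sd=2.0):
    """Label a vessel by comparing its post-stimulation radius with baseline.

    Assumption for this sketch: the radius change is measured relative to the
    mean of the baseline estimates, and the threshold is n_sd times the sample
    standard deviation of those baseline estimates.
    """
    base_mean = mean(baseline_radii)
    base_sd = stdev(baseline_radii)  # requires at least two baseline estimates
    change = post_radius - base_mean

    if change > n_sd * base_sd:
        return "dilator"
    if change < -n_sd * base_sd:
        return "constrictor"
    return "non-responder"


# Toy baseline radius estimates in um (made-up numbers, not measured data).
toy_baseline = [2.0, 2.1, 1.9, 2.0]
labels = [classify_vessel(toy_baseline, r) for r in (2.3, 1.7, 2.1)]
```

      Because the threshold scales with each vessel's own baseline variability, a vessel with a noisy baseline needs a larger absolute change to be counted as a responder than a vessel with a stable baseline.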

      Author response image 4.

      Radius change measurements for responding vessels from the Thy1-ChR2 mice described in the manuscript (top row) vs. 4 wild-type C57BL6/J mice (bottom row). Response to photostimulation was defined as a change above twice their baseline standard deviation. 458nm light was applied at 1.1 mW/mm^2 and 4.3 mW/mm^2; while 552 nm light was applied at 4.3 mW/mm^2. No statistically significant differences were observed between the radii distributions in any condition, Wilcoxon test, Bonferroni correction.

      (6) Secondly, there doesn't seem to be any monitoring of neural activity following the photo-stimulation. The authors repeatedly mention "activated" neurons and claim that vessel properties change based on distance from "activated" neurons. But I can't find anything to suggest that they know which neurons were active versus just labeled. Third, the stimulation laser is focused at a single depth plane. Since it is single-photon excitation, there is likely a large volume of activated neurons. But there is no way of knowing the spatial arrangement of neural activity and so again, including this as a factor in the analysis of vascular responses seems unjustified.

      Given the high fidelity of Channelrhodopsin-2 activation with blue light, we assume that all labeled neurons within the volume of photostimulation are being activated. Depending on their respective connectivities, their postsynaptic neurons (whether or not they are labelled) are also activated. We indeed agree with the reviewer that the spatial distribution of neuronal activation is not well defined. We will revise the manuscript to update the terminology from activated to labeled neurons and stress in the Discussion that the motivation for assessing the distance to the closest labelled neuron as one of our metrics is purely to demonstrate the possibility of linking vascular response to activations in some of their neighbouring neurons and including morphological metrics in the computational pipeline. Of final note, the depth-dependence of the distance between labelled neurons and responding vessels can also readily be assessed using our computational pipeline.

      (7) The study could also benefit from more clear illustration of the quality of the model's output. It is hard to tell from static images of 3-D volumes how accurate the vessel segmentation is. Perhaps some videos going through the volume with the masks overlaid would provide some clarity. Also, a comparison to commercial vessel segmentation programs would be useful in addition to benchmarking to the ground truth manual data.

      We generated a video demonstrating the deep-learning model outputs and have made the video available here: https://flip.com/s/_XBs4yVxisNs Additional videos will be uploaded.

      (8) Another useful metric for the model's success would be the reproducibility of the vessel responses. Seeing such a large number of vessels showing constrictions raises some flags and so showing that the model pulled out the same response from the same vessels across multiple repetitions would make such data easier to accept.

      We have generated a figure demonstrating the repeatability of the vascular responses following photoactivation in a volume, and presented them next to the corresponding raw acquisitions for visual inspection. It is important to note that there is a significant biological variability in vessels’ responses to repeated stimulation, as described previously (2,5). Constrictions have been reported in the literature by our group and others (1,3,4,6,7), though their prevalence has not been systematically studied to date. Concerning the reproducibility of our analysis, we will demonstrate model reproducibility (as a metric of its success) in the updated manuscript.

      Author response image 5.

      Registered acquisitions of the vasculature before and after optogenetic stimulation for 5 scan pairs over 3 different stimulation conditions. The estimated radii along vessel segments are presented.

      Author response image 6.

      Sample capillary constrictions from maximum intensity projections at repeated timepoints following optogenetic stimulation. Baseline (pre-stimulation) image is shown on the left and the post-stimulation image, on the right, with the estimated radius changes listed to the left.

      (9) A number of findings are questionable, at least in part due to these design properties. There are unrealistically large dilations and constrictions indicated. These are likely due to artifacts of the automated platform. Inspection of these results by eye would help understand what is going on.

      Some of the dilations were indeed large in magnitude. We present select examples of large dilations and constrictions ranging in magnitude from 2.08 to 10.80 um for visual inspection (for reference, the average magnitude of radius changes across vessels and stimuli was 0.32 +/- 0.54 um). Diameter changes above 5 um were visually inspected.

      Author response image 7.

      Additional views of diameter changes in maximum intensity projections ranging in magnitude from 2.08 um to 10.80 um.

      (10) In Figure 6, there doesn't seem to be much correlation between vessels with large baseline level changes and vessels with large stimulus-evoked changes. It would be expected that large arteries would have a lot of variability in both conditions and veins much less. There is also not much within-vessel consistency. For instance, the third row shows what looks like a surface vessel constricting to stimulation but a branch coming off of it dilating - this seems biologically unrealistic.

      We now plot photostimulation-elicited vesselwise radius changes vs. their corresponding baseline radius standard deviations (Author response image 8 below). The Pearson correlation between the baseline standard deviation and the radius change was 0.08 (p<1e-5) for 552nm 4.3 mW/mm^2 stimulation, -0.08 (p<1e-5) for 458nm 1.1 mW/mm^2 stimulation, and -0.04 (p<1e-5) for 458nm 4.3 mW/mm^2 stimulation. For non-control (i.e. blue) photostimulation conditions, the change in the radius is thus negatively correlated to the vessel’s baseline radius standard deviation. The within-vessel consistency is explicitly evaluated in Figure 8 of the manuscript. As for the instance of a surface vessel constricting while a downstream vessel dilates, it is important to remember that the 2PFM FOV restricts us to imaging a very small portion of the cortical microvascular network: one of many daughter vessels showing a change in the opposite direction to the parent vessel does not violate the conservation of mass.
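      As a reminder of what the reported coefficients measure, the sketch below computes a Pearson correlation between two equal-length lists, such as per-vessel baseline standard deviations and per-vessel radius changes. It is an illustration only: the function name `pearson_r` and the toy inputs are assumptions for this example, and the p-values quoted above would require an additional significance test that is not shown here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")

    return sxy / sqrt(sxx * syy)


# Toy inputs (made-up numbers): a perfectly increasing and a perfectly
# decreasing relationship, giving coefficients at the two extremes.
r_up = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_down = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

      Coefficients of magnitude 0.04 to 0.08, as reported above, indicate a very weak linear relationship even when the large number of vessels makes them statistically significant.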

      Author response image 8.

      A plot of the vessel radius change elicited by photostimulation vs. baseline radius standard deviation.

      (11) As mentioned, the large proportion of constricting capillaries is not something found in the literature. Do these happen at a certain time point following the stimulation? Did the same vessel segments show dilation at times and constriction at other times? In fact, the overall proportion of dilators and constrictors is not given. Are they spatially clustered? The assortativity result implies that there is some clustering, and the theory of blood stealing by active tissue from inactive tissue is cited. However, this theory would imply a region where virtually all vessels are dilating and another region away from the active tissue with constrictions. Was anything that dramatic seen?

      The kinetics of the vascular responses are not accessible via the current imaging protocol and acquired data; however, this computational pipeline can readily be adapted to test hypotheses surrounding the temporal evolution of the vascular responses, as shown in Author response image 2 (with higher temporal-resolution data). Some vessels dilate at some time points and constrict at others, as shown in Author response image 2. As listed in Table 2, 4.4% of all vessels constrict and 7.5% dilate for 458nm stimulation at 4.3 mW/mm^2. There was no obvious spatial clustering of dilators or constrictors: we expect such spatial patterns to more likely result from different modes of stimulation and/or in the presence of a pathology. The assortativity peaked at 0.4 (i.e., quite far from 1, where each vessel’s response exactly matches that of its neighbour).

      (12) Why were nearly all vessels > 5um diameter not responding >2SD above baseline? Did they have highly variable baselines or small responses? Usually, bigger vessels respond strongly to local neural activity.

      In Author response image 9, we now present the stimulation-induced radius changes vs. baseline radius variability across vessels with a radius greater than 5 um. The Pearson correlation between the radius change and the baseline radius standard deviation was 0.04 (p=0.5) for 552nm 4.3 mW/mm^2 stimulation, -0.26 (p<1e-5) for 458nm 1.1 mW/mm^2 stimulation, and -0.24 (p<1e-5) for 458nm 4.3 mW/mm^2 stimulation. We will incorporate an additional analysis to address this issue by identifying responding vessels as those showing supra-threshold percent change in their radius (instead of SD).

      Author response image 9.

      A plot of the vessel radius change elicited by photostimulation vs. baseline radius standard deviation in vessels with a baseline radius greater than 5 um.

      References

      (1) Alarcon-Martinez L, Villafranca-Baughman D, Quintero H, et al. Interpericyte tunnelling nanotubes regulate neurovascular coupling. Nature. 2020;585(7823):91-95. doi:10.1038/s41586-020-2589-x

      (2) Mester JR, Bazzigaluppi P, Weisspapir I, et al. In vivo neurovascular response to focused photoactivation of Channelrhodopsin-2. NeuroImage. 2019;192:135-144. doi:10.1016/j.neuroimage.2019.01.036

      (3) O’Herron PJ, Hartmann DA, Xie K, Kara P, Shih AY. 3D optogenetic control of arteriole diameter in vivo. Nelson MT, Calabrese RL, Nelson MT, Devor A, Rungta R, eds. eLife. 2022;11:e72802. doi:10.7554/eLife.72802

      (4) Hartmann DA, Berthiaume AA, Grant RI, et al. Brain capillary pericytes exert a substantial but slow influence on blood flow. Nat Neurosci. Published online February 18, 2021:1-13. doi:10.1038/s41593-020-00793-2

      (5) Mester JR, Bazzigaluppi P, Dorr A, et al. Attenuation of tonic inhibition prevents chronic neurovascular impairments in a Thy1-ChR2 mouse model of repeated, mild traumatic brain injury. Theranostics. 2021;11(16):7685-7699. doi:10.7150/thno.60190

      (6) Mester JR, Rozak MW, Dorr A, Goubran M, Sled JG, Stefanovic B. Network response of brain microvasculature to neuronal stimulation. NeuroImage. 2024;287:120512. doi:10.1016/j.neuroimage.2024.120512

      (7) Hall CN, Reynell C, Gesslein B, et al. Capillary pericytes regulate cerebral blood flow in health and disease. Nature. 2014;508(7494):55-60. doi:10.1038/nature13165

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to extend our gratitude to the reviewers for their meticulous analysis and constructive feedback on our manuscript. We have revised our paper based on the suggestions regarding supporting literature and the theory behind CAPs along with detailed insights regarding our methods. Their suggestions have been extremely useful in strengthening the clarity and rigor of our manuscript.

      Reviewer #1 (Recommendations For The Authors):

      (1) There are no obvious problems with this paper and it is relatively straightforward. There are some challenges that I would like to suggest. These variants have multiple mutations, so it would be interesting if you could drill down to find out which mutation is the most important for the collective changes reported here. I would like to see a sequence alignment of these variants, perhaps in the supplemental material, just to get some indication of the extent of mutations involved.

      Finding the most important mutation within a set is a tricky question, as each mutation changes the way future mutations will affect function due to epistasis. Indeed, this is what we aim to explore in this work. To illustrate this point, we included a new supplementary figure S5A. Three critical mutations that emerged quickly, and were frequently observed in other dominant variants, were S477N, T478K, and N501Y. Thus, we computed the EpiScore values of these three mutations, with several critical residues contributing to hACE2 binding. The EpiScore distribution indicates that residues 477, 478, and 501 have strong epistatic (i.e., non-additive) interactions, as indicated by EpiScore values above 2.0.

      To further investigate these epistatic interactions, we first conducted MD simulations and computed the DFI profile of these three single mutants. We analyzed how different the DFI scores of the hACE2 binding interface residues of the RBD are, across three single mutants with Omicron, Delta, and Omicron XBB variants (Fig S5B). Fig S5B shows how mutations at these particular sites affect the binding interface DFI in various backgrounds, as the three mutations are also observed in the Omicron, XBB, and XBB 1.5 variants. If the difference in the DFI profile of the mutant and the given variant is close to 0, then we could safely state that this mutation affected the variant the most. However, what we observe is quite the opposite: the DFI profile of the mutation is significantly different in different variant backgrounds. While these mutations may change overall behavior, their individual contributions to overall function are more difficult to pin down because overall function is dependent on the non-additive interactions between many different residues.

      Author response image 1.

      (A) Three critical mutations that emerged quickly, and were frequently observed in other dominant variants, were S477N, T478K, and N501Y. EpiScores of sites 477, 478, and 501 with one another are shown with k = the binding interface of the open chain. These residues are highly epistatic, producing higher responses than expected when perturbed together. (B) The difference in the dynamic flexibility profiles between the single mutants and the most common variants for the hACE2 binding residues of the RBD. DFI profiles exhibit significant variation from zero, and also show different flexibility in each background variant, highlighting the critical non-additive interactions of the other mutation in the given background variant. Thus, these three critical mutations, impacting binding affinity, do not solely contribute to the binding. There are epistatic interactions with the other mutations in VOCs that shape the dynamics of the binding interface to modulate binding affinity with hACE2.

      As we discussed above, while the epistatic interactions are crucial and the collective impact of the mutations shapes the mutational landscape of the spike protein, we would like to note that mutation S486P is one of the critical mutations we identify, modulating both antibody and hACE2 binding, and our analysis reveals its strong non-additive interactions with the other mutational sites. This mutational site appears in both XBB1.5 and earlier Omicron strains, which highlights its importance in the functional evolution of the spike protein. CAPs 346R, 486F, and 498Q also may be important, as they have a high EpiScore, indicating critical epistatic interaction with many mutation sites.

      Regarding the suggestion about presenting the alignment of the different variants, we have attached a mutation table, highlighting the mutated residues for each strain compared to the reference sequence, as supplemental Figure S1 along with the full alignment file.

      (2) Also, I am wondering if it would be possible to insert some of these flexibilities and their correlations directly into the elastic network models to enable a simpler interpretation of these results. I realize this is beyond the scope of the present work, but such an effort might help in understanding these relatively complex effects.

      This is a great suggestion. A similar analysis has been performed for different proteins by Mcleash (See doi: 10.1016/j.bpj.2015.08.009) by modulating the spring constants of specific positions to alter specific flexibility and evaluating the change in elastic free energy to identify critical mutation (in particular, allosteric mutation) sites. We will be happy to pursue this as future work.

      Minor

      (3) 1 typo on line 443 - should be binding instead of biding.

      Fixed, thanks for spotting that.

      (4) The two shades of blue in Fig. 4B were not distinguishable in my version.

      To fix this, we have changed the overlapping residues between Delta and Omicron to a higher contrast shade of blue.

      (5) Compensatory is often used in an entirely different way - additional mutations that help to recover native function in the presence of a deleterious mutation.

      Although our previous study (Ose et al. 2022, Biophysical Journal) shows that compensatory mutations were generally additive, the two ideas are not one and the same. We thank the reviewer for pointing this out. Therefore, to clarify, we have now described our results in terms of dynamic additivity, rather than compensation.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors note that the identified CAPs overlap with those of others (Cagliani et al. 2020; Singh and Yi 2021; Starr, Zepeda, et al. 2022). In itself, this merits a deeper discussion and explicit indication of which positions are not identified. However, there is one point that I believe may represent a fundamental flaw in this study in that the calculation of EP from the alignment of S proteins ignores entirely the differences in the interacting interface with which S for different coronaviruses in the alignment interact in the different receptors in each host species. This may be the reason why so many "CAPs" are in the RBD. The authors should at the very least make a convincing case of why they are not simply detecting constraints imposed by the different interacting partners, at least in the case of positions within the RBD interface with ACE2. Another point that the authors should discuss is that ACE2 is not the only receptor that facilitates infection, TMPRSS2 and possibly others have been identified as well. The results should be discussed in light of this.

      To begin with, we have now explicitly noted (on line 135) that “sites 478, 486, 498, and 681 have already been implicated in SARS-CoV-2 evolution, leaving the remaining 11 CAPs as undiscovered candidate sites for adaptation.” Evolutionary analyses are done using orthologous protein sequences, so there is no way to integrate information on different receptors in each host species in the calculation of EPs. However, we appreciate that the preponderance of CAPs in the RBD is likely due to different binding environments. We have added the following text (on line 83) to clarify our point: “Adaptation in this case means a virus which can successfully infect human hosts. As CAPs are unexpected polymorphisms under neutral theory, their existence implies a non-neutral effect. This can come in the form of functional changes (Liu et al. 2016) or compensation for functional changes (Ose et al. 2022). Therefore, we suspect that these CAPs, being unexpected changes from coronaviruses across other host species with different binding substrates, may be partially responsible for the functional change of allowing human infection.” This hypothesis is supported by the overlap of CAPs we identified with the positions identified in other studies (e.g., 478, 486, 498, and 681). Binding to TMPRSS2 and other substrates is also covered by this analysis as it is a measure of overall evolutionary fitness, rather than binding to any specific substrate. Our paper does focus on discussing hACE2 binding and mentions furin cleavage, but indeed lacks discussion on the role of TMPRSS2. We have added the following text to line 157: “Another host cell protease, TMPRSS2, facilitates viral attachment to the surface of target cells upon binding either to sites Arg815/Ser816, or Arg685/Ser686 which overlaps with the furin cleavage site 676-689, further emphasizing the importance of this area (Hoffmann et al. 2020b; Fraser et al. 2022).”

      (2) Turning now to the computational methods utilized to study dynamics, I have serious reservations about the novelty of the results as well as the validity of the methodology. First of all, the authors mention the work of Teruel et al. (PLOS Comp Bio 2021) in an extremely superficial fashion and do not mention at all a second manuscript by Teruel et al. (Biorxiv 2021.12.14.472622 (2021)). However, the work by Teruel et al. identifies positions and specific mutations that affect the dynamics of S and the evolution of the SARS-CoV-2 virus in light of immune escape, ACE2 binding, and open and closed state dynamics. The specific differences in approach should be noted but the results specifically should be compared. This omission is evident throughout the manuscript. Several other groups have also published on the use of normal-mode analysis methods to understand the Spike protein, among them Verkhivker et al., Zhou et al., Majumder et al., etc.

      Thank you for your suggestions. Upon further examination of the listed papers, we have added citations to other groups employing similar methods. However, it's worth noting that the results of Teruel et al.'s studies are generally not directly comparable to our own. Particularly, they examine specific individual mutations and overall dynamical signatures associated with them, whereas our results are always considered in the context of epistasis and joint effects with CAPs, and all mutations belong to the common variants. Although important mutations may be highlighted in both cases, it is for very different reasons. Nevertheless, we provide a more detailed mention of the results of both studies. See lines 178, 255, and 393.

      (3) The last concern that I have is with respect to the methodology. The dynamic couplings and the derived index (DCI) are entirely based on the use of the elastic network model presented which is strictly sequence-agnostic. Only C-alpha positions are taken into consideration and no information about the side-chain is considered in any manner. Of course, the specific sequence of a protein will affect the unique placement of C-alpha atoms (i.e., mutations affect structure), therefore even ANM or ENM can to some extent predict the effect of mutations in as much as these have an effect on the structure, either experimentally determined or correctly and even incorrectly modelled. However, such an approach needs to be discussed in far deeper detail when it comes to positions on the surface of a protein such that the reader can gauge if the observed effects are the result of modelling errors.

      We would like to clarify that most of our results do not involve simulations of different variants, but rather how characteristic mutation sites for those variants contribute to overall dynamics. For the full spike, we operate on only two simulations: open and closed. When we do analyze different variants, starting on line 438, the observed difference does not come from the structure, but from the covariance matrix obtained from molecular dynamics (MD) simulations, which are sensitive to single amino acid changes.

      Reviewer #3 (Recommendations For The Authors):

      (1) On line 99 there is a misspelling, 'withing'.

      It has been fixed. Thanks for spotting that.

      (2) Some graphical suggestions to make the figures easier to read:

      In Figure 1C, a labeled circle around the important sites, the receptor binding domain, and the Furin cleavage site, would help the reader orient themselves. Moreover, it would make clear which CAPs are NOT in the noteworthy sites described in the text.

      Good idea. We have added transparent spheres and labels to show hACE2 binding sites and Furin cleavage sites.

      In Figure 2C the colors are a bit low contrast; moreover, there are multiple text sizes on the same figure which should perhaps be avoided to ensure legibility.

      We have made yellow brighter and standardized font sizes.

      Figure 3 is a bit dry, perhaps indicating in which bins the 'interesting' sites could be informative.

      Thank you for the suggestion, but the overall goal of Figure 3 is to illustrate that the mutational landscape is governed by the equilibrium dynamics in which flexible sites undergo more mutations during the evolution of the CoV2 spike protein. Therefore, adding additional positional information may complicate our message.

      Figure 4, the previous suggestions about readability apply.

      We ensured same-sized text and higher-contrast colors.

      Figure 5B, the residue labels are too small.

      We increased the font size of the residue labels.

      In Figure 8 maybe adding Delta to let the reader orient themselves would be helpful to the discussion.

      Unfortunately, there is no single work that has experimentally quantified binding affinities towards hACE2 for all the variants. When we conducted the same analysis for the Delta variant in Figure 8, the experimental values were obtained from a different source (doi: 10.1016/j.cell.2022.01.001) and the values were significantly different from the experimental work we used for Omicron (Yue et al. 2023). When we adjusted for the difference in the experimentally measured binding affinity values of the original Wuhan strain between these two separate studies, we observed a similar correlation, as seen below. However, we think this might not be a proper representation. Therefore, we chose to keep the original figure.

      Author response image 2.

      The %DFI calculations for variants Delta, Omicron, XBB, and XBB 1.5. (A) %DFI profile of the variants are plotted in the same panel. The grey shaded areas and dashed lines indicate the ACE2 binding regions, whereas the red dashed lines show the antibody binding residues. (B) The sum of %DFI values of RBD-hACE2 interface residues. The trend of total %DFI with the log of Kd values overlaps with the one seen with the experiments. (C) The RBD antibody binding residues are used to calculate the sum of %DFI. The ranking captured with the total %DFI agrees with the susceptibility fold reduction values from the experiments.
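
      As a rough illustration of the quantity plotted in panels (B) and (C), the sketch below converts raw DFI values to percentile ranks (%DFI) and sums them over a chosen set of residues. The percentile-rank definition, the function names, and every number are assumptions made for illustration only; they are not values or code from the manuscript.

```python
def percentile_rank(values):
    """Map each value to the fraction of values that are less than or equal to it."""
    n = len(values)
    return [sum(v <= x for v in values) / n for x in values]


def summed_pct_dfi(dfi_by_residue, selected_residues):
    """Sum %DFI over the selected residues (residue number -> raw DFI)."""
    residues = sorted(dfi_by_residue)
    ranks = percentile_rank([dfi_by_residue[r] for r in residues])
    pct = dict(zip(residues, ranks))
    return sum(pct[r] for r in selected_residues)


# Hypothetical raw DFI values for five residues (not from the manuscript).
dfi = {417: 0.010, 484: 0.040, 493: 0.025, 501: 0.030, 505: 0.015}
interface = [484, 501]  # hypothetical interface subset
total = summed_pct_dfi(dfi, interface)
print(total)
```

      Because percentile ranks are bounded between 0 and 1, such a sum can be compared across variants only when the same residue set is used for each of them.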

      (3) Replicas of the MD simulations would make the conclusions stronger in my opinion.

      We ran a 1 µs simulation and performed convergence analysis for the MD simulations following prior work (Sawle L, Ghosh K. 2016). More importantly, we also evaluated the statistical significance of the computed DFI values, as explained in detail below (please see the answer to question 3 of Reviewer #3 (Public Review)).

      Reviewer #3 (Public Review):

      (1) A longer discussion of how the 19 orthologous coronavirus sequences were chosen would be helpful, as the rest of the paper hinges on this initial choice.

      The following explanation has been added on line 114: EP scores of the amino acid variants of the S protein were obtained using a Maximum Likelihood phylogeny (Kumar et al. 2018) built from 19 orthologous coronavirus sequences. Sequences were selected by examining available non-human sequences with a sequence identity of 70% or above to the human SARS CoV-2’s S protein sequence. This cutoff allows for divergence over evolutionary history such that each amino acid position had ample time to experience purifying selection, whilst limiting ourselves to closely related coronaviruses. (Figure 1A).
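
      The 70% identity cutoff described above can be illustrated with a minimal sketch. It assumes the candidate sequences are already aligned to the human SARS-CoV-2 S protein and measures identity over columns where neither sequence has a gap; the function names and the toy sequences are hypothetical, and the identity calculation actually used for the manuscript may differ.

```python
def percent_identity(ref: str, other: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(ref, other):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def select_orthologs(ref, candidates, cutoff=70.0):
    """Return names of candidates whose identity to ref is at or above cutoff."""
    return [name for name, seq in candidates.items()
            if percent_identity(ref, seq) >= cutoff]


# Toy aligned fragments (hypothetical, not real spike sequences).
reference = "MFVFLVLLPL"
candidates = {
    "bat_like": "MFVFLVLLPS",       # 9 of 10 columns identical
    "pangolin_like": "MFIFLLFL-L",  # 6 of 9 non-gap columns identical
    "distant": "MKLLVAILPA",        # 3 of 10 columns identical
}
selected = select_orthologs(reference, candidates)
print(selected)
```

      Only the first toy sequence passes the cutoff here; lowering `cutoff` would admit more divergent sequences at the cost of less reliable alignment.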

      (2) The 'reasonable similarity' with previously published data is not well defined, nor was there any comment about some of the residues analyzed (namely 417-484). We have revised this part of the manuscript and added it to the revised version.

      We removed the line about reasonable similarity as it was vague, added a line about residues 417-484, and revised the text accordingly, starting on line 354.

      (3) There seem to be no replicas of the MD simulations, nor a discussion of the convergence of these simulations. A more detailed description of the equilibration and production schemes used in MD would be helpful. Moreover, there is no discussion of how the equilibration procedure is evaluated, in particular for non-experts this would be helpful in judging the reliability of the procedure.

      We opted for a single, extended equilibrium simulation to comprehensively explore the long-term behavior of the system. Given the specific nature of our investigation and resource constraints, a well-converged, prolonged simulation was deemed a practical and scientifically valid approach, providing a thorough understanding of the system's dynamics. (doi: 10.33011/livecoms.1.1.5957, https://doi.org/10.1146/annurev-biophys-042910-155255)

      We updated our methods section starting on line 605 with extended information about the MD simulations and the convergence criteria for the equilibrium simulations. We also added a section that explains our analysis to check the statistical significance of the obtained DFI values.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We greatly thank you and the reviewers for your expert comments and valuable suggestions on our manuscript. After reading these comments, we realized that the previous version of the manuscript contained some weak points. Surely, the issues raised by the six reviewers were of great help in the revision of our manuscript.

      According to the comments, we have now fully revised the manuscript to address most of the questions and suggestions. In addition, we reworded some parts of the Introduction, Results and Discussion, Figures, Figure legends and Experimental Methods to increase the rigor of our conclusions.

      Overall, you will see that we have paid serious attention to all the concerns and criticisms expressed by reviewers. Addressing these various issues has most certainly allowed us to prepare a much-improved manuscript and for this we offer our hearty thanks.

      Reviewer #1 (Public Review):

      Summary:

      The organization of cell surface receptors in membrane nanodomains is important for signaling, but how this is regulated is poorly understood. In this study, the authors employ TIRFM single-molecule tracking combined with multiple analyses to show that ligand exposure increases the diffusion of the immune receptor FLS2 in the plasma membrane and its co-localization with remorin REM1.3 in a manner dependent on the phosphosite S938. They additionally show that ligand increases the dwell time of FLS2, and this is linked to FLS2 endocytosis, also in a manner dependent on S938 phosphorylation. The study uncovers a regulatory mechanism of FLS2 localization in the nanodomain crucial for signaling.

      Strengths:

      TIRFM single-molecule tracking, FRAP, FRET, and endocytosis experiments were nicely done. The role of S938 phosphorylation is convincing.

      Weaknesses:

      Question 1: The model suggests that S938 is phosphorylated upon flg22 treatment. This is actually not known.

      Reply: Thank you for your expert comments. Although the phosphorylation of Ser-938 upon flg22 treatment is not known, the model presented in the manuscript is based on previous studies that have shown the importance of Ser-938 phosphorylation for the function of FLS2 (Cao et al, 2013). When it is mutated to the phosphorylation-mimicking residues aspartate or glutamate, immune responses remain normal. These findings suggest that the phosphorylation of Ser-938 plays a critical role in activating defense mechanisms upon flagellin detection (Cao et al, 2013). We have now added the results of Cao et al. (2013) to the introduction of the revised manuscript to strengthen this point.

      Question 2: In addition, the S938D mutant does not show constitutively increased diffusion and co-localization with remorin. It is necessary to soften the tone in the conclusion.

      Reply: We appreciate the valuable suggestions from the reviewer. Based on our findings, we observed that the phosphorylation of Ser-938 significantly impacts the dynamics of flg22-induced FLS2. However, it does not alter the diffusion coefficient of FLS2 itself. In the revised manuscript, we have carefully adjusted the conclusion by softening the tone to reflect these findings.

      Question 3: The introduction (only two paragraphs) and discussion are not properly written in the context of the current understanding of plant receptors in nanodomains. The authors basically just cited a few publications of their own, and this is not acceptable.

      Reply: We accepted the criticisms here. Now, we have reworded the introduction and discussion sections to improve clarity. Furthermore, we have incorporated several new reports on plant receptors in nanodomains into the revised manuscript. Besides, we deleted some publications from our own group, while citing the latest references on plant receptors and nanodomains.

      Reviewer #2 (Public Review):

      Summary:

      The research conducted by Yaning Cui and colleagues delves into understanding FLS2-mediated immunity. This is achieved by comparing the spatiotemporal dynamics of an FLS2-S938A mutant and FLS2-WT, especially in relation to their association with the remorin protein. To delineate the differences between the FLS2-S938A mutant and FLS2-WT, they utilized a plethora of advanced fluorescent imaging techniques. By analyzing surface dynamics and interactions involving the receptor signal co-receptor BAK1 and remorin proteins, the authors propose a model of how FLS2 and BAK1 are assembled and positioned within a remorin-specific nano-environment during FLS2 ligand-induced immune responses.

      Strengths:

      These techniques offer direct visualizations of molecular dynamics and interactions, helping us understand their spatial relationships and interactions during innate immune responses. Advanced cell biology imaging techniques are crucial for obtaining high-resolution insights into the intracellular dynamics of biomolecules. The demonstrated imaging systems are excellent examples to be used in studying plant immunity by integrating other functional assays.

      Weaknesses:

      It's essential to acknowledge that every fluorescence-based method, just like biochemical assays, comes with its unique limitations. These often pertain to spatial and temporal resolutions, as well as the sensitivity of the cameras employed in each setup. Meticulous interpretation is pivotal to guarantee an accurate depiction and to steer clear of potential misunderstandings when employing specific imaging systems to analyze molecular attributes. Moreover, a discerning interpretation and accurate image analysis can offer invaluable guidance for future studies on plant signaling molecules using these nice cell imaging techniques. For instance, although single-particle analysis couldn't conclusively link FLS2 and remorin, FLIM-FRET effectively highlighted their ligand-triggered association and the disengagement brought on by mutations. While these methodologies seemed to present differing outcomes, they were described in the manuscript as harmonious. In reality, these differences could highlight distinct protein populations active in immune responses, each accentuated differently by the respective imaging techniques due to their individual spatial and temporal limitations. Addressing these variations is imperative, especially when designing future imaging explorations of immune complexes.

      Reply: Thank you for your insightful comments and suggestions. We appreciate your expertise in fluorescence-based methods and the importance of careful interpretation and accurate image analysis. We agree with you that different imaging techniques may have their limitations and can highlight distinct aspects of protein dynamics and interactions.

      In our study, we used single-particle analysis and FLIM-FRET to investigate the spatiotemporal dynamics of FLS2 and its association with remorin. While single-particle analysis did not conclusively link FLS2 and remorin, FLIM-FRET effectively highlighted their ligand-triggered association and the disengagement caused by mutations. We acknowledge that these techniques may have different spatial and temporal resolutions, leading to the discrepancy in their results. However, after the normalized treatment, we can provide very similar conclusions. Accordingly, we have revised the manuscript.

      Reviewer #3 (Public Review):

      Summary:

      Receptor kinases (RKs) perceive extracellular signals to regulate many processes in plants. FLS2 is an RK that acts as a pattern-recognition receptor (PRR) to recognize bacterial flagellin and activate pattern-triggered immunity (PTI). PRRs such as FLS2 have been previously shown to reside within PM nanodomains, which can regulate downstream PTI signaling. In the current manuscript, Cui et al use single particle tracking to characterize the effect of previously-described phosphosite mutants (FLS2-S938A/D) on the PM organization, endocytosis, and signaling functions of FLS2. The authors confirm that FLS2-S938D but not -S938A is functional for flg22-induced responses, while also demonstrating that phosphodead mutation at this site (S938A) prevents flg22-induced sorting into nanodomains and endocytosis. These results are consistent with S938 being an important phosphorylation site for FLS2 function, however, they fall short of demonstrating that membrane disorganization of FLS2-938A is responsible for downstream signaling defects.

      Strengths:

      The authors' experiments (single particle tracking, co-localization, etc) do a good job of demonstrating how a non-functional version of FLS2 (S938A) does not alter its spatio-temporal dynamics, nanodomain organization, and endocytosis in response to flg22, suggesting that these require a functional receptor and are regulated by intracellular signaling components.

      Weaknesses:

      Question 1: The authors do not provide direct evidence that S938 phosphorylation specifically affects membrane organization, rather than FLS2 signaling more generally. All evidence is consistent with S938A being a non-functional version of FLS2, wherein an activated/functional receptor is required for all downstream events including membrane re-organization, downstream signalling, internalization, etc. Furthermore, the authors never demonstrate that this site is phosphorylated in planta in the basal or flg22-elicited state.

      Reply: We apologize for not describing this clearly in the original manuscript. In fact, we found in our study that the phosphorylation of the Ser-938 site influences the efficient sorting of FLS2 into AtRem1.3-associated microdomains rather than membrane organization, as depicted in Figure 2. Furthermore, we found that the immune responses are disrupted when Ser-938 is mutated to alanine, which is consistent with previously reported results (Cao et al, 2013). However, they remain normal when Ser-938 is mutated to the phosphorylation-mimicking residues aspartate or glutamate. These results suggest that the phosphorylation of Ser-938 is crucial for activating defense mechanisms upon flagellin detection. Although the phosphorylation of Ser-938 in planta in the basal or flg22-elicited state is not known, the model presented in the manuscript is based on the results of our current investigation together with those of the previous study that showed the importance of Ser-938 phosphorylation for FLS2 function (Cao et al, 2013).

      Question 2: As written, the manuscript also has numerous scientific issues, including a misleading/incomplete description of plant immune signaling, lack of context from previous work, and extensive use of inappropriate references.

      Reply: We accept the criticism here. After reading the comments, we realized the problem. Now we have revised the misleading or incomplete description of plant immune signaling, added the context of previous works and deleted inappropriate references in the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Question 1: The description of the data has no details. How many biological repeats were done? How were statistical analyses done? What is the concentration of flg22? How was the calcium flux done (Fig. 4A)? The method also lacks details and relevant references.

      Reply: We apologize for the lack of detail in presenting the data. Following your suggestion, we added comprehensive figure legends that provide clear explanations for each figure. Additionally, we included supplementary information on the measurement methods and references pertaining to calcium flux in the revised manuscript.

      Question 2: Data in Fig. 4 basically repeated the 2013 PLoS Pathog paper. Why were these experiments even performed? Were GFP-tagged FLS2 lines used in these experiments? If this is the case, the data just verified that the GFP-tagged FLS2 functions as expected and should be moved to supporting data.

      Reply: Thanks for the expert suggestions. In our study, we utilized GFP-tagged FLS2 lines to generate FLS2-S938 mutants and conducted experiments to investigate the flg22-induced immune response. Although some experiments in Figure 4 are similar to those reported (Cao et al, 2013), we provided a more detailed analysis of the immune response. The comprehensive analysis included early immune responses and late immune responses, e.g., the activation of a calcium burst, mitogen-activated protein kinases (MAPKs), the induction of immune-responsive genes and callose deposition, ultimately resulting in the inhibition of plant growth. As some results are analogous to the previous paper, we have moved some of the experiments, including the analysis of MAPKs and callose deposition, to the supporting data section of the revised manuscript, as suggested.

      Question 3: Flg22-induced FLS2-BAK1 association does not require S938, this is consistent with prior study that flg22 acts as a molecular glue for the ectodomains of FLS2 and BAK1 (Sun et al., 2013 Science). This needs to be cited.

      Reply: Yes, we agree with the comment. We have now added an additional sentence in the revised manuscript: “This aligns with the previous finding that flg22 acts as a molecular glue for FLS2 and BAK1 ectodomains (Sun et al., 2013).”

      Question 4: Line 50, the references cited do not match what they say here.

      Reply: We are sorry for the mistake in citing inappropriate references. In the revised manuscript, we deleted this sentence as well as the incorrect reference.

      Question 5: Line 105, "flg22 can act as a ligand-like factor". It is a ligand!

      Reply: Sorry for the mistake. Now, the sentence was corrected in the revised manuscript by deleting the word “like”.

      Question 6: Line 107, FLS2/BAK1 heterodimerization, not heteroologomerization.

      Reply: Now we used “heterodimerization” to replace “heteroologomerization” in the revised manuscript.

      Question 7: Line 114, are these really the best references to cite here?

      Reply: After reading the comment, we found the references were not suitable here. Now we changed references by citing “(Martinière et al., 2021)” in the revised manuscript.

      Question 8: Lines 123-124, the sentence is incomplete.

      Reply: In the revised manuscript, we have reworded the sentence to make it complete. It now reads: “In a previous investigation, we demonstrated that flg22 induces FLS2 translocation from AtFlot1-negative to AtFlot1-positive nanodomains in the plasma membrane, implying a connection between FLS2 phosphorylation and membrane nanodomain distribution (Cui et al., 2018). To validate this, we assessed the association of FLS2/FLS2S938D/FLS2S938A with membrane microdomains, using AtRem1.3-associated microdomains as representatives (Huang et al., 2019).”

      Question 9: Lines 169-170, Why is this "most important"?

      Reply: Sorry for the unsuitable description. As we have dramatically changed the manuscript, this sentence was deleted from the new version.

      Reviewer #2 (Recommendations For The Authors):

      Here are some specific areas of ambiguity in the study to be improved.

      Question 1: Clarity in statistical analysis is necessary. Many figure legends omit details such as the sample size "n", and the nature of the measurements, like ROIs, images, and dots, the size of the seedlings, etc.

      Reply: We appreciate this suggestion, which was also raised by Reviewer #1. We have now provided the details for each figure, including the sample size and the nature of the measurements, in the revised manuscript.

      Question 2: Additional background about the choice of FLS2-S938 mutant would be beneficial, given that this mutant doesn't affect the BAK1 interaction but nullifies several PTI responses.

      Reply: Yes, we agreed that some additional background is required for the FLS2-S938 mutant. Therefore, we added a sentence here: “FLS2 Ser-938 mutations impact flg22-induced signaling, while BAK1 binding remains unaffected, thereby suggesting Ser-938 regulates other aspects of FLS2 activity (Cao et al., 2013).” in the revised manuscript.

      Question 3: A specific segment "... Using CLSM, Fluorescence Correlation Spectroscopy (FCS) and Western blotting, we found that the endocytic vesicles of FLS2S938D increased significantly after flg22 treatment (Figure 3B-3E)..." is not easy to follow. The author may want to differentiate these methods and highlight them by indicating them as endocytic vesicle counting, receptor density on PM measurement by FCS, and WB-based protein degradation characterization to understand such mixed descriptions better. By the way, "Number of Endocytosis" should be "number of endocytic vesicles". Endocytosis is a process and uncountable.

      Reply: We thank the reviewer for kindly reminding us to differentiate the experimental methods. We have therefore changed the sentences in the revised manuscript: “Employing confocal laser-scanning microscopy (CLSM) during 10 μM flg22 treatment, we tracked FLS2 endocytosis and quantified vesicle numbers over time (Figure 3B). It is evident that both FLS2 and FLS2S938D vesicles appeared 15 min after flg22 treatment, significantly increasing thereafter (Figure 3C). Notably, only a few vesicles were detected in FLS2S938A-GFP, indicating Ser-938 phosphorylation's impact on flg22-induced FLS2 endocytosis. Additionally, fluorescence correlation spectroscopy (FCS) (Chen et al., 2009) monitored molecular density changes at the PM before and after flg22 treatment (Figure S3F). Figure 3D shows that both FLS2-GFP and FLS2S938D-GFP densities significantly decreased after flg22 treatment, while FLS2S938A-GFP exhibited minimal changes, indicating Ser-938 phosphorylation affects FLS2 internalization. Western blotting confirmed that Ser-938 phosphorylation influences FLS2 degradation after flg22 treatment (Figure 3E), consistent with single-molecule analysis findings.” We also changed “number of endocytosis” to “the number of endocytic vesicles” in Figure 3C as suggested.

      Question 4: In Figure 1 E, a discrepancy exists where the total percentages in the red and black columns don't sum up to 100%, while other groups look right. This needs clarification.

      Reply: We are sorry for our carelessness in making the data incomplete. Now we thoroughly supplemented, collated, and rechecked the data in Figure 1E. Due to an oversight during the production of the figure, some data was inadvertently omitted, resulting in the red column not reaching 100%. Besides, we checked the data in the black column again, and the total percentage indeed added up to 100%.

      Question 5: Although Figure 1F uses UMAP analysis to differentiate between FLS2WT and A mutants, only data pertaining to the "D" mutant is shown.

      Reply: Thank you for the expert comments. Because there are several images in Figure 1, we only selected the data related to the “D” mutant as a representative for display. As suggested, we have added all the UMAP images in the revised supplement figure S1F.

      Question 6: There are apparent inconsistencies in the FRAP results, particularly regarding the initial recovery points post-bleaching. A detailed statistical analysis, supplemented with FRAP images over time, should be included for clarity. Were they bleached to a similar ground level before monitoring their recovery? The data points from "before" and "after" bleaching were not shown. I found the red and blue curves showed similar recovery slope, which suggests no long-distance movement changes for all three FLS2 versions, with or without flg22. This is opposite from the conclusions made by the author.

      Reply: Thank you for the expert comments. After reading the comments, we recognized this terrible problem. Therefore, we carried out a new FRAP experiment. The new results showed that, following complete bleaching of three samples of FLS2 to ground level, the recovery rates of FLS2 and FLS2S938D under flg22 treatment were significantly higher compared to the control group (Fig. 1G). In contrast, the recovery rates of the FLS2S938A-GFP after flg22 treatment remain similar to that before treatment (Fig. 1G), indicating that the Ser-938 phosphorylation site indeed affects the flg22-induced lateral diffusion of FLS2 at the PM. The new results are basically consistent with the motion range of single-molecule results, which is not contradictory to long-distance movement changes. Accordingly, we incorporated the new time-lapse FRAP images into Figure 1G and S1B.

      Question 7: There's a potential typo in Figure 1B regarding the bar size. It could neither possibly be 200 um nor 200 nm. Figure 1A also needs a scale bar.

      Reply: Apologies for the mistake. We now corrected “200 μm” to “2 μm”. Besides, we also included a scale bar in Figure 1A in the revised manuscript.

      Question 8: Due to the unreliable long-time tracking by Imaris, the authors analyzed the tracks within 10 s and quantified very short-lived particles under 4 s. Such 4 s surface retention for a receptor does not seem to match functional endocytic internalization time for cargo. Even after the endocytic adaptor module recruitment, it would take at least more than 10 s to finish the internalization. In the field of endocytosis, these events are often described as abortive endocytic events. However, the disappearance of cargoes, FLS2 in this case, indicates internalization into the cytoplasm, which is interesting. May the author discuss more on how these short events analyzed enhance our understanding of the functional behavior of FLS2?

      Reply: We greatly appreciate the valuable comments provided by the reviewer. After thorough consideration, we acknowledge that in our original manuscript, we failed to distinguish the short-lived from the long-lived particles and vaguely grouped them together as internalized particles. We realized that it is inappropriate to ambiguously categorize all particles as internalized. Therefore, we added the sentence “Additionally, numerous FLS2 exhibited short-lived dwell times, indicating abortive endocytic events associated with the endocytic pathway and signal transduction (Bertot et al., 2018)” in the revised manuscript.

      Question 9: Figure 2D should be comprehensive, presenting data for the WT, A, and D versions.

      Reply: Yes, we agreed with the suggestions. Now, we added several representative images for the WT, A, and D versions in the revised manuscript.

      Question 10: In Figure 2D, TIRM-SIM should be a typo and rectified to TIRF-SIM. Also, a detailed explanation of the TIRF-SIM setup and its specifics would be important. The imaging approach of SIM, especially the time duration for finishing all frames before reconstruction, is essential to rationalize its use in capturing and measuring an appropriate speed range of particle movement. May the author elaborate on the technique details and the use of TIRF-SIM for colocalization analysis? To clarify these, the author may provide additional TIRF-only movies of FLS2 (WT, A, D) and AtRem1.3 for comparison with TIRF-SIM still images.

      Reply: Sorry for the mistake. In the revised manuscript, we have corrected “TIRM-SIM” to “TIRF-SIM”. In order to rationalize its use in capturing and measuring an appropriate speed range of particle movement, we included a more detailed description of the imaging approach and the colocalization analysis of TIRF-SIM in the Materials and Methods section as follows: “The SIM images were taken by a 60 × NA 1.49 objective on a structured illumination microscopy (SIM) platform (DeltaVision OMX SR) with a sCMOS camera (Camera pixel size, 6.5 μm). The light source for TIRF-SIM included diode laser at 488 nm and 568 nm with pixel sizes (μm) of 0.0794 and 0.0794 (Barbieri et al., 2021). For the dual-color imaging, FLS2/FLS2S938A/FLS2S938D-GFP (488 nm/30.0%) and AtRem1.3-mCherry (561 nm/30.0%) were excited sequentially. The exposure time of the camera was set at 50 ms throughout single-particle imaging. The time interval for time-lapse imaging was 100 ms, the total time was 2 s, and the total number of time points was 21. The Imaris intensity correlation analysis plugin was used to calculate the co-localization ratio.” Furthermore, we provided additional TIRF-SIM movies of FLS2 (WT, A, D) and AtRem1.3.
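
      The acquisition timing quoted above can be cross-checked with simple arithmetic: a 100 ms interval over a 2 s series gives 21 time points if the frame at t = 0 is counted. The helper below is a hypothetical illustration of that check, not part of the authors' analysis pipeline.

```python
def time_points(total_ms: int, interval_ms: int) -> int:
    """Number of frames in a time lapse that includes the frame at t = 0."""
    if interval_ms <= 0:
        raise ValueError("interval must be positive")
    if total_ms % interval_ms:
        raise ValueError("total duration must be a multiple of the interval")
    return total_ms // interval_ms + 1


frames = time_points(total_ms=2000, interval_ms=100)
print(frames)  # 21
```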

      Question 11: The colocalization displayed in Figure 2D is hard to tell. A colocalization ratio of FLS2-AtRem1.3 is shown as ~0.8%, which has only ~0.2% difference from the flg22-treated condition. "n" of Figure 2F should be specified in the legend, such as a line with a specific length, or an ROI with a specific area size.

      Reply: Thank you for the expert comments. Although the increase in colocalization after flg22 treatment is small, the change is statistically significant compared with the wild type. We agree that every fluorescence-based method, like biochemical analysis, has its own limitations, which were raised by Reviewer #2 (Public Review) as well. To provide stronger evidence, we also carried out a FLIM-FRET experiment as a supplement, which can effectively detect ligand-triggered association or dissociation. From Figure 2G and H, we found that the co-localization of FLS2/FLS2S938D-GFP with AtRem1.3-mCherry significantly increases in response to flg22 treatment (FLS2-GFP control: 2.45 ± 0.019 ns; FLS2-GFP flg22-treated: 2.39 ± 0.016 ns; FLS2S938D-GFP control: 2.42 ± 0.010 ns; FLS2S938D-GFP flg22-treated: 2.35 ± 0.028 ns). In contrast, FLS2S938A-GFP shows no significant changes (control: 2.53 ± 0.011 ns; flg22-treated: 2.56 ± 0.013 ns), indicating that Ser-938 phosphorylation influences efficient sorting of FLS2 into AtRem1.3-associated microdomains. Following the suggestion of the reviewer, we have now rearranged the order of 2E and 2F, in which N represents the entire image region used for analysis rather than a specific region of interest.
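
      To make the direction and size of the lifetime changes quoted above easier to compare, the following minimal sketch computes the absolute and relative change in mean donor lifetime for each construct, assuming all values are in nanoseconds. In FLIM-FRET, a shorter donor lifetime indicates more energy transfer to the acceptor. The dictionary layout and function name are hypothetical, and the sketch is illustrative arithmetic on the quoted means only; it does not reproduce the statistical tests.

```python
# Mean GFP donor lifetimes (ns) quoted in the reply: (control, flg22-treated).
LIFETIMES_NS = {
    "FLS2-GFP": (2.45, 2.39),
    "FLS2S938D-GFP": (2.42, 2.35),
    "FLS2S938A-GFP": (2.53, 2.56),
}


def lifetime_change(control_ns: float, treated_ns: float) -> tuple[float, float]:
    """Return (absolute change in ns, relative change in %) after treatment."""
    delta = treated_ns - control_ns
    return round(delta, 2), round(100.0 * delta / control_ns, 2)


for construct, (control, treated) in LIFETIMES_NS.items():
    delta_ns, delta_pct = lifetime_change(control, treated)
    print(f"{construct}: {delta_ns:+.2f} ns ({delta_pct:+.2f}%)")
```

      On these numbers, the lifetime shortens by 0.06 to 0.07 ns (about 2 to 3%) for FLS2-GFP and FLS2S938D-GFP after flg22 treatment, while for FLS2S938A-GFP it lengthens slightly (0.03 ns).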

      Question 12: I appreciate the nice results of the FLIM-FRET results for FLS2-Rem1.3. Figure 2H should be supplemented with additional representative images of all FLS2 variants including WT and mutants.

      Reply: Thanks for your warm encouragement. As suggested, we added all the representative images in the revised manuscript.

      Question 13: The unit of the X-axis of Figure 2E cannot be pixel. Should it be μm? In the method, the author could specify the camera model and magnification for TIRF-SIM to understand pixel size of the image better.

      Reply: Sorry for the mistake here. Indeed, the unit of the X-axis in Figure 2E should be μm. We have now corrected this mistake in Figure 2E in the revised manuscript. In addition, we included a detailed description of the imaging approach of TIRF-SIM in the Materials and Methods section as follows: “The SIM images were taken by a 60 × NA 1.49 objective on a structured illumination microscopy (SIM) platform (DeltaVision OMX SR) with a sCMOS camera (Camera pixel size, 6.5 μm)”.

      Question 14: "... as shown in A..." in Figure Legend 2E should be "... as shown in D..."

      Reply: Thanks for pointing out this mistake. In the revised manuscript, we used “as shown in D” to replace “as shown in A”.

      Question 15: I recommend that the authors exercise caution when drawing conclusions based on the Rem1.3 data and when representing the "microdomain" concept in their final model. While Rem1.3 punctate is a nanometer-sized protein cluster specific to its identity, its shape can be categorized as a nanodomain. Conceptually, however, it neither universally represents all nanodomains nor microdomains, as depicted in Figure 4. We should exercise caution to prevent providing misleading information to the field.

      Reply: We thank the reviewer for expert comments. To avoid misleading conclusions, we changed “nanodomains” to “AtRem1.3-associated microdomains” in the revised manuscript. Besides, we have also made modifications to Figure 4.

      Reviewer #3 (Recommendations For The Authors):

      Question 1: The manuscript needs to be extensively re-written and has severe issues as-is. Many references are either not quite appropriate or are completely unrelated to the use in the text. In general, the current state-of-the-art of PTI and RK signaling is not correctly described or incorporated.

      Reply: We accepted the criticisms here. As suggested, we thoroughly rewrote the manuscript to address the concerns raised. Furthermore, we have thoroughly checked and revised the manuscript by removing 21 irrelevant references and adding 30 relevant references. We also incorporated the most up-to-date descriptions of the PTI and RK signaling pathways.

      Question 2: Receptor-like kinase (RLK) should generally be receptor kinase (RK) as receptor functions are now well established.

      Reply: Yes, we agree with your expert comment here. We have now changed “Receptor-like kinase (RLK)” to “receptor kinase (RK)” in the revised manuscript.

      Question 3: Line 20 - is this really true?

      Reply: Sorry for the mistake. In the revised manuscript, we changed “However, the mechanisms underlying the regulation of FLS2 phosphorylation activity at the plasma membrane in response to flg22 remain largely enigmatic.” to “However, the dynamic FLS2 phosphorylation regulation at the plasma membrane in response to flg22 needs further elucidation.”

      Question 4: S938D sorts better in response to Flg22; S938A is unaffected - suggests phosphorylation of S938 is not dynamic in response to Flg22 but is required for pre-elicitation sorting. Overall, there is a chicken-and-egg problem in this paper: which comes first, immune/signalling functionality or nanodomain sorting? And which is explaining the defects of S938A?

      Reply: We thank the reviewer for expert suggestions. In fact, previous studies showed that membrane microdomains serve as signaling platforms that mediate cargo protein sorting and protein-protein interactions in a variety of contexts (Goldfinger et al. 2017). Since our previous research showed that the disruption of membrane microdomains affected flg22-induced immune signaling (Cui et al. 2018), we speculate that immune signaling occurs after FLS2 enters the membrane microdomains.

      As shown in Figures 1 and 2, ligand exposure leads to an increase in diffusion coefficient and enhanced co-localization with REM1.3, both of which are dependent on the phosphorylation of the Ser-938 site. From these results, we infer that the defects in S938A result largely from its failure to sort into membrane microdomains. The phosphorylation of the Ser-938 site can regulate the sorting of FLS2 into functional AtRem1.3-associated microdomains, thereby affecting flg22-induced plant immunity.

      Question 5: Line 37 conserved, not conservative (though not technically true - the domain organization is conserved but the ECDs are not conserved).

      Reply: Thank you for pointing this mistake out. In the revised manuscript, we used “conserved” to replace “conservative”.

      Question 6: Lines 40-42 - not all phosphorylation sites are within the kinase domain, for example, sites are well-described on the JM and/or C-tail regions outside of the kinase domain.

      Reply: We accepted the criticisms here. We have corrected the sentence to “with phosphorylation sites mainly located in PKC” in the revised manuscript.

      Question 7: Line 42 - what is BIK1? Intro to relevant topics is severely lacking.

      Reply: Sorry for the incomplete introduction here. We added the relevant introduction of BIK1 by adding that “Upon recognizing flg22, FLS2 interacts with the co-receptor Brassinosteroid-Insensitive 1-associated Kinase 1 (BAK1), initiating phosphorylation events through the activation of receptor-like cytoplasmic kinases (RLCKs) such as BOTRYTIS-INDUCED KINASE 1 (BIK1) to elicit downstream immune responses (Chinchilla et al., 2006; Li et al., 2016b; Majhi et al., 2021).” in the revised manuscript.

      Question 8: Lines 42-44 - not sure this sequence of events is being properly described (e.g. BIK1 release is unlikely to precede activation by BAK1/SERKs).

      Reply: We apologize for not expressing this sentence clearly. Now, we reworded the sentence: “Upon recognizing flg22, FLS2 interacts with the co-receptor Brassinosteroid-Insensitive 1-associated Kinase 1 (BAK1), initiating phosphorylation events through the activation of receptor-like cytoplasmic kinases (RLCKs) such as BOTRYTIS-INDUCED KINASE 1 (BIK1) to elicit downstream immune responses (Chinchilla et al., 2006; Li et al., 2016b; Majhi et al., 2021).” in the revised manuscript.

      Question 9: Line 61 - S938 was identified by Cao et al (2013) based on in vitro MS, but was functionally validated using genetic assays, not based on MS.

      Reply: Thank you for your comments. Now, we changed the sentence: “In vitro mass spectrometry (MS) identified multiple phosphorylation sites in FLS2. Genetic analysis further identified Ser-938 as a functionally important site for FLS2 in vivo (Cao et al., 2013).” in the revised manuscript.

      Question 10: Line 68-69 - phospho-dead and phospho-mimic, not phosphorylated/non-phosphorylated.

      Reply: We thank the reviewer for expert suggestions. In the revised manuscript, we changed the sentence by replacing “phosphorylated/non-phosphorylated” with “phospho-mimic” and “phospho-dead”.

      Question 11: Lines 104-106 - this is wildly misleading. Flg22 is more than a ligand-like factor, as it is a bona fide ligand, and the heterodimerization with BAK1/SERKs is extremely well-established (and relevant foundational papers should be cited here in place of the authors' previous work).

      Reply: We apologize for the incorrect expression here. After reading the comments, we realized the problem, which was raised by Reviewer 1 as well. We have now changed “ligand-like factor” to “ligand”. In addition, we cited a new reference, “(Orosa et al., 2018)”, in place of the references from our group in the revised manuscript.

      Question 12: Lines 107-112 - again, this is confusing. There is a decade of (uncited, undiscussed) work previously establishing that heterodimerization of RK-co-receptor complexes is mediated by extracellular ligand binding and independent of intracellular phosphorylation.

      Reply: We thank the reviewer for expert suggestions. Now, we added several sentences in the revised manuscript: “Therefore, we further investigated if Ser-938 phosphorylation affects FLS2/BAK1 heterodimerization. Tesseler segmentation, FRET-FLIM, and smPPI analyses revealed no impact of Ser-938 phosphorylation on FLS2/BAK1 heterodimerization (Figure 2A-C and S2). This aligns with the previous finding that flg22 acts as a molecular glue for FLS2 and BAK1 ectodomains (Sun et al., 2013), confirming the independence of FLS2/BAK1 heterodimerization from phosphorylation, with these events occurring sequentially.”

      Question 13: Line 119 - this is the wrong citation - Yu et al 2020 is a review and does not cover RALFs; correct citation is Gronnier et al 2022 eLife.

      Reply: In the revised manuscript, we updated the reference from “(Yu et al., 2020)” to “(Gronnier et al., 2022)”.

      Question 14: Lines 123-124 - this sentence is incomplete.

      Reply: Sorry for the incomplete sentence. Now we reworded the sentence to “In a previous investigation, we demonstrated that flg22 induces FLS2 translocation from AtFlot1-negative to AtFlot1-positive nanodomains in the plasma membrane, implying a connection between FLS2 phosphorylation and membrane nanodomain distribution (Cui et al., 2018). To validate this, we assessed the association of FLS2/FLS2S938D/FLS2S938A with membrane microdomains, using AtRem1.3-associated microdomains as representatives (Huang et al., 2019).” in the revised manuscript.

      Question 15: Line 126 - this requires a reference.

      Reply: Yes, we added a new reference: “(Huang et al., 2019)” in the revised manuscript.

      Question 16: Lines 125-128 - should clarify that the authors are not looking at direct interaction between FLS2 and REM1.3.

      Reply: Sorry for the inappropriate expressions here. In the revised manuscript, we reworded the sentence as follows: “To validate this, we assessed the association of FLS2/FLS2S938D/FLS2S938A with membrane microdomains, using AtRem1.3-associated microdomains as representatives (Huang et al., 2019)”.

      Question 17: Line 138 - these are odd references to use for such a broad statement.

      Reply: Now the inappropriate references cited here have been deleted.

      Question 18: Line 161 - incorrect reference, again.

      Reply: Sorry for this mistake. In the revised manuscript, we reworded the sentence and changed the reference.

      Question 19: Lines 160-165 - this is very confusing and misleading. I would suggest just having a short section introducing PTI earlier on (with appropriate references).

      Reply: As suggested, we reworded and added a section in the revised manuscript as follows: “PTI plays a pivotal role in host defense against pathogenic infections (Lorrai et al., 2021; Ma et al., 2022). Previous studies demonstrated that FLS2 perception of flg22 initiates a complex signaling network with multiple parallel branches, including calcium burst, mitogen-activated protein kinases (MAPKs) activation, callose deposition, and seedling growth inhibition (Baral et al., 2015; Marcec et al., 2021; Huang et al., 2023). Our focus was to investigate the significance of Ser-938 phosphorylation in flg22-induced plant immunity. Figure 4A-F illustrates diverse immune responses in FLS2 and FLS2S938D plants following flg22 treatment. These responses encompass calcium burst activation, MAPKs cascade reaction, callose deposition, hypocotyl growth inhibition, and activation of immune-responsive genes. In contrast, FLS2S938A (Figure S4A-D) exhibited limited immune responses, underscoring the importance of Ser-938 phosphorylation for FLS2-mediated PTI responses”.

      Question 20: Line 166 - these are not appropriate references, again.

      Reply: Thank you for the suggestion. In the revised manuscript, we removed the inappropriate references. Besides, we added new references by citing: “(Baral et al., 2015; Marcec et al., 2021)”.

      Question 21: Lines 169-173 - this is not relevant, the inhibition of growth by elicitors is extremely well-documented (though not by the refs cited here).

      Reply: We reworded the sentence and deleted the inappropriate reference in the revised manuscript.

      Question 22: Lines 174-175 - I don't see why this is unexpected, as nanodomain organization of PRRs has been previously described.

      Reply: Sorry for the inappropriate expressions here. As we have dramatically changed the manuscript, this sentence was deleted from the new version.

      References we added into the revised manuscript

      Baral A, Irani NG, Fujimoto M, Nakano A, Mayor S, Mathew MK. 2015. Salt-induced remodeling of spatially restricted clathrin-independent endocytic pathways in Arabidopsis root. Plant Cell 27:1297-315. DOI: 10.1105/tpc.15.00154, PMID: 25901088

      Barbieri L, Colin-York H, Korobchevskaya K, Li D, Wolfson DL, Karedla N, Schneider F, Ahluwalia BS, Seternes T, Dalmo RA, Dustin ML, Li D, Fritzsche M. 2021. Two-dimensional TIRF-SIM-traction force microscopy (2D TIRF-SIM-TFM). Nature Communications 12:2169. DOI: 10.1038/s41467-021-22377-9, PMID: 33846317

      Bertot L, Grassart A, Lagache T, Nardi G, Basquin C, Olivo-Marin J, Sauvonnet N. 2018. Quantitative and statistical study of the dynamics of clathrin-dependent and -independent endocytosis reveal a differential role of endophilinA2. Cell Reports 22:1574–1588. DOI: https://doi.org/10.1016/j.celrep.2018.01.039, PMID: 29425511

      Bücherl CA, Jarsch IK, Schudoma C, Segonzac C, Mbengue M, Robatzek S, MacLean D, Ott T, Zipfel C. 2017. Plant immune and growth receptors share common signalling components but localise to distinct plasma membrane nanodomains. eLife 6:e25114. DOI: https://doi.org/10.7554/eLife.25114, PMID: 28262094

      Chen Y, Munteanu AC, Huang YF, Phillips J, Zhu Z, Mavros M, Tan W. 2009. Mapping receptor density on live cells by using fluorescence correlation spectroscopy. Chemistry 15:5327-36. DOI: https://doi.org/10.1002/chem.200802305, PMID: 19360825

      Chinchilla, D., Bauer, Z., Regenass, M., Boller, T., and Felix, G. 2006. The Arabidopsis receptor kinase FLS2 binds flg22 and determines the specificity of flagellin perception. Plant Cell 18:465-476. doi:10.1105/tpc.105.036574, PMID: 16377758

      Gada KD, Kawano T, Plant LD, Logothetis DE. 2022. An optogenetic tool to recruit individual PKC isozymes to the cell surface and promote specific phosphorylation of membrane proteins. The Journal of Biological Chemistry 298:101893. DOI: https://doi.org/10.1016/j.jbc.2022.101893, PMID: 35367414

      Gronnier J, Franck CM, Stegmann M, DeFalco TA, Abarca A, von Arx M, Dünser K, Lin W, Yang Z, Kleine-Vehn J, Ringli C, Zipfel C. 2022. Regulation of immune receptor kinase plasma membrane nanoscale organization by a plant peptide hormone and its receptors. eLife 11:e74162. DOI: https://doi.org/10.7554/eLife.74162, PMID: 34989334

      Hohmann U, Lau K, Hothorn M. 2017. The structural basis of ligand perception and signal activation by receptor kinases. Annual Review of Plant Biology 68:109–137. DOI: https://doi.org/10.1146/annurev-arplant-042916-040957, PMID: 28125280.

      Huang D, Sun Y, Ma Z, Ke M, Cui Y, Chen Z, Chen C, Ji C, Tran TM, Yang L, Lam SM, Han Y, Shu G, Friml J, Miao Y, Jiang L, Chen X. 2019. Salicylic acid-mediated plasmodesmal closure via Remorin-dependent lipid organization. Proceedings of the National Academy of Sciences 116:21274–21284. DOI: https://doi.org/10.1073/pnas.1911892116, PMID: 31575745

      Huang Y, Cui J, Li M, Yang R, Hu Y, Yu X, Chen Y, Wu Q, Yao H, Yu G, Guo J, Zhang H, Wu S, Cai Y. 2023. Conservation and divergence of flg22, pep1 and nlp20 in activation of immune response and inhibition of root development. Plant Science 331:111686. DOI: https://doi.org/10.1016/j.plantsci.2023.111686, PMID: 36963637

      Jiao C, Gong J, Guo Z, Li S, Zuo Y, Shen Y. 2022. Linalool activates oxidative and calcium burst and CAM3-ACA8 participates in calcium recovery in Arabidopsis leaves. International Journal of Molecular Sciences 23:5357. DOI: https://doi.org/10.3390/ijms23105357, PMID: 35628166

      Kim TJ, Lei L, Seong J, Suh JS, Jang YK, Jung SH, Sun J, Kim DH, Wang Y. 2018. Matrix rigidity-dependent regulation of Ca2+ at plasma membrane microdomains by FAK visualized by fluorescence resonance energy transfer. Advanced science, 6:1801290. DOI: https://doi.org/10.1002/advs.201801290, PMID: 30828523

      Kontaxi C, Kim N, Cousin MA. 2023. The phospho-regulated amphiphysin/endophilin interaction is required for synaptic vesicle endocytosis. Journal of Neurochemistry 166:248–264. DOI: https://doi.org/10.1111/jnc.15848, PMID: 37243578

      Lee Y, Phelps C, Huang T, Mostofian B, Wu L, Zhang Y, Tao K, Chang YH, Stork PJ, Gray JW, Zuckerman DM, Nan X. 2019. High-throughput, single-particle tracking reveals nested membrane domains that dictate KRasG12D diffusion and trafficking. eLife 8:e46393. DOI: https://doi.org/10.7554/eLife.46393, PMID: 31674905

      Li B, Meng X, Shan L, He P. 2016a. Transcriptional regulation of pattern-triggered immunity in plants. Cell Host Microbe 19:641-50. DOI: 10.1016/j.chom.2016.04.011, PMID: 27173932

      Li L, Kim P, Yu L, Cai G, Chen S, Alfano JR, Zhou JM. 2016b. Activation-dependent destruction of a co-receptor by a Pseudomonas syringae effector dampens plant immunity. Cell Host Microbe 20:504-514. DOI: https://doi.org/10.1016/j.chom.2016.09.007, PMID: 27736646

      Lorrai R, Ferrari S. 2021. Host cell wall damage during pathogen infection: mechanisms of perception and role in plant-pathogen interactions. Plants (Basel) 10:399. DOI: https://doi.org/10.3390/plants10020399, PMID: 33669710

      Marcec MJ, Tanaka K. 2021. Crosstalk between Calcium and ROS signaling during flg22-triggered immune response in Arabidopsis leaves. Plants 11:14. DOI: 10.3390/plants11010014, PMID: 35009017

      Ma M, Wang W, Fei Y, Cheng HY, Song B, Zhou Z, Zhao Y, Zhang X, Li L, Chen S, Wang J, Liang X, Zhou JM. 2022. A surface-receptor-coupled G protein regulates plant immunity through nuclear protein kinases. Cell Host Microbe 30:1602-1614. DOI: 10.1016/j.chom.2022.09.012, PMID: 36240763

      Martinière A, Zelazny E. 2021. Membrane nanodomains and transport functions in plant. Plant Physiology 187:1839–1855. DOI: https://doi.org/10.1093/plphys/kiab312, PMID: 35235669

      Majhi, B.B., Sobol, G., Gachie, S., Sreeramulu, S., and Sessa, G. 2021. BRASSINOSTEROID-SIGNALLING KINASES 7 and 8 associate with the FLS2 immune receptor and are required for flg22-induced PTI responses. Molecular Plant Pathology 22:786-799. DOI: https://doi.org/10.1111/mpp.13062, PMID: 33955635

      Mitra SK, Chen R, Dhandaydham M, Wang X, Blackburn RK, Kota U, Goshe MB, Schwartz D, Huber SC, Clouse SD. 2015. An autophosphorylation site database for leucine-rich repeat receptor-like kinases in Arabidopsis thaliana. The Plant Journal 82:1042–1060. DOI: https://doi.org/10.1111/tpj.12863, PMID: 25912465

      Orosa B, Yates G, Verma V, Srivastava AK, Srivastava M, Campanaro A, De Vega D, Fernandes A, Zhang C, Lee J, Bennett MJ, Sadanandom A. 2018. SUMO conjugation to the pattern recognition receptor FLS2 triggers intracellular signalling in plant innate immunity. Nature Communications 9:5185. DOI: https://doi.org/10.1038/s41467-018-07696-8, PMID: 30518761

      Sun Y, Li L, Macho AP, Han Z, Hu Z, Zipfel C, Zhou JM, Chai J. 2013. Structural basis for flg22-induced activation of the Arabidopsis FLS2-BAK1 immune complex. Science 342:624-628. DOI: https://doi.org/10.1126/science.1243825, PMID: 24114786

      Vitrac H, Mallampalli VKPS, Dowhan W. 2019. Importance of phosphorylation/dephosphorylation cycles on lipid-dependent modulation of membrane protein topology by posttranslational phosphorylation. The Journal of Biological Chemistry 294:18853–18862. DOI: https://doi.org/10.1074/jbc.RA119.010785, PMID: 31645436

      Xue Y, Xing J, Wan Y, Lv X, Fan L, Zhang Y, Song K, Wang L, Wang X, Deng X, Baluška F, Christie JM, Lin J. 2018. Arabidopsis blue light receptor phototropin 1 undergoes blue light-induced activation in membrane microdomains. Molecular Plant 11:846-859. DOI: 10.1016/j.molp.2018.04.003, PMID: 29689384

      Xing J, Ji D, Duan Z, Chen T, Luo X. 2022. Spatiotemporal dynamics of FERONIA reveal alternative endocytic pathways in response to flg22 elicitor stimuli. New Phytologist 235: 518-532. DOI: 10.1111/nph.18127, PMID: 35358335

      Zhai K, Liang D, Li H, Jiao F, Yan B, Liu J, Lei Z, Huang L, Gong X, Wang X, Miao J, Wang Y, Liu JY, Zhang L, Wang E, Deng Y, Wen CK, Guo H, Han B, He Z. 2021. NLRs guard metabolism to coordinate pattern- and effector-triggered immunity. Nature 601:245-251. DOI: https://doi.org/10.1038/s41586-021-04219-2, PMID: 34912119

      Zhong YH, Guo ZJ, Wei MY, Wang JC, Song SW, Chi BJ, Zhang YC, Liu JW, Li J, Zhu XY, Tang HC, Song LY, Xu CQ, Zheng HL. 2023. Hydrogen sulfide upregulates the alternative respiratory pathway in mangrove plant Avicennia marina to attenuate waterlogging-induced oxidative stress and mitochondrial damage in a calcium-dependent manner. Plant Cell and Environment 46:1521-1539. DOI: https://doi.org/10.1111/pce.14546, PMID: 36658747

      Inappropriate references we deleted from the revised manuscript

      Schulze S, Yu L, Hua C, Zhang L, Kolb D, Weber H, Ehinger A, Saile SC, Stahl M, Franz-Wachtel M, Li L, El Kasmi F, Nürnberger T, Cevik V, Kemmerling B. 2022. The Arabidopsis TIR-NBS-LRR protein CSA1 guards BAK1-BIR3 homeostasis and mediates convergence of pattern- and effector-induced immune responses. Cell Host Microbe 30:1717-1731.e6. DOI: 10.1016/j.chom.2022.11.001, PMID: 36446350

      Wang Q, Zhao Y, Luo W, Li R, He Q, Fang X, Michele RD, Ast C, von Wirén N, Lin J. 2013. Single-particle analysis reveals shutoff control of the Arabidopsis ammonium transporter AMT1;3 by clustering and internalization. Proceedings of the National Academy of Sciences of the United States of America 110:13204-9. DOI: 10.1073/pnas.1301160110, PMID: 23882074

      Eichel K, Jullié D, von Zastrow M. β-Arrestin drives MAP kinase signalling from clathrin-coated structures after GPCR dissociation. Nature Cell Biology 18:303-10. DOI: 10.1038/ncb3307, PMID: 26829388

      Van Itallie CM, Anderson JM. Phosphorylation of tight junction transmembrane proteins: Many sites, much to do. Tissue Barriers 6:e1382671. DOI: 10.1080/21688370.2017.1382671, PMID: 29083946

      Monje-Galvan V, Warburton L, Klauda JB. Setting up all-atom molecular dynamics simulations to study the interactions of peripheral membrane proteins with model lipid bilayers. Methods in Molecular Biology 1949:325-339. DOI: 10.1007/978-1-4939-9136-5_22, PMID: 30790265.

      Trotta A, Bajwa AA, Mancini I, Paakkarinen V, Pribil M, Aro EM. The role of phosphorylation dynamics of CURVATURE THYLAKOID 1B in plant thylakoid membranes. Plant Physiology 181:1615-1631. DOI: 10.1104/pp.19.00942, PMID: 31615849

      Dorrity MW, Saunders LM, Queitsch C, Fields S, Trapnell C. Dimensionality reduction by UMAP to visualize physical and genetic interactions. Nature Communications 11:1537. DOI: 10.1038/s41467-020-15351-4, PMID: 32210240

      Sato KI, Tokmakov AA. Membrane microdomains as platform to study membrane-associated events during Oogenesis, Meiotic Maturation, and Fertilization in Xenopus laevis. Methods in Molecular Biology 920:59-73. DOI: 10.1007/978-1-4939-9009-2_5, PMID: 30737686.

      Ozolina NV, Kapustina IS, Gurina VV, Bobkova VA, Nurminsky VN. Role of plasmalemma microdomains (Rafts) in protection of the plant cell under Osmotic stress. Journal of Membrane Biology 254:429-439. DOI: 10.1007/s00232-021-00194-x, PMID: 34302495

      Boutté Y, Moreau P. Plasma membrane partitioning: from macro-domains to new views on plasmodesmata. Frontiers in Plant Science 5:128. DOI: 10.3389/fpls.2014.00128. PMID: 24772114

      Yu M, Cui Y, Zhang X, Li R, Lin J. Organization and dynamics of functional plant membrane microdomains. Cellular and Molecular Life Sciences 77:275-287. DOI: 10.1007/s00018-019-03270-7, PMID: 31422442

      Zhao Z, Li M, Zhang H, Yu Y, Ma L, Wang W, Fan Y, Huang N, Wang X, Liu K, Dong S, Tang H, Wang J, Zhang H, Bao Y. Comparative proteomic analysis of plasma membrane proteins in rice leaves reveals a vesicle trafficking network in plant immunity that is provoked by Blast Fungi. Frontiers in Plant Science 13:853195. DOI: 10.3389/fpls.2022.853195, PMID: 35548300

      Hilgemann DW, Dai G, Collins A, Lariccia V, Magi S, Deisl C, Fine M. Lipid signaling to membrane proteins: From second messengers to membrane domains and adapter-free endocytosis. Journal of General Physiology 150:211-224. DOI: 10.1085/jgp.201711875, PMID: 29326133

      Joshi R, Paul M, Kumar A, Pandey D. Role of calreticulin in biotic and abiotic stress signalling and tolerance mechanisms in plants. Gene 714:144004. DOI: 10.1016/j.gene.2019.144004, PMID: 31351124

      Chen Y, Cao C, Guo Z, Zhang Q, Li S, Zhang X, Gong J, Shen Y. Herbivore exposure alters ion fluxes and improves salt tolerance in a desert shrub. Plant Cell and Environment 43:400-419. DOI: 10.1111/pce.13662, PMID: 31674033

      Chi Y, Wang C, Wang M, Wan D, Huang F, Jiang Z, Crawford BM, Vo-Dinh T, Yuan F, Wu F, Pei ZM. Flg22-induced Ca2+ increases undergo desensitization and resensitization. Plant Cell and Environment 44:3563-3575. DOI: 10.1111/pce.14186, PMID: 34536020

      Zhang M, Su J, Zhang Y, Xu J, Zhang S. Conveying endogenous and exogenous signals: MAPK cascades in plant growth and defense. Current Opinion in Plant Biology 45:1-10. DOI: 10.1016/j.pbi.2018.04.012, PMID: 29753266

      Arnaud D, Deeks MJ, Smirnoff N. RBOHF activates stomatal immunity by modulating both reactive oxygen species and apoplastic pH dynamics in Arabidopsis. Plant Journal 116:404-415. DOI: 10.1111/tpj.16380, PMID: 37421599

      Zou Y, Wang S, Zhou Y, Bai J, Huang G, Liu X, Zhang Y, Tang D, Lu D. Transcriptional regulation of the immune receptor FLS2 controls the ontogeny of plant innate immunity. Plant Cell 30:2779-2794. DOI: 10.1105/tpc.18.00297, PMID: 30337428

      Ngou BPM, Jones JDG, Ding P. Plant immune networks. Trends in Plant Science 27:255-273. DOI: 10.1016/j.tplants.2021.08.012, PMID: 34548213.

      Yu M, Liu H, Dong Z, Xiao J, Su B, Fan L, Komis G, Šamaj J, Lin J, Li R. 2017. The dynamics and endocytosis of Flot1 protein in response to flg22 in Arabidopsis. Journal of Plant Physiology 215:73–84. DOI: https://doi.org/10.1016/j.jplph.2017.05.010, PMID: 28582732

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their insightful and constructive comments on our work, which have helped to strengthen the manuscript. In response to the additional suggestions provided by the reviewers, we have made revisions by adding or replacing five main figures, three supplementary figures, refining the text, and clarifying certain conclusions. Detailed responses to the reviewers’ points can be found below.

      Additional experiments, textual changes, or modulation of claims are needed to address weaknesses in the SOD1 portion of the study. Specifically:

      A) These studies require an assessment of the on-target efficacy of the inhibitors at the relevant concentration ranges. Ideally, they should have minimal effects against SOD1 knockout cell lines (an acute challenge at a time point before the growth defects become apparent) and show better efficacy in SOD1-overexpressing lines. Key experiments (changes in superoxide, OCR profiling, DNA alkaline comet assay) would be more convincing if they were carried out with SOD1 knockout lines to compare against the inhibitor effects (3-4 days after introducing sgSOD1 when growth defects are not apparent). In addition, SOD activity should be measured directly following inhibitor treatment.

      We agree with the reviewers that distinguishing the on-target from the off-target effects of the pharmacologic SOD1 inhibitors is a critical point to address. We have validated that SOD activity is reduced following treatment with ATN-224 in Figure 2 – Figure supplement 1A.

      Nevertheless, we acknowledge that the potential for off-target effects of these inhibitors cannot be completely ruled out. To address this concern, we have incorporated a discussion regarding the potential off-target effects of both LCS-1 and ATN-224.

      B) Assays should be included to support that SOD1 activity is altered. ATN-224 and LCS-1 are used to inhibit SOD1 function in the majority of the experiments, which should be supported by SOD activity assays to confirm SOD inhibition. Further, the concentration of ATN-224 used in this paper (12.5 uM) is beyond the concentration of what has been reported to inhibit SOD1 function in human blood cells. In Figure 4D, the authors demonstrate comparable SOD1 total protein levels in WT and PPM1D-mutant cells. However, the authors should further address whether PPM1D-mutation alters SOD1 activity via SOD activity assays.

      We thank the reviewers for these suggestions. We have performed SOD activity assays, which confirmed that SOD activity is inhibited upon treatment with ATN-224 at two concentrations (6.25 and 12.5 uM). We also performed this assay on LCS-1-treated cells but, in our hands, did not see reduced SOD activity. However, LCS-1 has been shown to inhibit SOD activity in other publications, including PMID: 21930909 and PMID: 32424294. From these assays, we have also found that PPM1D-mutant cells had increased SOD activity at baseline, despite having similar levels of SOD1 protein. These data have been added to Figure 2–Figure supplement 1A.

      C) Some conclusions are not fully supported by the data provided. The authors claimed that "upon inhibition of SOD1, there was an increase in ROS that was specific to the mutant cells" in Figure 2E. Comparison of ROS levels among untreated, ATN-224, and LCS-1 of PPM1D-mutant cells should have been made and the statistics analysis among these groups should have been provided. Moreover, in Figure 2-Figure Supplement 1E, LCS-1 treatment does not increase ROS levels in PPM1D mutant LCLs. Performing these experiments with control and SOD1 deletion cells would have strengthened the results. Along with this point, the authors should comment on why SOD2 is not identified as a top hit in the CRISPR screen, as SOD2 deletion accumulates superoxide in cells.

      After performing additional statistical analyses for Figure 2E, we found that the minor increase in ROS levels in the mutant cells after SOD1 inhibition was not statistically significant. We have revised the text accordingly.

      As for why SOD2 was not identified as a top hit, we postulate that this may be due to inherent dependency of the WT cell lines on SOD2.

      D) Fig. 1 - SOD1 appears to be clustered with several other genes in the volcano plot (including FANC proteins). Did any other ROS-detoxifying enzymes show similar fitness scores? The effects of the SOD1 sgRNA are striking, however, it would be useful to see qPCR or immunoblot data confirming robust depletion.

      Thank you for your suggestion. We have validated the loss of SOD1 protein expression after SOD1 sgRNA deletion by immunoblot and have added this data to Figure 1– figure supplement 1D. While other ROS-detoxifying enzymes were not significantly enriched in the top 37 hits, interestingly, the Fanconi Anemia pathway also has roles in counteracting oxidative stress. FA-deficient cells have mitochondrial dysfunction and redox imbalance, and several of the FA family proteins are implicated in mitophagy. Therefore, there may be an interesting interplay between SOD1 and the FA pathway that is worth highlighting in the discussion of our manuscript even though there was no experimental investigation performed.

      E) Fig. 2 - What are the relative SOD1 levels in the mutant PPM1D vs. WT cell lines? The effects of the chemical inhibitors are stronger in MOLM-13 than in the other two lines. These data could also point to whether LCS-1 and ATN-224 cytotoxicity are on-target or off-target at these concentrations, which is a key issue not currently addressed in these studies. This is a particular concern as the OCI-AML2 line shows a stronger growth defect with CRISPR SOD1 KO (in Fig 1) but the smallest effects with these chemical inhibitors. The authors should also include SOD1 levels for Figure 1D and Figure 4-Figure supplement 1C.

      SOD1 protein expression is similar between WT and PPM1D-mutant cell lines and the loss of SOD1 after SOD1 sgRNA deletion was validated by immunoblot. These data have been added to Figure 1- figure supplement 1D and Figure 4D.

      F) Does SOD1 co-expression in PPM1D-mutant patient AML correspond to poorer disease outcomes? This can be evaluated in publicly available patient datasets and would support the idea of SOD1 synthetic lethality.

      Unfortunately, there are no publicly available patient datasets with sufficient cases of de novo PPM1D-mutant AML to assess this question.

      G) While endogenous mitochondrial superoxide levels are elevated in PPM1D mutant lines, it is entirely unclear why SOD1 inhibition should affect mitochondrial superoxide as it detoxifies cytosolic superoxide. Also unclear why the DCFDA signal (which measures total hydroperoxides) is increased under SOD1 inhibition - SOD1 dismutates superoxide radicals into hydrogen peroxide, therefore unless SOD2 is compensating for SOD1 loss, one might expect hydroperoxides to be lower (unless some entirely different oxidase is increasing their levels). None of these outcomes appear to be considered. Finally, it is not explained how lipid peroxidation, which requires the production of hydroxyl or similarly high-potency radicals, is being caused by increased superoxide or peroxides. One possibility is there is an increase in labile iron, in which case this phenotype would be rescued by the iron chelator desferal, and by the lipophilic antioxidant, ferrostatin.

      We measured intracellular labile iron levels by flow cytometry by staining the cells with FerroOrange at baseline and after SOD1 inhibition with our pharmacologic inhibitors (ATN-224 at 12.5 uM and LCS-1 at 1.25 uM). Across the three leukemia cell lines, we saw variable results in iron levels with no appreciable patterns (see below). Therefore, we cannot make conclusions about the contribution of labile iron to our observed phenotypes.

      Author response image 1.

      H) Do the sgSOD1 cells also show similar increases in MitoSox green, DCFDA, and BODIPY signal? These experiments would clarify whether the effects of the inhibitors are directly related to SOD1 loss or if they represent off-target effects from the inhibitors and/or compensatory changes in SOD2.

      We do not observe changes in SOD2 in the several contexts in which we have examined this. We cannot exclude off-target effects of the inhibitors so have clarified this in the text.

      I) The authors may want to assess whether Rac1 or NADPH oxidase activity is altered in the SOD1 KO in WT vs. PPM1D cells. Their results may be the consequence of compromised ROS-driven survival signaling or DNA repair rather than direct ROS-induced damage, which is not caused directly by superoxide (or hydrogen peroxide).

      We appreciate the reviewer’s recommendations. However, due to time constraints, we regret not being able to assess Rac1 or NADPH oxidase activity. Nevertheless, we recognize the possibility of altered ROS-driven signaling rather than ROS-induced damage as a driver of our phenotype and have incorporated this possibility into our discussion.

      J) Fig. 3 - the effects on mitochondrial respiratory parameters, while statistically significant, do not seem biologically striking. Also, these data are shown for OCI-AML2 cells which show the smallest cytotoxic effects with the SOD1 inhibitors among the 3 lines tested. They do however show the most robust growth defect with sgSOD1. This discrepancy could suggest that mitochondrial dysfunction does not underlie the observed growth defect and/or the inhibitor cytotoxicity is not on-target. Ideally, mitochondrial profiling should also be carried out on this cell line with inducible SOD1 depletion. Have the authors assessed whether the mitochondrial Bcl family proteins are affected by the inhibitors?

      We assessed a few members of the mitochondrial Bcl-family proteins including MCL-1, BCL-2, and BCL-XL during the revision process. PPM1D-mutant cells have mildly increased expression of these anti-apoptotic proteins at baseline and the expression is not altered by pharmacologic SOD1 inhibition (see Author response image 2 below). Due to time constraints, we were unable to perform seahorse assays and mitochondrial profiling in the SOD1-deletion cells.

      Author response image 2.

      K) Fig. 4 - Currently the data in this figure do not support the authors' claim that PPM1D-mutant cells have impaired antioxidant defense mechanisms, leading to an elevation in ROS levels and reliance on SOD1 for protection. It should be noted that oxidative stress specifically refers to adverse cellular effects of increasing ROS, not baseline levels of various redox parameters. Ideally, levels of GSSG/GSH would be a better measure of potential redox stress tolerance than the total antioxidant capacity assay. Finally, oxidative stress can be assessed by challenging the wt and mutant PPM1D cell lines with oxidant stressors such as paraquat which elevates superoxide, or drugs like erastin which elevate mitochondrial ROS. The immunoblot shows negligible changes in the antioxidant proteins assayed. Again, this blot should include SOD2 which is the most relevant antioxidant in the context of mitochondrial superoxide.

      We measured intracellular glutathione levels by flow cytometry and found that PPM1D-mutant cells had a greater proportion of cells with low levels of GSH. These data have been added as Figure 4D. We have also repeated the western blot to look at the antioxidant proteins catalase, SOD1, and thioredoxin after SOD1 deletion and pharmacologic SOD1 inhibition. We evaluated SOD2 protein levels in these experiments, as suggested. Smooth muscle actin (SMA) is included in the antibody cocktail as a loading control. However, it is unclear to us why PPM1D-mutant cells consistently have significantly higher levels of SMA. Therefore, we included a separate loading control, Vinculin. The repeated western blots showed a clearer difference between WT and PPM1D-mutant cells in the levels of these antioxidant proteins: PPM1D-mutant cells have decreased levels of catalase and thioredoxin. These blots also show that SOD2 levels may be mildly increased in the PPM1D-mutant cells at baseline but are not significantly upregulated upon SOD1 inhibition. We have replaced the original immunoblot from Figure 4D with the revised blots that more clearly demonstrate the reduced levels of catalase and thioredoxin, now Figure 4E.

      L) Fig. 5 - These data support that DNA breaks are elevated in PPM1D mutant vs. wt cells. However, the data with the chemical SOD1 inhibitor again do not convince us that the enhanced levels are due to on-target effects on SOD1. Use of the alkaline comet assay is appropriate for these studies and the 8-oxoguanine data do indicate contributions from oxidative DNA base damage. But these are unlikely to result directly from altered superoxide levels, as this species cannot directly oxidize DNA bases or cause DNA strand breaks.

      Thank you to the reviewers for raising this point. We have performed comet assays in SOD1-deletion cells to look at levels of DNA damage. Consistent with the reviewers’ point, we do not see a significant increase in DNA breaks after SOD1 deletion. We have removed the data using the SOD1 inhibitor and instead show the COMET analysis in the PPM1D-mut and SOD1-KO cells (see Figure 5F). We now make the point that increased DNA damage with SOD1 loss cannot explain the vulnerability of the double-mutant cells.

      M) Instead of using NAC, which elevates glutathione synthesis but also has several known side effects, the authors may want to determine whether Tempol, a SOD mimetic can rescue the effects of SOD1 knockout or inhibition. This would directly prove that SOD1 functional loss underlies the observed growth defect and cytotoxicity from genetic SOD1 knockdown or chemical inhibition.

      This is an excellent suggestion; we have added comments to this effect into the discussion.

      N) It is recommended the discussion focus more strongly on how the signaling function of superoxide vs. its reactions with other molecular entities to induce genotoxic outcomes could be contributing to the observed phenotypes. The discussion of FANC proteins, which were targets with similar fitness scores but not experimentally investigated at all, is an unwarranted digression.

      Thank you for this recommendation. We have expanded the discussion to focus more on the signaling functions of superoxide. However, considering the role of the Fanconi Anemia pathway in mitigating DNA damage and oxidative stress, we believe the discussion of the FANC proteins is important due to the possible intersection with SOD1. Therefore, we have refined this portion of the discussion to focus more on the interplay between SOD1 and FA.

      O) The complete lack of consideration of SOD2 in these studies is a missed opportunity as it reduces mitochondrial superoxide levels but elevates hydrogen peroxide levels. It would be very interesting to see whether SOD1 inhibition leads to compensatory increases in SOD2. SOD2 can be easily measured by immunoblot. Furthermore, measuring total superoxide via hydroethidium in a flow cytometric assay vs. mitochondrial ROS in PPM1D mut vs. wt cells and under SOD1 knockout would enable a determination of which species dominates (cytosolic or mitochondrial). These experiments are required to fill some logical gaps in the interpretation of their redox data.

      During the revision process, we have included SOD2 in our studies and have found that loss of SOD1 via genetic deletion and pharmacologic inhibition does not lead to compensatory increases in SOD2 (Figure 4D). Additionally, we have measured cytoplasmic superoxide levels using dihydroethidium to differentiate between cytoplasmic vs. mitochondrial superoxide. We found that at baseline levels, the mutant cells also harbored more cytoplasmic superoxide. We have added this figure as Figure 2C and moved the original mitochondrial superoxide data to Figure 2-figure supplement 1C.

      P) Given the DNA breaks observed in PPM1D mutant cells, it is highly recommended that the authors assess whether iron levels are elevated in mut vs. wt cells and whether desferal can rescue observed SOD1 inhibition defects. Also, it has been reported that PPM1D promotes homologous recombination by forming a stable complex with BRCA1-BARD1, thereby enhancing their recruitment to double-strand break sites. The authors should comment on why there is no difference in repair via HR in WT and PPM1D mutant cells in Figure 5C.

      Please see comment G regarding our findings about iron levels.

      The reviewers pose an interesting question as to why there is no difference in HR repair between WT and mutant cells, given the reported role of PPM1D in promoting HR. We have addressed this question in the main text. We believe that several factors can limit the extent of HR enhancement in PPM1D-mutant cells. For example, HR is typically confined to the S/G2 phase and thus may be constrained by cell cycling, among other regulatory mechanisms.

      Other comments:

      A) The authors described in the Method section that "The CRISPR Screen PPM1D mutant Cas9-expressing OCI-AML2 cell lines were transduced with lentivirus library supernatant." The authors need to provide information on whether the MOI of the CRISPR screen has been well controlled to ensure that the majority of the cell population has a single copy of sgRNA transduction.

      We performed a lentiviral titer curve prior to the screen to determine the volume of viral supernatant to add for a multiplicity of infection (MOI) of 0.3. This important detail has been added to our Methods.
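
      As a rough illustration of why a low MOI such as 0.3 addresses the reviewer's concern, the sketch below assumes that the number of sgRNA integrations per cell follows a Poisson distribution with mean equal to the MOI. This is a common approximation, not something stated in the response, and the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, moi: float) -> float:
    """Probability that a cell receives exactly k integrations (Poisson assumption)."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def integration_summary(moi: float) -> dict:
    """Split the population into untransduced, single- and multi-integration cells."""
    p0 = poisson_pmf(0, moi)
    p1 = poisson_pmf(1, moi)
    transduced = 1.0 - p0
    return {
        "untransduced": p0,
        "single": p1,
        "multiple": transduced - p1,
        # Fraction of cells surviving selection that carry exactly one sgRNA.
        "single_among_transduced": p1 / transduced,
    }


summary = integration_summary(0.3)  # MOI taken from the response above
for label, fraction in summary.items():
    print(f"{label}: {fraction:.3f}")
```

      Under this assumption, most cells that survive selection at an MOI of 0.3 carry a single sgRNA, at the cost of leaving the majority of cells untransduced.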

      B) The study convincingly shows differences between parental leukemic cells and the PPM1D mutants but one important control is missing in experiments related to Fig. 2 and 3. All PPM1D mutant clones used in this study were subjected to the blasticidin selection of the transduced cells to generate cells stably expressing Cas9 and subsequently, the clones with successful PPM1D targeting were expanded. The authors should demonstrate that increased ROS production is not just a consequence of the lentiviral transduction and antibiotic selection and that it corresponds to increased PPM1D activity in PPM1D mutant cells. To do that, authors could compare PPM1D clones to parental cells that underwent the same selection procedure (OCI-AML2-Cas9 cells and OCI-AML3-Cas9 cells).

      It is true that the parental OCI-AML2 and OCI-AML3 cell lines underwent four days of blasticidin selection to create the stably expressing Cas9 cell lines. However, after the four-day period, the blasticidin was removed from the cell culture media. From there, we introduced the PPM1D mutations into the Cas9-expressing “WT” cell lines using the RNP-based CRISPR/Cas9 delivery method, and single cells were then sorted into 96-well plates. Clones were expanded and validated using Sanger sequencing, TIDE analysis, and western blot. In all of our assays, we compare the WT Cas9 cells to the PPM1D-mutant Cas9 cells. Additionally, the cells have been expanded and passaged several times after blasticidin selection. Therefore, we believe it is unlikely that there are residual ROS-inducing effects from the antibiotic treatment.

      C) The authors mention that they identified 3530 genes differentially expressed in parental and PPM1D mutant cells (line 267) but it is unclear what was the threshold for statistical significance. They mention FDR<0.05 in the Methods but show GSEA analysis with FDR<0.25 in Figure 4A. Source data for Fig. 4 is missing and the list of differentially expressed genes is not shown.

      The source data files for Figures 1 and 4 will be uploaded with the revised manuscript. Upon reviewing the source data, we noticed an error in the number of differentially expressed genes. We have corrected this in line 274 and you will see that this correlates with Figure 4-source data 1. For the thresholds, we used an FDR<0.05 for the differential gene expression analysis, and an FDR <0.25 in the GSEA, which is an appropriate threshold for GSEA. We have clarified these thresholds in the methods section.
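
      To make the effect of the two cutoffs concrete, the sketch below applies the Benjamini-Hochberg procedure to a handful of hypothetical p-values and then filters at 0.05 and 0.25. Both the p-values and the choice of Benjamini-Hochberg are assumptions for illustration only: the response does not state how the FDR was computed, and GSEA typically estimates its FDR from permutations rather than with this procedure.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]  # hypothetical values, not from the manuscript
qvals = benjamini_hochberg(pvals)

strict = [q < 0.05 for q in qvals]   # cutoff quoted for differential expression
lenient = [q < 0.25 for q in qvals]  # cutoff quoted for GSEA
print(qvals, strict, lenient)
```

      The same set of tests yields a much shorter list at the stricter cutoff, which is why the two thresholds need to be reported separately.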

      D) Include a definition of MFI in Figure legend Fig.2 and also in the Methods section. The unit should be indicated at both the x and y axes.

      We have defined MFI in the figure legends and methods sections and have updated the figures accordingly.

      E) Legend to Figure 2 - Figure Supplement 1 E should define the grey and pink columns (likely WT and mutants LCLs).

      Thank you. We have defined the grey and pink columns as WT and PPM1D-mutant cell lines, respectively for Figure 2 – Figure supplement 2D and E.

      F) Reporter assays in Fig. 5 convincingly show that NHEJ capacity is reduced in PPM1D mut cells. In the text, the authors state that this might reflect the impact of PPM1D on LSD1 (line 365). Although this might be the case, other options are equally possible. It would be appropriate to include a reference to the ability of PPM1D to counteract gH2AX and ATM which generate the most upstream signals in DDR.

      Thank you to the reviewers for raising this excellent point. We have revised the text to incorporate the impact of PPM1D on yH2AX and ATM on NHEJ.

      G) The authors correctly state that truncation of PPM1D leads to protein stabilization (line 85) and that it is present in U2OS cells (line 355). These observations were first reported by Kleiblova et al., 2013, and therefore one reviewer believes that this reference should be included. This study also identified a truncating PPM1D mutation in colon adenocarcinoma HCT116 cells, and the role of PPM1D mutation in promoting the growth of colon cancer has subsequently been tested in an animal model (Burocziova et al., 2019).

      Thank you. We have added this reference to our text in line 360.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Response to Reviewer 1:

      • We agree with the reviewer’s overall assessment of this manuscript.

      • Because multiple secreted proteins are changed between the control and experimental groups, some of them could be causal and others correlative in the context of enhancing compensatory glucose production in response to elevated glycosuria. Through future studies we will determine the causal factors that trigger the increase in glucose production.

      • Yes, we will correct the typographical errors in a revised version of this manuscript.

      Response to Reviewer 2:

      • We agree with the reviewer’s comment about potential sex differences that we may have missed in this study. Therefore, we will include this limitation in the discussion section of a revised manuscript.

      • The reviewer’s statement ‘The methods of that publication indicate that all experiments were completed within 14 days of inducing the Glut2 knockout’ is incorrect. In the referred publication, we had explicitly mentioned in methods that ‘All of the experiments, except those using a diet-induced obesity mouse model or noted otherwise, were completed within 14 days of inducing the Glut2 deficiency.’ Please see figures 5h-l and 6 in that previous publication, which demonstrate that all the experiments were not completed within 14 days of inducing renal Glut2 deficiency. Per the reviewer’s advice, in the present manuscript we will include the timeline of the experiments (which in some cases is 4 months beyond inducing glycosuria) with all the figure legends. In addition, for a separate project (which is unpublished) we have measured glycosuria up to 1 year after inducing renal Glut2 deficiency. Therefore, the glycosuria observed in the renal Glut2 KO mice is not temporary.

      • In our previous response to the reviewer, we had already mentioned which control group was used in this study. Please see our response to the second reviewer’s point 3. As mentioned to the reviewer, we had used Glut2-loxp/loxp mice as the control group, which is also described multiple times in the figure legends of our previous paper that reported the phenotype of renal Glut2 KO mice and is cited in this manuscript so we don’t have to repeat the same information. Per the reviewer’s advice, we will also include the information in a revised version of this manuscript.

      • We request the reviewer to look at figure 1, showing an increase in glucose production in renal Glut2 KO mice and figure 3, which demonstrates that an afferent renal denervation reduces blood glucose levels by 50%. The afferent renal denervation (ablation of afferent renal nerves) does reduce blood glucose levels in renal Glut2 KO mice. Therefore, the use of the word ‘promote’ in the title is accurate and appropriate to reflect the role of the afferent renal nerves in contributing to about 50% increase in blood glucose levels in renal Glut2 KO mice. Regarding the reviewer's comment on changes in Crh gene expression, please look at figure 3. Ablation of renal afferent nerves decreases hypothalamic Crh gene expression and other mediators of the HPA axis by 50%. Therefore, the afferent renal nerves do contribute to regulating blood glucose levels, at least in part, by the HPA axis (which is widely known to change blood glucose levels). The use of words such as ‘required’ or ‘necessary’ in the title may have indicated causal role or could have been misleading here; therefore we have purposely used ‘promote’ in the title to accurately reflect the findings of this study.

      • Because we observed an increase in hepatic glucose production in renal Glut2 KO mice (Fig. 1) - which was reduced by 50% after selective afferent renal denervation (Fig. 3) - in the graphical abstract we are suggesting a neural connection between the kidney, brain, and liver or an endocrine factor(s) to account for these changes in blood glucose levels, as also described in the discussion section. We can include a question mark ‘?’ in the graphical abstract to show that further studies are needed to validate these proposed mechanisms; however, we cannot just remove the arrow as advised by the reviewer.

      • Per the reviewer’s advice, in the methods we will include the dilutions used for each assay.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      It would be helpful to the reader to specify in Figure 1a-c whether data were directly measured or calculated.

      We have now clarified this in the method section of the revised manuscript. Glucose production was directly measured, and the fractional contribution of each tissue was then calculated from those data. We have also included a reference research paper to further clarify the method.

      The methods section would be strengthened by clarifying the order in which experiments were performed, the age of the mice at each time point, and whether different cohorts were used for different techniques.

      We have included additional details in the method section with proper citations. For in-depth protocols we have cited our previous publications.

      It would be helpful to explain or provide a reference for how the post-mortem background activity measurement was performed.

      We have included this explanation in the revised manuscript.

      Similarly, details regarding the collection of blood for ACTH and corticosterone measurement are needed for the reader to evaluate whether the results are confounded by stress at the time of collection.

      We have added these details in the method section.

      I recommend stating, if accurate, that you used mixed-sex groups because your previous study found no sex differences in the phenotype of renal Glut2 KO mice.

      Yes, we have included these details in the revised manuscript.

      Sentence 239 is difficult to follow. Also, line 287 contains a contraction.

      We have revised the sentence per the reviewer’s advice.

      A graphical abstract would be helpful, bearing in mind conclusive vs suggestive findings.

      Yes, we have included the graphical abstract with the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Minor Comments to the Authors

      (1) The Methods also need to specify more of the critical details of the ELISAs, including the dilution factors used, and whether the values reported are dilution-corrected. Also, there is no description of how insulin was measured.

      We have included these details in the method section. The assay dilutions were performed per manufacturers’ instructions.

      (2) The Methods do not sufficiently describe how Crh mRNA was quantified in the hypothalamus. Presumably, they examined only the paraventricular nucleus? How many sections were used for in situ hybridization? How were the brains processed? What thickness of section was used? When were the brains collected?

      We have included these details in the method section and cited our previous publications for in-depth protocols. Some of the information is also available in the figure legends.

      (3) The number of mice that were used for plasma proteomics is not indicated.

      The number of mice is indicated using individual symbols or points presented on the bar graphs.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study addresses the long-term effect of warming and altered precipitation on microbial growth, as a proxy for understanding the impact of global warming. While the methods are compelling and the evidence supporting the claims is solid, additional analysis of the data would strengthen the study, which should be of broad interest to microbial ecologists and microbiologists.

      We sincerely appreciate your assessment and thoughtful comments, which are valuable and very helpful for improving our manuscript. We have carefully considered all comments and made extensive, thorough corrections and additional analyses of the data, which we hope will meet with approval.

      Reviewer #1 (Public Review):

      Warming and precipitation regime change significantly influences both above-ground and below-ground processes across Earth's ecosystems. Soil microbial communities, which underpin the biogeochemical processes that often shape ecosystem function, are no exception to this, and although research shows they can adapt to this warming, population dynamics and ecophysiological responses to these disturbances are not currently known. The Qinghai-Tibet Plateau, the Third Pole of the Earth, is considered among the most sensitive ecosystems to climate change. The manuscript described an integrated, trait-based understanding of these dynamics with the qSIP data. The experimental design and methods appear to be of sufficient quality. The data and analyses are of great value to the larger microbial ecological community and may help advance our understanding of how microbial systems will respond to global change. There are very few studies in which the growth rates of bacterial populations from multifactorial manipulation experiments on the Qinghai-Tibet Plateau have been investigated via qSIP, and the large quantity of data that comprises the study described in this manuscript, will substantially advance our knowledge of bacterial responses to warming and precipitation manipulations.

      We appreciate the encouragement and positive comments.

      Specific comments:

      (1) Please add some names of microbial groups with most common for the growth rates.

      We have added the sentence “The members in Solirubrobacter and Pseudonocardia genera had high growth rates under changed climate regimes” in the Abstract (Line 57-58).

      (2) L47-48, consider changing "microbial growth and death" to "microbial eco-physiological processes (e.g., growth and death)", and changing "such eco-physiological traits" to "such processes".

      Done (Line 47 and 48).

      (3) L50-51, the author estimated bacterial growth in alpine meadow soils of the Tibetan Plateau after warming and altered precipitation manipulation in situ. Actually, the soil samples were collected and incubated in the laboratory rather than in the field like the previous experiment conducted by Purcell et al. (2021, Global Change Biology). "In situ" would lead me to believe that the qSIP incubation was conducted in the field, so I think the use of the word in situ is inappropriate here. [https://onlinelibrary.wiley.com/doi/full/10.1111/gcb.15911]

      Agreed. We have deleted “in situ”.

      (4) L52, what does "interactive global change factors" mean?

      We have revised this sentence to “the growth of major taxa was suppressed by the single and combined effects of temperature and precipitation” (Line 52-53).

      (5) L61, in my opinion, "Microbial diversity" belongs to the category of species composition, rather than ecosystem functional services. Please revise it.

      Agree. We have deleted it.

      (6) L69, consider changing "further" to "thus".

      Done (Line 70).

      (7) L82, delete "The evidence is overwhelming that".

      Done.

      (8) L85-90, these two sentences have similar meanings, please express them concisely.

      We have deleted the sentence “Altered precipitation, particularly drought or heavy precipitation events, also tends to negatively influence soil processes and biodiversity”.

      (9) L91, the effect of drought on soil microorganisms is lacking here.

      We have added the sentence “Reduced precipitation affects soil processes notably by directly stressing soil organisms, and also altering the supply of substrates to microbes via dissolution, diffusion, and transport” in the Introduction (Line 87-89).

      (10) L102, "Growth" should be highlighted here, as changes in relative abundance can also be classified as population dynamics. The use of the term "population dynamics" will eliminate the highlight of this study in calculating the growth rate of microbial species in in-situ soil based on qSIP. Consider changing "population dynamics" to "population-growth responses" or something like that.

      Done (Line 98).

      (11) L105, please note that this citation focuses on plant physiological characteristics.

      We have revised the reference (Line 102).

      (12) L115, "soil temperature, water availability" should be considered as a direct impact of climate change, rather than an indirect impact on microorganisms.

      We have deleted them.

      (13) L134-135, please clarify the interaction types between which climate factors.

      We have deleted this sentence.

      (14) L135-138, suggest modifying or deleting this sentence. The results in this study are already eco-physiological data and do not need to be further "understood and predicted".

      We have deleted this sentence.

      (15) L150, "The experimental design has been described in previously". I think this refers to another study and not the actual incubations in this study. Also in L198, suggest a change to "Incubation conditions were similar to those previously described". So, it's clear it's not the same experiment.

      We have revised these sentences to “has been described previously in (Ma et al., 2017)” (Line 136) and “according to a previous publication” (Line 194).

      Reference:

      Ma, Z., Liu, H., Mi, Z., Zhang, Z., Wang, Y., Xu, W. et al. (2017). Climate warming reduces the temporal stability of plant community biomass production. Nature Communications, 8, 15378.

      (16) L188, change "pre-wet soil samples" to "pre-wet samples" and change "soil samples for 48h incubation" to "incubation samples". What does "pre-wet" mean? Does it represent soil pre-cultivation?

      Done. The pre-wet samples, i.e., the soil samples before incubation (T = 0 d), were used to estimate the initial microbial composition. "pre-wet" does not mean soil pre-cultivation. We have added the description “A portion of the air-dried soil samples was taken as the pre-wet treatment (i.e., before incubation without H2O addition)” in MATERIALS AND METHODS (Line 174-175).

      (17) Unify the time unit of incubation (hour or day). Consider changing "48 h" to "2 d" in Materials and Methods.

      Done.

      (18) L247, what version of RDP Classifier was used?

      We used the RDP v16 database for taxonomic annotation. We have added this information in the revision (Line 246).

      (19) L270, "average molecular weights".

      Done (Line 268).

      (20) L272-275, based on the preceding description, it appears that the culture period was limited to 48 hours. Please confirm it.

      We apologize for this mistake. We have revised it (Line 273).

      (21) L297, switch the order of the first two sentences of this paragraph.

      Done (Line 297).

      (22) L331, change "smaller-than-additive" to "smaller than their expected additive effect".

      Done (Line 331).

      (23) L374 and 381, I struggle with why "larger combined effects" than single factor effects represent a higher degree of antagonism, and I think it should be "smaller combined effects".

      Agree. We have revised it according to this suggestion (Line 369 and 374).

      (24) L375, remove "than that of drought and warming".

      Done.

      (25) L405, simplify the expression, change "between different warming and rainfall regimes" to "between climate regimes"

      We have deleted this sentence.

      (26) L406-408, species are already on the phylogenetic tree and they can not "clustered at the phylogenetic branches", but the functional traits of microbes can. Please revise it.

      We have revised this sentence to “Overall, the most incorporators whose growth was influenced by the antagonistic interaction of T × P showed significant phylogenetic clustering (i.e., species clustered at the phylogenetic branches; NTI > 0, P < 0.05)” (Line 402-404).

      (27) L409, the same as above, and consider removing "The incorporators subjected to".

      We have revised this sentence to “The incorporators whose growth subjected to the additive interaction of warming × drought also showed significant phylogenetic clustering (P < 0.05)” (Line 404-406).

      (28) L412, consider changing "incorporators subjected to the synergistic interaction" to "the synergistic growth responses under multifactorial changes".

      We have revised the sentence to “incorporators whose growth is influenced by the synergistic interaction showed phylogenetically random distribution under both climate scenarios (P > 0.05)” (Line 407-409).

      (29) L505-506, please add a reference for this sentence.

      Done (Line 488).

      (30) L511-514, It should be noted that the production of MBC does not necessarily imply a net change in the C pool size. The accelerated growth rates may result in expedited turnover of MBC, rather than an increase in carbon sequestration.

      Thanks. We have deleted this sentence.

      (31) Language precision. In the discussion section there must be some additional caveats introduced to some of the claims the authors are making. For instance, L518, the author should clarify that "in this study, the bacterial growth in alpine grassland may be influenced by antagonistic interactions between multiple climatic factors after a decadal-long experiment". Because other studies may exhibit different results due to the focus on different ecosystem functions as well as environmental conditions. As such, softening of the language is recommended- lines are noted below- and these will not adjust the outcomes of this study, but support more precise interpretation.

      We have revised the sentence to “In this study, a decade-long experiment revealed that bacterial growth in alpine meadows is primarily influenced by the antagonistic interaction between T × P” (Line 497-499).

      (32) PICRUSt analysis is a good way to connect species and their functions, especially PICRUSt2, which updated the reference database and optimized the algorithm to improve its prediction accuracy (Douglas et al., 2020, Nature Biotechnology). However, the link between microbial taxonomy and microbial metabolism is still not straightforward, especially in diverse microbial communities like soils. The authors should introduce caveats within the discussion to show that they know the limitations of their methods. For context, as a reader who studies metabolism in soils, I found myself somewhat disappointed when PICRUSt data was introduced and not properly caveated. In particular, it might be helpful to introduce the caveats briefly in the last paragraph of the results. These caveats are necessary to avoid overstating the authors' findings, and to make sure the reader knows the authors understand the very clear limitations of these methods. [https://www.nature.com/articles/s41587-020-0548-6]

      Thanks. We have introduced caveats in DISCUSSION, that is “This is, however, still to be verified, as the functional output from PICRUSt2 is less likely to resolve rare environment-specific functions (Douglas et al., 2020)” (Line 540-542).

      Reference:

      Douglas, G., Maffei, V., Zaneveld, J., Yurgel, S., Brown, J., Taylor, C. et al. (2020). PICRUSt2 for prediction of metagenome functions. Nature Biotechnology, 38, 685–688.

      (33) Although the author has explained the potential causes for the negative effects of different climate change factors (i.e., warming, drought, and wet) on microbial growth, there seems to be a lack of a summary assertion and an extension on how climate change affects microbial growth and related ecosystem functions. It is recommended to make a general summary of the results in the last part of Discussion.

      We have added a general summary in the last paragraph of DISCUSSION, that is “Our results demonstrated that both warming and altered precipitation negatively affect the growth of grassland bacteria; However, the combined effects of warming and altered precipitation on the growth of ~70% soil bacterial taxa were smaller than the single-factor effects, suggesting antagonistic interaction. This suggests the development of multifactor manipulation experiments in precise prediction of future ecosystem services and feedbacks under climate change scenarios” (Line 552-558).

      (34) L546, please add the taxonomic information for "OTU 14".

      Done (Line 533).

      (35) L800, change "The phylogenetic tree" to "A phylogenetic tree".

      Done (Line 762).
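
      As a reading aid for comments (22) and (23), the terminology can be illustrated with a minimal sketch. It assumes that an interaction is judged by comparing the observed combined effect with the sum of the two single-factor effects. The function name, the effect values, and the tolerance `tol` are hypothetical, and the statistical test actually used in the manuscript is not described in this response, so this is not the authors' procedure.

```python
from typing import Literal

Interaction = Literal["additive", "antagonistic", "synergistic"]


def classify_interaction(
    effect_t: float,
    effect_p: float,
    effect_tp: float,
    tol: float = 0.05,
) -> Interaction:
    """Compare an observed combined effect with its additive expectation.

    effect_t, effect_p: single-factor effects on growth (treatment minus control).
    effect_tp: observed effect of the combined treatment.
    tol: hypothetical tolerance; a real analysis would use confidence intervals.
    """
    expected = effect_t + effect_p          # additive expectation
    deviation = effect_tp - expected
    if abs(deviation) <= tol:
        return "additive"
    # Combined effect smaller in magnitude than expected: antagonistic.
    if abs(effect_tp) < abs(expected):
        return "antagonistic"
    # Combined effect larger in magnitude than expected: synergistic.
    return "synergistic"


# Illustrative values only: both single factors reduce growth, and the
# combined reduction is weaker than the sum of the two.
label = classify_interaction(effect_t=-0.3, effect_p=-0.4, effect_tp=-0.35)
```

      The sketch ignores the case in which the combined effect changes sign relative to the additive expectation, which a full analysis would need to treat separately.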

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to describe the effect of different temperature and precipitation regimes on microbial growth responses in an alpine grassland ecosystem using quantitative 18O stable isotope probing. It was found that all climate manipulations had negative effects on microbial growth, and that single-factor manipulations exerted larger negative effects as compared to combined-factor manipulations. The degree of antagonism between factors was analyzed in detail, as well as the differential effect of these divergent antagonistic responses on microbial taxa that incorporated the isotope. Finally, a hypothetical functional profiling was performed based on taxonomic affiliations. This work gives additional evidence that altered warming and precipitation regimes negatively impact microbial growth.

      Strengths:

      A long term experiment with a thorough experimental design in apparently field conditions is a plus for this work, making the results potentially generalisable to the alpine grassland ecosystem. Also, the implementation of a qSIP approach to determine microbial growth ensures that only active members of the community are assessed. Finally, particular attention was given to the interaction between factors and a robust approach was implemented to quantify the weight of the combined-factor manipulations on microbial growth.

      We appreciate the reviewer’s positive comments.

      Weaknesses:

      The methodology does not mention whether the samples taken for the incubations were rhizosphere soil, bulk soil or a mix of both types of soil. If the samples were taken from rhizosphere soil, I wonder how the plants were affected by the infrared heaters and if the resulting shadow (also in the controls with dummy heaters) had an effect on the plants and the root exudates of the parcels as compared to plants outside the blocks? If the samples were bulk soil, are the results generalisable for a grassland ecosystem? In my opinion, more information is needed on the origin of the soil samples and how these were taken.

      The samples taken for the incubations can be considered as a mixture of rhizosphere and bulk soils. During soil sampling, we did not use conventional rhizosphere soil collection methods. However, there is a certain proportion of fragmented roots in the soil samples we collected, indicating that soil properties are influenced by plants. We have added this description in MATERIALS AND METHODS (Line 158).

      To minimize the impact of physical shading on the plants, each sampling point was as far away from infrared heaters as possible. We have added this information of soil collection in MATERIALS AND METHODS, that is “In each plot, three soil cores of the topsoil (0-5 cm in depth) were randomly collected and combined as a composite sample, which can be considered as a mixture of rhizosphere and bulk soils. Each sampling point was as far away from infrared heaters as possible to minimize the impact of physical shading on the plants. The fresh soil samples were shipped to the laboratory and sieved (2-mm) to remove root fragments and stones.” (Line 157-162).

      Previous studies based on our field experiment assessed the effects of warming and altered precipitation on soil microbial communities (Zhang et al., 2016), the temporal stability of plant community biomass (Ma et al., 2017), shifting plant species composition and grassland primary production (Liu et al., 2018). These studies provide guidance for the experiment design and execution.

      Reference:

      Zhang, KP., Shi, Y., Jing, X. et al. (2016). Effects of Short-Term Warming and Altered Precipitation on Soil Microbial Communities in Alpine Grassland of the Tibetan Plateau. Frontiers in Microbiology, 7, 1-11.

      Ma, ZY., Liu, HY., Mi, ZR. et al. (2017). Climate warming reduces the temporal stability of plant community biomass production. Nature Communications, 8, 15378.

      Liu, HY., Mi, ZR., Lin, L. et al. (2018). Shifting plant species composition in response to climate change stabilizes grassland primary production. Proceedings of the National Academy of Sciences, 115, 4051-4056.

      The qSIP calculations reported in the methodology for this work are rather superficial and the reader must be experienced in this technique to understand how the incorporators were identified and their growth quantified. For instance, the GC content of taxa was calculated for reads clustered in OTUs, and it is not discussed in the text the validity of such approach working at genus level.

      We have added the description of qSIP calculations in Supplementary Materials.

      The approach of GC content calculation can be used at genus level (Koch et al., 2018). The GC content of each bacterial taxon (Gi) was calculated using the mean density for the unlabeled (WLIGHTi) treatments (Hungate et al. 2015), rather than OTU sequence information. We have revised the sentence in MATERIALS AND METHODS, that is “the number of 16S rRNA gene copies per OTU taxon (e.g., genus or OTU) in each density fraction was calculated by multiplying the relative abundance (acquisition by sequencing) by the total number of 16S rRNA gene copies (acquisition by qPCR)” (Line 255-258).

      Reference:

      Hungate, B., Mau, R., Schwartz, E., Caporaso, J., Dijkstra, P., Van Gestel, N. et al. (2015). Quantitative microbial ecology through stable isotope probing. Applied and Environmental Microbiology, 81, 7570-7581.

      Koch, B., McHugh, T., Hayer, M., Schwartz, E., Blazewicz, S., Dijkstra, P. et al. (2018). Estimating taxon-specific population dynamics in diverse microbial communities. Ecosphere, 9, e02090.
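
      The calculation described in this response can be sketched as follows: taxon-specific 16S rRNA gene copies in each density fraction are the product of relative abundance (sequencing) and total copies (qPCR), and the weighted average density of a taxon follows from those copies. This is a minimal illustration, not the authors' script. The function names and all numbers are hypothetical, and the published qSIP workflow (Hungate et al., 2015) includes further steps, such as converting the density shift into isotope enrichment and growth rate, that are not reproduced here.

```python
from typing import Sequence


def taxon_copies(rel_abundance: Sequence[float], total_copies: Sequence[float]) -> list[float]:
    """Copies of one taxon per fraction = relative abundance x total 16S copies."""
    return [r * t for r, t in zip(rel_abundance, total_copies)]


def weighted_average_density(densities: Sequence[float], copies: Sequence[float]) -> float:
    """Mean buoyant density of a taxon, weighted by its copies in each fraction."""
    total = sum(copies)
    if total == 0:
        raise ValueError("taxon has no copies in any fraction")
    return sum(d * c for d, c in zip(densities, copies)) / total


def density_shift(densities, copies_light, copies_heavy) -> float:
    """Shift in weighted average density between labeled and unlabeled treatments."""
    return (weighted_average_density(densities, copies_heavy)
            - weighted_average_density(densities, copies_light))


# Hypothetical example: seven fractions spanning 1.703-1.727 g/ml.
densities = [1.703, 1.707, 1.711, 1.715, 1.719, 1.723, 1.727]
qpcr_total = [2e5, 8e5, 2e6, 3e6, 2e6, 8e5, 2e5]            # total copies per fraction
rel_16o = [0.020, 0.030, 0.030, 0.020, 0.010, 0.005, 0.001]  # unlabeled (16O) treatment
rel_18o = [0.005, 0.010, 0.020, 0.030, 0.030, 0.020, 0.010]  # labeled (18O) treatment

shift = density_shift(
    densities,
    taxon_copies(rel_16o, qpcr_total),
    taxon_copies(rel_18o, qpcr_total),
)
```

      A positive `shift` corresponds to the pattern expected for a taxon that incorporated 18O into its DNA during growth.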

      The selection of V4-V5 region over V3-V4 region to quantify the number of copies of the 16S rRNA gene should be substantiated in the text. Classic works determined one decade ago that primer pairs that amplify V3-V4 are most suitable to assess soil bacterial communities. Hungate et al. (2015), worked with the V3-V4 region when establishing the qSIP method. Maybe the number of unassigned OTUs is related with the selection of this region.

      Both primer sets (V3-V4 and V4-V5 regions) are widely used across various sample sets and are highly similar in representing the total microbial community composition (Fadeev et al., 2021; Zhang et al., 2018).

      A previous study based on our Field Research Station of Alpine Grassland Ecosystem used V4-V5 primer pairs to investigate the effect of warming and altered precipitation on the overall bacterial community composition (Zhang et al., 2016).

      Another reason for choosing the V4-V5 primer set in this study was to integrate and compare the data with that of two previous qSIP studies (Ruan et al., 2023; Guo et al., submitted), both of which focused on the growth responses of active species to global change and used V4-V5 primer pairs.

      We have added an explanation about primer selection as “The V4-V5 primer pairs were chosen to facilitate integration and comparison with data from previous studies (Ruan et al., 2023; Zhang et al., 2016)” (Line 213-215).

      Reference:

      Fadeev, E., Cardozo-Mino, M.G., Rapp, J.Z. et al. (2021). Comparison of Two 16S rRNA Primers (V3–V4 and V4–V5) for Studies of Arctic Microbial Communities. Frontiers in Microbiology, 12

      Zhang, J.Y., Ding, X., Guan, R. et al. (2018). Evaluation of different 16S rRNA gene V regions for exploring bacterial diversity in a eutrophic freshwater lake. Science of The Total Environment, 618, 1254-1267.

      Zhang, K.P., Shi, Y., Jing, X. et al. (2016). Effects of Short-Term Warming and Altered Precipitation on Soil Microbial Communities in Alpine Grassland of the Tibetan Plateau. Frontiers in Microbiology, 7, 1-11.

      Ruan, Y., Kuzyakov, Y., Liu, X. et al. (2023). Elevated temperature and CO2 strongly affect the growth strategies of soil bacteria. Nature Communications, 14, 1-12.

      Guo, J.J., Kuzyakov, Y., Li, L. et al. (2023). Bacterial growth acclimation to long-term nitrogen input in soil. The ISME Journal, Submitted.

      The report of preprocessing and processing of the sequences does not comply with state-of-the-art standards. More info on how the sequences were handled is needed, taking into account that a significant part of the manuscript relies on taxonomic classification of such sequences. Also, an OTU approach for an almost species-dependent analysis (GC contents) should be replaced or complemented with an ASV or subOTUs approach, using denoisers such as DADA2 or deblur. Usage of functional prediction tools underestimates gene frequencies, including those related with biogeochemical significance for soil-carbon and nitrogen cycling.

      (1) We have complemented the information about sequence processing as “The raw sequences were quality-filtered using the USEARCH v.11.0 (Edgar, 2010). In brief, the paired-end sequences were merged and quality filtered with “fastq_mergepairs” and “fastq_filter” commands, respectively. Sequences < 370 bp and total expected errors > 0.5 were removed. Next, “fastx_uniques” command was implemented to remove redundant sequences. Subsequently, high-quality sequences were clustered into operational taxonomic units (OTUs) with “cluster_otus” command at a 97% identity threshold, and the most abundant sequence from each OTU was selected as a representative sequence.” (Line 238-245).

      (2) We have complemented the zero-radius OTU (ZOTU) analysis by the unoise3 command in USEARCH (https://drive5.com/usearch/manual/pipe_otus.html), as shown in Fig. S1-S2. The results showed that overall growth responses of soil bacteria to warming and precipitation changes were similar based on OTU and ZOTU analyses, i.e., warming and altered precipitation tend to negatively affect the growth of grassland bacteria, and antagonistic interactions of T × P prevail. The similarity of results between the different methods is reflected at the overall community level, the phylum level, the genus level and the species (i.e., OTU or ZOTU) level (Fig. S1 and S2).

      Author response image 1.

      The growth responses of grassland bacteria to warming and altered precipitation based on ZOTU analysis. The results of growth rates at the community level (A), the phylum level (B), and the ZOTU level (C and D) were similar to those based on OTU analysis. C the single and combined factor effects of climate factors on species growth, by comparing with the growth rates in T0nP. D the proportions of species growth influenced by different interaction types of T × P. T0-P represents the ambient temperature and decreased precipitation; T+-P represents warming and decreased precipitation; T0cP represents ambient temperature and precipitation; T+cP represents warming and ambient precipitation; T0+P represents ambient temperature and enhanced precipitation; T++P represents warming and enhanced precipitation. Values represent mean and the error bars represent standard deviation. Different letters indicate significant differences between climate treatments.

      Author response image 2.

      The growth responses of grassland bacteria at the genus level to warming and altered precipitation based on OTU analysis (A and C) and ZOTU analysis (B and D). A and B the single and combined factor effects of climate factors on growth in genera, by comparing with those in T0nP. C and D the proportions of genera whose growth influenced by different interaction types of T × P.

      (3) Agreed. We have introduced the caveat about the limitation of functional prediction tools at the end of DISCUSSION, that is “This is, however, still to be verified, as the functional output from PICRUSt2 is less likely to resolve rare environment-specific functions (Douglas et al., 2020)” (Line 540-542). The caveat ensures that the reader knows the limitations of these methods and that our findings are not overstated.

      Reference:

      Douglas, G.M., Maffei, V.J., Zaneveld, J.R. et al. (2020) PICRUSt2 for prediction of metagenome functions. Nat Biotechnol. 38, 685–688.
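
      The quality-filtering rule quoted in item (1) above (reads shorter than 370 bp or with more than 0.5 total expected errors are removed) can be sketched in a few lines. This is an illustrative re-implementation under the assumption of Phred+33 quality encoding, not the USEARCH code itself; the function names are hypothetical, and the thresholds correspond to what USEARCH exposes as the `-fastq_minlen` and `-fastq_maxee` options of `fastq_filter`.

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities, p = 10^(-Q/10), for one read.

    `offset` is the ASCII offset of the quality encoding (Phred+33 assumed).
    """
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def passes_filter(seq: str, quality: str, min_len: int = 370, max_ee: float = 0.5) -> bool:
    """Keep a merged read only if it is long enough and has few expected errors."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality string differ in length")
    return len(seq) >= min_len and expected_errors(quality) <= max_ee


# Hypothetical reads: 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
good = passes_filter("A" * 400, "I" * 400)    # 400 bases at Q40
noisy = passes_filter("A" * 400, "5" * 400)   # 400 bases at Q20
short = passes_filter("A" * 300, "I" * 300)   # below the length threshold
```

      Filtering on total expected errors rather than on mean quality penalizes reads in proportion to the number of bases that are likely to be wrong, which matters because each erroneous read can seed a spurious OTU or ZOTU.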

      Reviewer #2 (Recommendations For The Authors):

      General suggestions:

      Regarding the qSIP method, it would help to see the differences in density vs number of 16S rRNA gene abundance for the most responsive bacterial groups in the different treatments, taking into account that with only 7 fractions the entire change in bacterial growth was resolved.

      We have selected three representative bacterial taxa (OTU1 belonging to Bradyrhizobium, OTU14 belonging to Solirubrobacter, OTU15 belonging to Pseudoxanthomonas), which have high growth rates in climate change treatments. The result showed that the peaks in the 18O treatment are shifted "backwards" (greater weighted average buoyant density) compared to the 16O treatment, indicating that these species assimilate the 18O isotope into their DNA during growth.

      Author response image 3.

      The distribution of 16S rRNA gene abundance of three representative bacterial taxa (OTU1- Bradyrhizobium, OTU14-Solirubrobacter, and OTU15-Pseudoxanthomonas) in different buoyant density fractions. Values represent mean and the error bars represent standard deviation.

      Seven fractionated DNA samples were selected for sequencing because they contained more than 99% of the gene copies of each sample (please see the figure below). The DNA concentrations of the other fractions were too low to construct sequencing libraries.

      Author response image 4.

      Relative abundance of 16S rRNA gene copies in each fraction. The fractions with density between 1.703 and 1.727 g ml-1 were selected because they contained more than 99% gene copy numbers of each sample. T0-P represents the ambient temperature and decreased precipitation; T+-P represents warming and decreased precipitation; T0cP represents ambient temperature and precipitation; T+cP represents warming and ambient precipitation; T0+P represents ambient temperature and enhanced precipitation; T++P represents warming and enhanced precipitation. Values represent mean and the error bars represent standard deviation.

      With such dataset additional multivariate analysis would be of help to better interpret the ecological framework.

      Thanks for the suggestion. Interpreting the ecological framework is meaningful for understanding microbial responses to environmental changes.

      The main objective of this study is to investigate the growth response of soil microbes in alpine grasslands to temperature and precipitation changes, and the interaction between climate factors. Our results, as well as the results of complementary analyses (based on the ZOTU analyses shown in Author response images 1 and 2), indicate that warming and altered precipitation tend to negatively affect the growth of grassland bacteria, and that antagonistic interactions of T × P prevail.

      We have emphasized our research objectives and main conclusions in the revised manuscript: “The goal of current study is to comprehensively estimate taxon-specific growth responses of soil bacteria following a decade of warming and altered precipitation manipulation on the alpine grassland of the Tibetan Plateau” (Line 112-114);

      “Our results demonstrated that both warming and altered precipitation negatively affect the growth of grassland bacteria; However, the combined effects of warming and altered precipitation on the growth of ~70% soil bacterial taxa were smaller than the single-factor effects, suggesting antagonistic interaction” (Line 552-556).

      Extension of interaction analysis and its conclusions should be shortened, summarizing the most relevant findings. In my opinion, it becomes a bit redundant.

      We have shortened the discussion of the interaction analysis by deleting less relevant content.

      Below are some, but not all, examples that have been deleted or revised in the Discussion:

      (1) Deleted “This result supports our second hypothesis that the interactive effects between warming and altered precipitation on soil microbial growth are not simply additive”;

      (2) Deleted “A previous study suggested that multiple global change factors had negative effects on soil microbial diversity (Yang et al., 2021)”;

      (3) Revised “A meta‐analysis of experimental manipulation revealed that the combined effects of different climate factors were usually less than expected additive effects, revealing antagonistic interactions on soil C storage and nutrient cycling processes (Dieleman et al., 2012; Wu et al., 2011). Moreover, two experimental studies on N cycling and net primary productivity demonstrated that the majority of interactions among multiple factors are antagonistic rather than additive or synergistic, thereby dampening the net effects (Larsen et al., 2011; Shaw et al., 2002)” to “A range of ecosystem processes have been revealed to be potentially subject to antagonistic interactions between climate factors, for instance, net primary productivity (Shaw et al., 2002), soil C storage and nutrient cycling processes (Dieleman et al., 2012; Wu et al., 2011; Larsen et al., 2011)” (Line 499-503);

      (4) Revised “Previous evidences suggest that warming has a negative impact on soil carbon pools (Jansson & Hofmockel, 2020; Purcell et al., 2022). During the first phase of soil warming (~ 10 years), microbial activity increased, resulting in rapid soil carbon mineralization and respiration (Melillo et al., 2017)” to “Previous evidences suggest that warming has a negative impact on soil carbon pools (Jansson & Hofmockel, 2020; Purcell et al., 2022), mainly because of the rapid soil carbon mineralization and respiration (Melillo et al., 2017)” (Line 464-466).

      I strongly suggest a functional analysis based on shotgun sequencing or RNAseq approaches. With this approach this work would be able to answer who is growing under altered T and Precipitation regimes and what are those that are growing doing.

      Thanks for the suggestion. Metagenomic sequencing is a popular approach to evaluate the potential functions of microbial communities in the environment. However, there are two main reasons that limit the application of metagenomic or metatranscriptomic sequencing in this study: 1) Most of the fractionated samples in the SIP experiment have low DNA concentrations and do not meet the requirements of library construction for sequencing; 2) Metagenomics and metatranscriptomics usually have relatively low sensitivity to rare species, reducing the diversity of detected active species.

      This study focused on active microbial taxa and their growth in response to multifactorial climate change. We have added the prospect in DISCUSSION, that is “This suggests the development of methods combining qSIP with metagenomes and metatranscriptomes to assess the functional shifts of active microorganisms under global change scenarios” (Line 542-544).

      Minor suggestions:

      L121. _As

      We have deleted this sentence and relocated the hypotheses to the last paragraph of INTRODUCTION (according to the suggestion of Reviewer #3).

      Line150. Described previously in.

      Done (Line 136).

      Line500. Check whether it is better to use the word acclimatization (Coordinated response to several simultaneous stressors) in exchange of acclimation

      We have revised it according to this suggestion (Line 481).

      Fig.4C Drought

      Done (Line 761).

      Reviewer #3 (Public Review):

      Summary:

      In this paper, Ruan et al. studied the long-term impact of warming and altered precipitations on the composition and growth of the soil microbial community. The researchers adopted an experimental approach to assess the impact of climate change on microbial diversity and functionality. This study was carried out within a controlled environment, wherein two primary factors were assessed: temperature (in two distinct levels) and humidity (across three different levels). These factors were manipulated in a full factorial design, resulting in a total of six treatments. This experimental setup was maintained for ten years. To analyze the active microbial community, the researchers employed a technique involving the incorporation of radiolabeled water into biomolecules (particularly DNA) through quantitative stable isotope probing. This allowed for the tracking of the active fraction of microbes, accomplished via isopycnic centrifugation, followed by Illumina sequencing of the denser fraction. This study was followed by a series of statistical analyses to identify the impact of these two variables on the whole community and specific taxonomic groups. The full factorial design arrangement enabled the researchers to discern both individual contributions as well as potential interactions among the variables.

      Strengths:

      This work presents a timely study that assesses in a controlled fashion the potential impact of global warming and altered precipitations on microbial populations. The experimental setup, experimental approach and data analysis seem to be overall solid. I consider the paper of high interest for the whole community as it provides a baseline to the assessment of global warming on microbial diversity.

      Thanks for the encouragement and positive comments.

      Weaknesses:

      While taxonomic information is interesting, it would have been highly valuable to include transcriptomics data as well. This would allow us to understand what active pathways become enriched under warming and altered precipitations. Non-metabolic OTUs hold significance as well. The authors could have potentially described these non-incorporators and derived hypotheses from the gathered information. The work would have benefited from using more biological replicates of each treatment.

      Thanks for the valuable suggestions.

      (1) Metatranscriptomics can assess the functional profiles of the community, but it has relatively low sensitivity to rare species, which makes it difficult to link functional pathways to the numerous active taxa identified by qSIP. Additionally, because of the low DNA concentration, sequencing libraries are difficult to construct for most fractionated samples, whereas amplicon-based sequencing remained feasible. This study therefore focused on active microbial taxa and their growth in response to multifactorial climate change. We have added the prospect in DISCUSSION, that is “This suggests the development of methods combining qSIP with metagenomes and metatranscriptomes to assess the functional shifts of active microorganisms under global change scenarios” (Line 542-544).

      (2) 18O-qSIP can identify the growing microbial species (i.e., 18O incorporators) in the environment rather than metabolically active taxa. These non-incorporators in our study were likely to be metabolically active, i.e., maintaining life activities without reproduction, or recently deceased (Blazewicz et al., 2013). Therefore, it is hard to distinguish whether these non-incorporators possess metabolic activity.

      (3) Agreed. The qSIP experiments involve the use of isotopes and the sequencing of a large number of DNA samples (90 samples per biological replicate in this study). Considering its high cost, we selected three replicates for analysis. We have explained this issue in MATERIALS AND METHODS, that is “Considering the cost of qSIP experiment (i.e., the use of isotopes and the sequencing of a large number of DNA samples), we randomly selected three out of the six plots, serving as three replicates for each treatment” (Line 154-157).

      Reference:

      Nuccio, E.E., Starr, E., Karaoz, U. et al. (2020) Niche differentiation is spatially and temporally regulated in the rhizosphere. ISME J 14, 999–1014.

      Blazewicz, S.J., Barnard, R.L., Daly, R.A., Firestone, M.K. (2013). Evaluating rRNA as an indicator of microbial activity in environmental communities: limitations and uses. The ISME Journal, 7, 2061–2068.

      Reviewer #3 (Recommendations For The Authors):

      Major comments:

      The manuscript should be written in a clearer way. The language should be more direct, so the message is conveyed faster and clearer. Some sentences, for instance, could be shortened or re-organized. Below, you will find some examples.

      We have rewritten the sentences to make the manuscript clearer. Below are some, but not all, examples that have been revised:

      (1) Deleted “(reduced precipitation, hereafter ‘drought’, or enhanced precipitation, hereafter ‘wet’)” in INTRODUCTION;

      (2) Deleted “Controlled experiments simulating climate change have investigated changes in microbial community composition as measured by shifts in the relative abundances (Evans & Wallenstein, 2014; Barnard et al., 2015). However, changes in relative abundances may be poor indicators of population responses to environmental change in some cases (Blazewicz et al., 2020). Another challenge is the presence of a large number of inactive microbial cells in the soil, which hinders the direct, quantitative measure of the ecological drivers in population dynamics (Fierer, 2017; Lennon & Jones, 2011).” in DISCUSSION;

      (3) Deleted “This result supports our second hypothesis that the interactive effects between warming and altered precipitation on soil microbial growth are not simply additive” in DISCUSSION;

      (4) Deleted “A previous study suggested that multiple global change factors had negative effects on soil microbial diversity (Yang et al., 2021)” in DISCUSSION;

      (5) Revised “A meta‐analysis of experimental manipulation revealed that the combined effects of different climate factors were usually less than expected additive effects, revealing antagonistic interactions on soil C storage and nutrient cycling processes (Dieleman et al., 2012; Wu et al., 2011). Moreover, two experimental studies on N cycling and net primary productivity demonstrated that the majority of interactions among multiple factors are antagonistic rather than additive or synergistic, thereby dampening the net effects (Larsen et al., 2011; Shaw et al., 2002)” to “A range of ecosystem processes have been revealed to be potentially subject to antagonistic interactions between climate factors, for instance, net primary productivity (Shaw et al., 2002), soil C storage and nutrient cycling processes (Dieleman et al., 2012; Wu et al., 2011; Larsen et al., 2011)” in DISCUSSION (Line 499-503);

      (6) Revised “Previous evidences suggest that warming has a negative impact on soil carbon pools (Jansson & Hofmockel, 2020; Purcell et al., 2022). During the first phase of soil warming (~ 10 years), microbial activity increased, resulting in rapid soil carbon mineralization and respiration (Melillo et al., 2017)” to “Previous evidences suggest that warming has a negative impact on soil carbon pools (Jansson & Hofmockel, 2020; Purcell et al., 2022), mainly because of the rapid soil carbon mineralization and respiration (Melillo et al., 2017)” in DISCUSSION (Line 464-466).

      I'm curious about why, even though there were six replicates of the experiment, only three samples were collected for analysis. Metagenomic analyses tend to display high variability.

      The qSIP experiments involve the use of isotopes and the sequencing of a large number of DNA samples (90 samples per biological replicate in this study). Considering the high cost, we selected three replicates for analysis.

      In Fig. 3A, the absolute growth rates (16S copies/d*g) are shown. How do you know that the efficiency of DNA extraction was similar across all treatments and therefore the absolute numbers are comparable?

      To avoid differences in extraction efficiency caused by experimental procedures, all DNA samples were extracted by the same person (the first author) within 2-3 hours, and a unifying procedure of cell lysis and DNA extraction was used, i.e., the mechanical cell destruction was attained by multi-size beads beating at 6 m s-1 for 40 s, and then FastDNA™ SPIN Kit for Soil (MP Biomedicals, Cleveland, OH, USA) was used for DNA extraction.

      We have measured the concentration of extracted DNA and found no significant difference between treatments (Table for the response letter).

      Author response table 1.

      Soil DNA concentration in climate change treatments after qSIP incubation (measured by Qubit® DNA HS Assay Kits).

      Values represent mean and standard deviation. T0-P represents the ambient temperature and decreased precipitation; T+-P represents warming and decreased precipitation; T0cP represents ambient temperature and precipitation; T+cP represents warming and ambient precipitation; T0+P represents ambient temperature and enhanced precipitation; T++P represents warming and enhanced precipitation. The results of ANOVA indicated no significant difference of extracted DNA concentration between treatments (p > 0.05).

      We have introduced the caveat in the DISCUSSION, that is “Note that the experimental parameters such as DNA extraction and PCR amplification efficiencies also have significant effects on the accuracy of growth assessment. This alerts the need to standardize experimental practices to ensure more realistic and reliable results” (Line 544-547).

      Line 96-99 and 121-124: "Hypotheses are typically placed at the end of the final paragraph in the Introduction section. It is advisable to relocate them there and provide a clearer description of the paper's main goal."

      We have relocated the hypotheses to the end of the INTRODUCTION and stated the main goal of this study there, that is “The goal of current study is to comprehensively estimate taxon-specific growth responses of soil bacteria following a decade of warming and altered precipitation manipulation on the alpine grassland of the Tibetan Plateau, by using the 18O-quantitative stable isotope probing (18O-qSIP)” (Line 112-115).

      Line 399: Although you describe the classification among antagonistic interactions in the Methods section, I think you should describe this in further detail here. Can you clarify how you carried out this categorization and how these results were interpreted considering the phylogenetic classification.

      We have added the description of antagonistic interactions, that is “The interaction type of T × P on the growth of ~70% incorporators was antagonistic (i.e., the combined effect size is smaller than the additive expectation) (Fig. 4C)” (Line 388-390).

      The interaction types between factors can be classified into three categories: additive, synergistic and antagonistic. In an additive interaction, the combined effect size of the factors equals the sum of their single effects (i.e., the additive expectation, Fig. 1B). In a synergistic interaction, the combined effect size is larger than the additive expectation. Conversely, in an antagonistic interaction, the combined effect size is smaller than the additive expectation. In this study, the antagonistic interactions were further divided into three sub-categories: weak antagonistic interaction, strong antagonistic interaction, and neutralizing effect (Fig. 1B). In a weak antagonistic interaction, the combined effect size is smaller than the additive expectation but larger than any of the single-factor effects. In a strong antagonistic interaction, the combined effect size is smaller than any of the single-factor effects but larger than 0. In a neutralizing effect, the combined effect size equals 0, implying that the effects of the different factors cancel each other out.
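
      The categories above can be illustrated with a minimal sketch. It is not the procedure used in the study: it assumes that the single and combined effects act in the same direction (so only magnitudes are compared), and the tolerance `tol` for treating two values as equal is a hypothetical parameter. The case in which the combined effect lies between the two single effects is not defined in the text, so it is labelled separately.

```python
def classify_interaction(effect_a, effect_b, combined, tol=0.05):
    """Classify a two-factor interaction from effect-size magnitudes.

    Assumption: all effects act in the same direction, so magnitudes are
    compared. `tol` is a hypothetical tolerance for "equal to".
    """
    a, b, c = abs(effect_a), abs(effect_b), abs(combined)
    additive = a + b  # additive expectation: sum of the single effects

    if c <= tol:
        return "neutralizing"
    if abs(c - additive) <= tol:
        return "additive"
    if c > additive:
        return "synergistic"
    # Below this point the combined effect is smaller than the additive
    # expectation, i.e. the interaction is antagonistic.
    if c > max(a, b):
        return "weak antagonistic"
    if c < min(a, b):
        return "strong antagonistic"
    return "antagonistic (between the two single effects)"


if __name__ == "__main__":
    # Hypothetical effect sizes, for illustration only.
    print(classify_interaction(-0.4, -0.3, -0.5))
```

      In practice the comparison would rest on confidence intervals of the effect sizes rather than on a fixed tolerance.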

      Methodologically, the single and combined effects of two climate factors and their interaction effects were calculated by the natural logarithm of response ratio (lnRR) and Hedges’ d, respectively (Yue et al., 2017).
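
      As a small illustration of the first of these metrics, the natural logarithm of the response ratio can be computed as ln(mean of treatment / mean of control). The sketch below assumes replicate growth rates are supplied as plain lists; the function name and the numbers are hypothetical, and the Hedges' d calculation for the interaction term is not shown.

```python
import math
from statistics import mean


def ln_response_ratio(treatment, control):
    """lnRR = ln(mean(treatment) / mean(control)); both means must be > 0."""
    return math.log(mean(treatment) / mean(control))


if __name__ == "__main__":
    # Hypothetical replicate growth rates, for illustration only.
    warmed = [0.8, 1.0, 1.2]
    ambient = [1.8, 2.0, 2.2]
    print(round(ln_response_ratio(warmed, ambient), 3))
```

      A negative lnRR indicates that growth under the treatment is lower than under the control, and a value of 0 indicates no change.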

      We have added the result interpretation about the phylogenetic distribution patterns of incorporators, that is “The degree of phylogenetic relatedness can indicate the processes that influenced community assembly, like the extent a community is shaped by environmental filtering (clustered by phylogeny) or competitive interactions (life strategy is phylogenetically random distribution) (Evans & Wallenstein, 2014; Webb et al., 2002).The results showed that the incorporators whose growth was influenced by the antagonistic interaction of T × P showed significant phylogenetic relatedness, indicating the occurrence of taxa more likely shaped by environment filtering (i.e., selection pressure caused by changes in temperature and moisture conditions). In contrast, the growing taxa affected by synergistic interactions of T × P showed random phylogenetic distributions (Table S1), which may be explained by competition between taxa with similar eco-physiological traits or changes in genotypes (possibly through horizontal gene transfer) (Evans & Wallenstein, 2014). We also found that the extent of phylogenetic relatedness to which taxa groups of T × P interaction types varied by climate scenarios, suggesting that different climate history processes influenced the ways bacteria survive temperature and moisture stress” (Line 515-529).

      Reference:

      Evans, S.E. & Wallenstein, M.D. (2014). Climate change alters ecological strategies of soil bacteria. Ecology Letters, 17, 155-164.

      Webb, C.O., Ackerly, D.D., McPeek, M.A. & Donoghue, M.J. (2002). Phylogenies and Community Ecology. Annual Review of Ecology and Systematics, 33, 475-505.

      Yue, K., Fornara, D.A., Yang, W., Peng, Y., Peng, C., Liu, Z. et al. (2017). Influence of multiple global change drivers on terrestrial carbon storage: additive effects are common. Ecology Letters, 20, 663-672.

      Line 407-8: What do you mean with "...clustered at the phylogenetic branches" and Line 410: "cluster near the tips of the phylogenetic tree". Can you please clarify?

      Sorry for the unclear statement. We have added the explanation of NTI, that is “Nearest taxon index (NTI) was used to determine whether the species in a particular growth response are more phylogenetically related to one another than to other species (i.e., close or clustering on phylogenetic tree). NTI is an indicator of the extent of terminal clustering, or clustering near the tips of the tree (Evans & Wallenstein, 2014; Webb et al., 2002)” (Line 397-401).

      Reference:

      Evans, S.E. & Wallenstein, M.D. (2014). Climate change alters ecological strategies of soil bacteria. Ecology Letters, 17, 155-164.

      Webb, C.O., Ackerly, D.D., McPeek, M.A. & Donoghue, M.J. (2002). Phylogenies and Community Ecology. Annual Review of Ecology and Systematics, 33, 475-505.
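
      To make the NTI explanation above concrete, the following sketch computes NTI as the negative standardized effect size of the mean nearest-taxon distance (MNTD), with the null distribution drawn by randomly sampling taxa from the pool. It is an illustration under stated assumptions, not the analysis code of the study: the distance matrix is a toy example with two clades, and the number of null draws and the seed are arbitrary choices.

```python
import random
from statistics import mean, stdev


def mntd(dist, taxa):
    """Mean distance from each taxon to its nearest relative within `taxa`."""
    return mean(min(dist[i][j] for j in taxa if j != i) for i in taxa)


def nti(dist, taxa, n_null=999, seed=0):
    """NTI = -(MNTD_obs - mean(MNTD_null)) / sd(MNTD_null)."""
    rng = random.Random(seed)
    pool = list(range(len(dist)))
    observed = mntd(dist, taxa)
    # Null model: draw the same number of taxa at random from the pool.
    null = [mntd(dist, rng.sample(pool, len(taxa))) for _ in range(n_null)]
    return -(observed - mean(null)) / stdev(null)


# Toy pairwise distances (hypothetical): tips 0-3 and 4-7 form two clades,
# with distance 1 within a clade and 10 between clades.
tips = range(8)
dist = [[0 if i == j else (1 if i // 4 == j // 4 else 10) for j in tips]
        for i in tips]

if __name__ == "__main__":
    # Taxa drawn from a single clade give a positive NTI (terminal clustering).
    print(nti(dist, [0, 1, 2, 3]) > 0)
```

      Positive NTI values indicate that the taxa are more closely related near the tips of the tree than expected by chance.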

      Could you provide some info about the biochemistry of the incorporation of heavy water into DNA molecules? What specific enzymes are typically involved?

      Due to the low DNA concentration in most fractionated samples (less than 10 ng/μL, measured by Qubit DNA HS Assay Kits), only amplicon-based sequencing analyses were feasible. This study therefore focused only on active microbial taxa and their growth in response to multifactorial climate change.

      What might be the impact of soil desiccation on bacterial survival and subsequent water uptake?

      Slow dehydration and air drying of soil is a very common phenomenon in nature (Koch et al., 2018). In this process, microorganisms reduce their metabolism and shift towards a potentially active state (Blagodatskaya and Kuzyakov, 2013). A previous study suggested that a potentially active microbial population permanently exists in soil, between the active and dormant physiological states. Even under long-term starvation, the potentially active microorganisms maintain ‘physiological alertness’ so as to be ready for occasional substrate input (Blagodatskaya and Kuzyakov, 2013). These microorganisms are important participants in biogeochemical cycles and are the focus of this study.

      Replacing the environmental water in the soil with 18O-labelled water is a typical practice for qSIP studies (Hungate et al. 2015; Koch et al., 2018). This process may disturb the microbial community. In this study, the soil samples were placed in a thermostatic incubator (14℃ and 16℃), rather than air-dried at 25℃ (as in most studies). The incubation temperature is relatively low (compared to 25℃) and there is no violent air convection in the incubator, resulting in slower evaporation and no significant discoloration caused by severe soil dehydration after 48 h. The process of soil drying in this study thus simulated the natural phenomenon of slow water loss from soil.

      We have added the description in MATERIALS AND METHODS, that is “There is no violent air convection in the incubator and the incubation temperature is relatively low (compared to 25℃ used in previous studies), resulting slower evaporation and no significant discoloration caused by severe soil dehydration after 48 h” (Line 171-174).

      Reference:

      Blagodatskaya, E. & Kuzyakov, Y. (2013) Active microorganisms in soil: Critical review of estimation criteria and approaches. Soil Biology and Biochemistry, 67, 192-211.

      Hungate, B., Mau, R., Schwartz, E., Caporaso, J., Dijkstra, P., Van Gestel, N. et al. (2015). Quantitative microbial ecology through stable isotope probing. Applied and Environmental Microbiology, 81, 7570-7581.

      Koch, B., McHugh, T., Hayer, M., Schwartz, E., Blazewicz, S., Dijkstra, P. et al. (2018). Estimating taxon-specific population dynamics in diverse microbial communities. Ecosphere, 9, e02090.

      The analysis of the 180 incorporators is interesting as it defines what microbes are metabolically active and hence growing under the different conditions tested. Should not be worth to analyze the non-incorporators? Is it possible to identify a pattern to generate a hypothesis of why they are metabolically inactive based on this information? In the Methods section, the authors state that they identified a total of 6,938 OTUs, of which only 1,373 were found to be incorporators.

      Microbes exist in a range of metabolic states: growing, active (non-growing), dormant and recently deceased (Blazewicz et al., 2013), and there is still no clear threshold for distinguishing them. 18O-DNA qSIP can identify the growing microbial species (i.e., 18O incorporators) rather than all metabolically active taxa, because some cells are measurably metabolizing (through catabolic and/or anabolic processes) without reproducing. Therefore, the non-incorporators in our study may be metabolically active or not (e.g., recently deceased microorganisms). This study focuses on the growing microorganisms identified by 18O-qSIP.

      In this study, ~20% microbial taxa (1,373/6,938) were identified as 18O incorporators. Microorganisms in soils suffer from resource and energy constraints frequently (Blagodatskaya and Kuzyakov, 2013). The energy requirements of species in the growing state are much higher (~30 fold) than those in the non-growing state, so the percentage of growing bacterial taxa in soil tends to be low.

      Reference:

      Blazewicz, S.J., Barnard, R.L., Daly, R.A., Firestone, M.K. (2013). Evaluating rRNA as an indicator of microbial activity in environmental communities: limitations and uses. The ISME Journal, 7, 2061–2068.

      Blagodatskaya, E. & Kuzyakov, Y. (2013) Active microorganisms in soil: Critical review of estimation criteria and approaches. Soil Biology and Biochemistry, 67, 192-211.

      Minor comments:

      Fig. 3A and 3B. Please show the results of the multiple comparisons.

      Done.

      Author response image 5.

      Bacterial growth responses to climate change and the interaction types between warming and altered precipitation. The growth rates (A), and responses (LnRR) of soil bacteria to warming and altered precipitation (B) at the whole community level. The growth rates (C), and responses of the dominant bacterial phyla (D) had similar trends with that of the whole community. Values represent mean and the error bars represent standard deviation. Different letters indicate significant differences between climate treatments.

      Fig. 4. This figure should be self-explanatory. This diagram is challenging to understand.

      We have revised Fig. 4 to improve clarity.

      Author response image 6.

      The growth responses and phylogenetic relationship of incorporators subjected to different interaction types under two climate scenarios. A phylogenetic tree of all incorporators observed in the grassland soils (A). The inner heatmap represents the single and combined factor effects of climate factors on species growth, by comparing with the growth rates in T0nP. The outer heatmap represents the interaction types between warming and altered precipitation under two climate change scenarios. The proportions of positive or negative responses in species growth to single and combined manipulation of climate factors by summarizing the data from the inner heatmap (B). The proportions of species growth influenced by different interaction types of T × P by summarizing the data from the outer heatmap (C).

      Fig. 4. It says "Dorought" instead of "drought"

      Done (Line 760).

      Line 109: "relieves" instead of "relieved"

      Done (Line 102).

      Line 129: Should be: "We classified the interaction types as additive, synergistic, antagonistic, null and neutralizing."

      Done (Line 117).

      Line 233: How were the 16S rRNA sequences from each density fraction analyzed?

      (1) Raw sequencing data processing:

      The raw 16S rRNA gene sequences of each density fraction were quality-filtered using USEARCH v.11.0 (Edgar, 2010). The paired-end sequences were merged and quality-filtered with the “fastq_mergepairs” and “fastq_filter” commands, respectively. Sequences shorter than 370 bp or with total expected errors > 0.5 were removed. Next, the “fastx_uniques” command was used to identify the unique sequences. Subsequently, high-quality sequences were clustered into operational taxonomic units (OTUs) with the “cluster_otus” command at a 97% identity threshold, and the most abundant sequence from each OTU was selected as a representative sequence. The taxonomic affiliation of the representative sequence was determined using the RDP classifier (Wang et al., 2007).
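
      The filtering criterion described above can be sketched in a few lines. The example assumes that the expected number of errors of a read is the sum of its per-base error probabilities, 10^(-Q/10), derived from Phred quality scores; the function names are hypothetical, and the sketch only illustrates the two thresholds (370 bp and 0.5 expected errors), not the USEARCH implementation itself.

```python
def expected_errors(quals):
    """Sum of per-base error probabilities for a list of Phred scores."""
    return sum(10 ** (-q / 10) for q in quals)


def passes_filter(seq, quals, min_len=370, max_ee=0.5):
    """Keep a merged read if it is long enough and has few expected errors."""
    return len(seq) >= min_len and expected_errors(quals) <= max_ee


if __name__ == "__main__":
    # Hypothetical reads, for illustration only.
    good = ("A" * 400, [40] * 400)   # expected errors = 400 * 1e-4
    noisy = ("A" * 400, [20] * 400)  # expected errors = 400 * 1e-2
    short = ("A" * 300, [40] * 300)  # below the length threshold
    for seq, quals in (good, noisy, short):
        print(len(seq), passes_filter(seq, quals))
```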

      (2) qSIP calculation:

      Sequencing data reflect the relative abundance of taxa in the community. We multiplied each OTU’s relative abundance (obtained by sequencing) by the number of 16S rRNA gene copies (obtained by qPCR) to obtain the number of gene copies per OTU in each fraction. Then, the proportion of gene copies of a specific OTU in each fraction relative to the total number of gene copies in one sample was calculated and used as a weight for calculating the weighted average buoyant density (the critical parameter for assessing microbial growth).
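
      The weighting step described above can be sketched as follows for a single OTU in a single sample. The sketch reads “the total amount of gene copies in one sample” as the copies of that OTU summed over all density fractions, which is an assumption; the function name and all numbers are hypothetical.

```python
def weighted_average_density(rel_abundance, total_copies, density):
    """Weighted average buoyant density of one OTU across density fractions.

    rel_abundance[k]: relative abundance of the OTU in fraction k (sequencing)
    total_copies[k]:  total 16S rRNA gene copies in fraction k (qPCR)
    density[k]:       buoyant density of fraction k (g/mL)
    """
    # Gene copies of this OTU in each fraction.
    copies = [r * t for r, t in zip(rel_abundance, total_copies)]
    total = sum(copies)
    if total == 0:
        raise ValueError("OTU not detected in any fraction")
    # Each fraction is weighted by its share of the OTU's copies.
    return sum(c / total * d for c, d in zip(copies, density))


if __name__ == "__main__":
    # Hypothetical values for three fractions, for illustration only.
    wad = weighted_average_density(
        rel_abundance=[0.1, 0.2, 0.1],
        total_copies=[1000, 2000, 1000],
        density=[1.68, 1.70, 1.72],
    )
    print(round(wad, 4))
```

      Growth is then assessed from the shift in this weighted average density between the 18O-labelled and unlabelled treatments.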

      Line 366: "Three single-factor ... between warming and altered precipitation" -> "The individual impact of warming, drought, and wet conditions resulted in the most substantial negative effects on bacterial growth compared with the effects of warming x drought and warming x wet. A result that illustrates the negative interactions between warming and modified precipitations patterns."

      Done (Line 365-368).

      Line 376: "Similar with the result of whole growth of bacteria community, the growth responses of the major bacterial phyla were also negatively influenced by single climate factors". This sentence is hard to read. Maybe something like this: "Growth of the major bacterial phyla was also negatively influenced by the individual climate factors".

      Done (Line 371-372).

      Line 383: "In particular, the effects of wet and warming neutralized each other, resulting the net effects became zero on the growth rates of the phyla Actinobacteria and Bacteroidetes". "In Actinobacteria and Bacteroidetes, the effect of wet and warming neutralized each other, as the combined effect of these two factors had no effect on growth".

      Done (Line 377-379).

      Line 390: "The individual warming treatment (T+nP) reduced the growth rates of 75% incorporators..." "Warming (T+nP) reduced the growth of 75% of the taxonomic groups, which was followed by drought and wet.

      Done (Line 384-385).

      Line 392: "The combined manipulations of warming and altered precipitation lowered the percentages of incorporators with negative responses compared with single factor manipulation, especially warming and enhanced precipitation manipulation" -> "Warming x drought and warming x wet had a smaller impact on the growth of incorporators, compared with single effects."

      Done (Line 385-387).

      Line 468. This sentence "To the best ..." is not necessary.

      We have deleted this sentence.

      Line 476. Is it really "synthesis" the word you want to use?

      We have deleted this sentence.

      Line 477. Maybe should written like this: "Consistent with our findings, a recent experimental study demonstrated that 15 years of warming reduced the growth rate of soil bacteria in a montane meadow in northern Arizona."

      Done (Line 459-461).

      Line 490 and 502. Consider using "however" only once in a paragraph.

      We have deleted the second “however” (Line 483).

      Line 555-559. Based on genomic data you cannot predict the functional role of microbes in the environment. These sentences are speculative. Please, consider using less strong affirmations and focus more on the pathways that are enriched in the incorporators.

      Agreed. We have deleted this part of content.

  3. Mar 2024
    1. Author Response

      Reviewer #2 (Public Review):

      In this manuscript, Chen et al. reported that the core binding factor beta (Cbfβ), a heterodimeric subunit of the RUNX family transcription factors (TFs), is crucial in maintaining cartilage homeostasis and counteracting traumatic OA pathology. Using mouse models in which Cbfβ is conditionally inactivated in the Col2a1+ and Acan+ cells, the authors claimed that Cbfβ ablation led to articular cartilage (AC) degeneration, which is associated with aberrant cartilage gene expression and chondrocyte signaling, particularly the elevated Wnt/Catenin and the decreased Hippo/YAP and TGFβ signaling. The authors further showed that Cbfβ transcripts are decreased in human OA cartilage, and sustaining Cbfβ expression in mouse knee joints mitigated the severity of surgery-evoked OA.

      On the whole, the work reported is interesting and exciting. Genetic and biochemical data support key statements. Both in vivo and in vitro experiments were well designed with proper controls; semiquantitative data were digitalized and processed for statistical significance. Furthermore, new findings were adequately discussed in contrast to the current available knowledge. However, the conceptual novelty of this study is slightly compromised by recent publications showing that Cbfβ reduction is associated with OA (Che et al. 2023; Li et al. 2021). Also, the authors claimed that multiple signaling pathways were affected by Cbfβ ablation in cartilage cells; many of them, however, are indirect effects given the nature of Cbfβ as a TF. The authors also showed that pSMAD2/3 and active βCatenin decreased and increased upon Cbfβ depletion in the mouse AC cartilage. However, how the deficiency of Cbfβ, a widely expressed TF, affected the posttranslational modification of SMAD2/3 and βCatenin is unclear and needs further discussion. Overall, Cbfβ's role in cartilage and OA pathology is an emerging area of study; the authors provided a set of genetic evidences showing that Cbfβ is indispensable for cartilage homeostasis.

      We thank the reviewer for the positive appraisal of our manuscript. We greatly appreciate the insightful comments and critiques. In accordance with the reviewer’s suggestions, we have thoroughly revised all parts of the manuscript. We are glad that the reviewers considered our work to be of interest, and we are grateful for this opportunity to resubmit our manuscript. With regard to concerns about the novelty of our study, Li et al’s study only reported the relationship between abnormal Cbfβ expression in human cartilage and osteoarthritis. Che et al’s study employed Cbfβf/fAggrecan-cre mice, while our study used a novel inducible Cbfβf/fCol2α1-CreERT mouse model. While the Aggrecan-creERT system provides valuable insights into the role of Cbfβ in differentiated cartilage cells and its implications in the advanced stages of osteoarthritis, our current study also used Cbfβf/fCol2α1-CreERT mice to explore the gene's function from a broader perspective. A previous study points out that Col2α1 is expressed in both the early and late stages of chondrogenesis, including in skeletal mesenchymal cells, perichondrium and presumptive joint cells, whereas aggrecan is expressed specifically in differentiated chondrocytes (1). However, studies show that not only differentiated chondrocytes but also chondrocyte progenitors are involved in OA pathogenesis (2). In our current study, the Col2α1-CreERT system allowed us to investigate Cbfβ's role not only in mature chondrocytes but also in early chondroprogenitor cells, offering a comprehensive view of Cbfβ’s involvement in cartilage in osteoarthritis. Therefore, the use of the Cbfβf/fCol2α1-CreERT mouse mutant strain was instrumental in expanding our understanding of Cbfβ's multifaceted role in osteoarthritis, highlighting its importance not only in mature cartilage but also in the early stages of cartilage formation and differentiation.

      In addition to using a different type of Cre compared with our previous study, our current study also used a gain-of-function approach in the ACLT-induced OA disease model to understand the potential therapeutic function of Cbfβ in OA pathological conditions. Adding our current findings to our previous research, we can now piece together a more complete picture of Cbfβ's role across the entire spectrum of cartilage development in osteoarthritis.

      We agree with the reviewer that how the deficiency of Cbfβ, a widely expressed TF, affected the posttranslational modification of SMAD2/3 and βCatenin is unclear and needs further exploration. So far there is no clear explanation for this, which is why we used RNA-seq and heatmap analysis to examine the expression of other genes, which could help to uncover the mechanism underlying these results. Interestingly, Che et al’s result showed that TGFB signaling (P-Smad3) increased in Cbfβf/fAggrecan-cre mice, while our data showed that TGFB signaling (both P-Smad3 and Smad3) decreased in Cbfβf/fCol2α1-CreERT mice, as shown in Figure 8. These results were also confirmed by RNA-seq analysis, as shown in the heatmaps in Figure 5.

      These differences could be the result of different mouse ages used in our study and Che et al’s study.

      1. Blaney Davidson EN, van de Loo FA, van den Berg WB, van der Kraan PM. How to build an inducible cartilage-specific transgenic mouse. Arthritis Res Ther. 2014;16(3):210.

      2. Tong L, Yu H, Huang X, Shen J, Xiao G, Chen L, et al. Current understanding of osteoarthritis pathogenesis and relevant new approaches. Bone Res. 2022;10(1):60.

      Reviewer #3 (Public Review):

      The authors comprehensively demonstrated the Cbfβ gene, which is involved in articular cartilage homeostasis, can promote articular cartilage regeneration and repair in osteoarthritis (OA) through regulating Hippo/YAP signaling TGF-β signaling, and canonical Wnt signaling. First, the authors demonstrated the deletion of Cbfβ can induce the OA phenotypes including decreased articular cartilage and osteoblasts, and increased osteoclasts and subchondral bone hyperplasia, and induce the early onset of OA. Additionally, the authors showed that the deficiency of Cbfβ in cartilage can increase canonical Wnt signaling and decrease TGF-β and Hippo signaling. Finally, the authors demonstrated that the overexpression of Cbfβ can inhibit Wnt signaling and enhance Hippo/YAP signaling in knee joints articular cartilage of ACLT-induced OA mice and protect against ACLT-induced OA. The manuscript is overall well-constructed, and the authors provided evidence to support their findings.

      In Fig. 7I, it could be better to show the statistical analysis between normal and AAV-mediated Cbfβ ACLT mice groups.

      We thank the reviewer for bringing this to our attention. In the revised figure 7I, we have included the statistical analysis between normal and AAV-mediated Cbfβ ACLT mice groups.

      In Fig. 9H-K, in the quantification analysis, the OARSI score in the DMM+AAV-YFP group is higher than in the sham group significantly. However, the SO staining results appear to show no significant difference between the DMM+AAV-luc-YFP group (Fig. 9I) and the sham group (Fig. 9H).

      We thank the reviewer for bringing this to our attention. Although both the sham and DMM+AAV-luc-YFP groups stain positive for SO, the SO stain intensity of the DMM+AAV-luc-YFP group is noticeably lower. In addition, SO staining is not the only parameter included in the OARSI score. We also evaluated the cartilage thickness, proteoglycan structure, and cartilage surface fibrillation index. Our evaluation to determine the OARSI score relies on the qualities of the whole joint, not only the magnified portion. For convenience, we have also outlined the region of positive SO stain in the revised Figure 9I.

    1. Author Response

      eLife assessment

      This important study provides a new, apparently high-performance algorithm for B cell clonal family inference. The new algorithm is highly innovative and based on a rigorous probabilistic analysis of the relevant biological processes and their imprint on the resulting sequences, however, the strength of evidence regarding the algorithm's performance is incomplete, due to (1) a lack of clarity regarding how different data sets were used for different steps during algorithm development and validation, resulting in concerns of circularity, (2) a lack of detail regarding the settings for competitor programs during benchmarking, and (3) method development, data simulation for method validation, and empirical analyses all based on the B cell repertoire of a single subject. With clarity around these issues and application to a more diverse set of real samples, this paper could be fundamental to immunologists and important to any researcher or clinician utilizing B cell receptor repertoires in their field (e.g., cancer immunology).

      We apologize for the long delay in implementing the suggested changes. Some of the co-authors had some personal issues that made it hard to efficiently work on the revision.

      We have addressed all the essential points below, as well as all the detailed comments of each reviewer in the following pages.

      Due to the journal’s guidelines we have to upload an “all black” version of the manuscript as the main version. We have uploaded a revised manuscript with the changes marked in red as a “Related Manuscript file”, which appears at the very end of the Merged Manuscript File, after all the Figures, and at the end of the list of files on the webpage. We apologize for this inconvenience.

      In addition, we have added an extension of HILARy to deal with paired-chain repertoires, and have benchmarked the new method on a recently published synthetic dataset. This new analysis is now presented in new Fig. 5.

      Reviewer #1 (Public Review):

      Identifying individual BCR/Ab chain sequences that are members of the same clone is a longstanding problem in the analysis of BCR/Ab repertoire sequencing data. The authors propose a new method designed to be scalable for application to huge repertoire data sets without sacrificing accuracy. Their approach utilizes Hamming Distance between CDR3 sequences followed by clustering for a fast, high-precision approach to classifying pairs of sequences as related or not, and then refines the classification using mutation information from germline-encoded regions. They compare their method with other state-of-the-art methods using synthetic data.
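
      The first step summarized above (Hamming distance between CDR3 sequences followed by clustering) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes single-linkage clustering of equal-length sequences on the length-normalized Hamming distance, the threshold value is hypothetical, and the all-pairs comparison used here would not scale to large repertoires.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def single_linkage(seqs, threshold):
    """Link sequences whose normalized Hamming distance is <= threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if hamming(seqs[i], seqs[j]) / len(seqs[i]) <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, s in enumerate(seqs):
        clusters.setdefault(find(i), []).append(s)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Hypothetical CDR3-like strings, for illustration only.
    print(single_linkage(["CARDYW", "CARDFW", "CGGGGW"], threshold=0.2))
```

      In the approach described by the reviewer, such a CDR3-based partition is then refined using mutation information from the germline-encoded regions, which this sketch does not attempt.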

      The authors address an important problem in an interesting, innovative, and rigorous way, using probabilistic representations of CDR3 differences, frequencies of shared and not-shared mutations, and the relationships between the two under hypotheses of related pairs and unrelated pairs, and from these develop an approach for determining thresholds for classification and lineage assignment. Benchmarking shows that the proposed method, the complete method including both steps, outperforms other methods.

      Strengths of the method include its theoretical underpinnings which are consistent with an immunologist's intuition about how related and unrelated sequences would compare with each other in terms of the metrics to use and how those metrics are related to each other.

      I have two high-level concerns:

      (1) It isn't clear how the real and synthetic data are being used to estimate parameters for the classifier and evaluate the classifier to avoid circularity. It seems like the approach is used to assign lineages in the data from [1], and then properties of this set of lineages are used to estimate parameters that are then used to refine the approach and generate synthetic data that is used to evaluate the approach. This may not be a problem with the approach but rather with its presentation, but it isn't entirely clear what data is being used and where for what purpose. An understanding of this is necessary in order to truly evaluate the method and results.

      The reviewer is correct in their understanding of the pipeline. It should be stressed that the lineage inference used to guide the generation of the synthetic data was done on VJl classes for which the clustering was easy and reliable, and should therefore be largely model independent.

      We have added an explanation in the main text of why the re-use of real data lineages inferred by HILARy doesn’t bias the procedure, since it’s done on a subset of lineages within VJl classes that are easy to infer (section “Test on synthetic dataset”).

      (2) Regarding the data used for benchmarking - given the intertwined fashion by which the classification approach and synthetic data generation approach appear to have been developed, it is not surprising that the proposed approach outperforms the other methods when evaluated on the synthetic data presented here. It would be better to include in the benchmark the data used by the other methods to benchmark themselves or also generate synthetic data using their data generation procedures.

      We agree with the reviewer that a test of the method on an independent synthetic dataset is important for its applicability and to compare to other methods.

      We have added a new synthetic dataset from the group that designed the partis method to our benchmark. Our method still performs competitively, on par with partis—which was developed and tested on that dataset—and better than other methods. The results are presented in revised Fig. 4 (panels E-G), and Figure 4–figure supplement 1 as a function of the mutation rate.

      In addition, we have used that dataset to benchmark a new version of HILARy that also uses the light chain. We present the results in new Figures 5 and Figure 4–figure supplement 1.

      An improved method for BCR/Ab sequence lineage assignment would be a methodologic advancement that would enable more rigorous analyses of BCR/Ab repertoires across many fields, including infectious disease, cancer, autoimmune disease, etc., and in turn, enable advancement in our understanding of humoral immune responses. The methods would have utility to a broad community of researchers.

      Reviewer #2 (Public Review):

      This manuscript describes a new algorithm for clonal family inference based on V and J gene identity, sequence divergence in the CDR3 region, and shared mutations outside the CDR3. Specifically, the algorithm starts by grouping sequences that have the same V and J genes and the same CDR3 length. It then performs single-linkage clustering on these groups based on CDR3 Hamming distance, then further refines these groups based on shared mutations.
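      The first two steps described above (grouping by V gene, J gene, and CDR3 length, then single-linkage clustering on CDR3 Hamming distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tuple-based input layout, and the fixed threshold are assumptions made for illustration. It uses a naive all-pairs comparison rather than the efficient procedure the manuscript describes, and it omits both the adaptive choice of threshold and the refinement based on shared mutations.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_vjl(seqs, threshold):
    """Single-linkage clustering within classes sharing V gene, J gene, and CDR3 length.

    seqs: list of (v_gene, j_gene, cdr3) tuples (hypothetical input layout).
    threshold: maximum Hamming distance, as a fraction of CDR3 length,
        for linking two sequences (a fixed value here, purely for illustration).
    Returns one cluster label per input sequence.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Step 1: group sequence indices by (V gene, J gene, CDR3 length).
    classes = defaultdict(list)
    for idx, (v, j, cdr3) in enumerate(seqs):
        classes[(v, j, len(cdr3))].append(idx)

    # Step 2: within each class, link every pair whose CDR3s are close enough.
    # Merging linked pairs transitively is exactly single-linkage clustering.
    for (_, _, length), members in classes.items():
        for a, b in combinations(members, 2):
            if hamming(seqs[a][2], seqs[b][2]) <= threshold * length:
                parent[find(a)] = find(b)

    return [find(i) for i in range(len(seqs))]
```

      Because linkage is transitive, two sequences whose CDR3s differ by more than the threshold can still end up in the same cluster if a chain of intermediate sequences connects them, which is why the choice of threshold matters so much for this family of methods.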

      Although there are a number of algorithms that use a similar overall strategy, a couple of aspects make this work unique. First, a persistent challenge for algorithms such as this one is how to set a cutoff for single-linkage clustering: if it is too low, then one separates clusters that should be together, and if too high one joins together clusters that should be separate. Here the authors leverage a rich collection of probabilistic tools to make an optimal choice. Specifically, they model the probability distributions of within- and between-cluster CDR3 Hamming distances, with parameters depending on CDR3 length and the "prevalence" of clonal sequence pairs (i.e. family size distribution). This allows the algorithm to make optimal choices for separating clusters, given the particular chosen distance metric, and assuming the sample in question has been accurately modeled. Second, the algorithm uses a highly efficient means of doing single-linkage clustering on nucleotide sequences.

      This leads to a fast and highly performant algorithm on data meant to replicate the original sample used in algorithm design. The ideas are new and beautifully developed. The application to real data is interesting, especially the point about dN/dS.

      However, the paper leaves open the question of how this inference algorithm works on samples other than the one used for simulation and as a template for validation. If I understand the simulation procedure correctly - that one takes a collection of inferred trees from the real data, then re-draws the root sequence and the identity of the mutations on the branches - then the simulated data should be very close to the data used to develop the methods in the paper. This consideration seems especially important given that key methods in this paper use mutation counts and overall mutation counts are preserved.

      Repertoires come in all shapes and sizes: infants to adults, healthy to cancerous, and naive to memory to plasma-cell-just-after-vaccination. If this is being proposed as a general-purpose clonal inference algorithm rather than one just for this sample, then a more diverse set of validations are needed.

      We agree that testing the method on a differently generated dataset is a useful check. We should point out, however, that our synthetic dataset is not as biased as it may seem. In particular, it is based on trees from VJl classes that we predicted are very easy to cluster, which means that they are truly faithful to the data, and not dependent on the particular algorithm used to infer them. The big advantage of this synthetic dataset over others is that it recapitulates the power law statistics of clone size distribution, as well as the diversity of mutation rates. To us, it still represents a more useful benchmark than synthetic datasets generated by population genetics models, which miss most of this very broad variability.

      However, to check how the method generalizes to other datasets, we repeated our validation procedure on the dataset used to evaluate Partis in Ralph et al 2022. The new results are discussed in the main text and in new panels of Fig. 4 in the same form as the previous comparisons. We also added a comparison of performance as a function of mutation rate in the new Figure 4–figure supplement 1.

      It is unclear how to run the code. The software repo has a nice readme explaining the file layout, dependencies, and input file format, but the repo seems to be lacking an inference.ipynb mentioned there which runs an analysis. Perhaps this is a typo and refers to inference.py, which in addition to the documented cdr3 clustering, seems to have functions to run both clustering methods. However, it does not seem to have any documentation or help messages about how to run these functions.

      We have completely overhauled the github to provide a detailed step by step explanation of how to run the code. The code is now easily installable using pip.

      The results are not currently reproducible, because the simulated data is not available. The data availability statement says that no data have been generated for this manuscript, however simulated data has been generated, and that is a key aspect of the analysis in the paper.

      We have uploaded the simulated data to zenodo, as well as provided scripts in the github to run the benchmarks.

      More detail is needed to understand the timing comparisons. The new software is clearly written to use many threads. Were the other software packages run using multiple threads? What type of machine was used for the benchmarks?

      All timing comparisons were made based on a single VJl class on a 14 double-threaded CPU computer. HILARy uses all 28 threads, and other methods were run with default settings, with multi-threading allowed.

      We have clarified the specifications of the computer.

      Reviewer #3 (Public Review):

      B cell receptors are produced through a combination of random V(D)J recombination and somatic hypermutation. Identifying clonal lineages - cells that descend from a common V(D)J rearrangement - is an important part of B cell repertoire analysis. Here, the authors developed a new method to identify clonal lineages from BCR data. This method builds off of prior advances in the field and uses both an adaptive clonal distance threshold and shared somatic hypermutation information to group B cells into clonal lineages.

      The major strength of this paper is its thorough quantitative treatment of the subject and integration of multiple improvements into the clonal clustering process. By their simulation results, the method is both highly efficient and accurate.

      The only notable weakness we identified is that much of the impact of the method will depend on its superiority to existing approaches, and this is not convincingly demonstrated by Fig. 4. In particular, little detail is given on how the other clonal clustering programs were run, and this can significantly impact their performance. More specifically:

      We have added a new benchmark to address these concerns, presented in Fig. 4 and in new figure 4 – figure supplement 1 as a function of a controllable mutation rate.

      (1) Scoper supports multiple methods for clonal clustering, including both adaptive CDR3 distance thresholds (Nouri and Kleinstein, 2018) and shared V-gene mutations (Nouri and Kleinstein, 2020). It is not clear which method was used for benchmarking. The specific functions and settings used should have been detailed and justified. Spectral clustering with shared V gene mutations would be the most comparable to the authors' method. Similar detail is needed for partis.

      In the updated version I use the 2020 version. The 2018 version is very similar to simple single linkage, so it will be removed from the benchmark.

      (2) It is not clear how the adaptive thresholds and shared mutation analysis in the authors' method differ from prior approaches such as scoper and partis.

      We have changed the paragraph in the discussion section about the benchmark to highlight the innovative aspects and differences with previous approaches.

      (3) The scripts for performing benchmarking analyses, as well as the version numbers of programs tested, are not available.

      We have added to the github all the scripts used for benchmarking. We have added details about the version numbers in the data and code availability section of the methods.

      (4) Similar to above, P. 10 describes single linkage hierarchical clustering with a fixed threshold as a "crude method" that "suffers from inaccuracy as it loses precision in the case of highly mutated sequences and junctions of short length." As far as we could tell, this statement is not backed up by either citations or analyses in the paper. It should not be difficult for the authors to test this though using their simulations, as this method is also implemented in scoper.

      We have added this method to our benchmark to support that point. The results are presented in Figure 4 – figure supplement 2.

      References

      Nouri N, Kleinstein SH. 2020. Somatic hypermutation analysis for improved identification of B cell clonal families from next-generation sequencing data. PLOS Comput Biol 16:e1007977. doi:10.1371/journal.pcbi.1007977

      Nouri N, Kleinstein SH. 2018. A spectral clustering-based method for identifying clones from high-throughput B cell repertoire sequencing data. Bioinformatics 34:i341-i349. doi:10.1093/bioinformatics/bty235

      We have changed citation [22] to refer to the 2018 paper. The 2020 paper is citation [18].

    1. Author Response

      We acknowledge the editors and reviewers for their careful and thoughtful review of the preprint. Their comments and suggestions will be very useful in improving the manuscript's revised version, which we plan to submit in the coming weeks.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewing Editor

      We thank you for clarifying several of the questions raised by the reviewers. Since the study has otherwise largely stayed unchanged, we will leave the eLife assessment as “before”:

      We respectfully disagree because we addressed all concerns raised by the two reviewers except one (below), which was not satisfactorily answered according to reviewer 1; it has now been addressed (new S3 Fig).

      Reviewer #1 (Recommendations For The Authors):

      The authors addressed most of my previous comments. However, there is one important point that was not satisfactorily addressed: "The band intensities on Western blots in Fig. 4 and Fig. 5 are not quantified, and the numbers of repeats are also not provided". The response was that "It is not straightforward to quantify and describe the intensity of the bands of these numerous with different fate outcomes." In the revision, they mentioned that at least three repeats were performed. If so, it's not entirely clear why they couldn't quantify the western blot results. Including quantitative data will strengthen the rigor of the findings.

      Quantitative data from Fig. 4 and Fig. 5 are now provided as S3 Fig and described in the manuscript (lines 170-175; 184-188).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Roget et al. build on their previous work developing a simple theoretical model to examine whether ageing can be under natural selection, challenging the mainstream view that ageing is merely a byproduct of other biological and evolutionary processes. The authors propose an agent-based model to evaluate the adaptive dynamics of a haploid asexual population with two independent traits: fertility timespan and mortality onset. Through computational simulations, their model demonstrates that ageing can give populations an evolutionary advantage. Notably, this observation arises from the model without invoking any explicit energy tradeoffs, commonly used to explain this relationship.

      The model’s results are based on both numerical simulations and formal mathematical analysis.

      Additionally, the theoretical model developed here indicates that mortality onset is generally selected to start before the loss of fertility, irrespective of the initial values in the population. The selected relationship between the fertility timespan and mortality onset depends on the strength of fertility and mortality effects, with larger effects resulting in the loss of fertility and mortality onset being closer together. By allowing for a trans-generational effect on ageing in the model, the authors show that this can be advantageous as well, lowering the risk of collapse in the population despite an apparent fitness disadvantage in individuals. Upon closer examination, the authors reveal that this unexpected outcome is a consequence of the trans-generational effect on ageing increasing the evolvability of the population (i.e., allowing a more effective exploration of the parameter landscape), reaching the optimum state faster.

      The simplicity of the proposed theoretical model represents both the major strength and weakness of this work. On one hand, with an original and rigorous methodology, the logic of their conclusions can be easily grasped and generalised, yielding surprising results. Using just a handful of parameters and relying on direct competition simulations, the model qualitatively recapitulates the negative correlation between lifespan and fertility without requiring energy tradeoffs. This alone makes this work an important milestone for the rapidly growing field of adaptive dynamics, opening many new avenues of research, both theoretically and empirically.

      We thank the reviewers and editor for highlighting the importance of the work presented here.

      On the other hand, the simplicity of the model also makes its relationship with living organisms difficult to gauge, leaving open questions about how much the model represents the reality of actual evolution in a natural context.

      We presented, in both the results and the discussion, how the mathematical trade-offs between fertility and survival time give rise to (xb, xd) configurations representative of existing aging modes.

      In particular, a more explicit discussion of how the specifics of the model can impact the results and their interpretation is needed. For example, the lack of mechanistic details on the trans-generational effect on ageing makes the results difficult to interpret.

      We discussed the role played by the transgenerational Lansing effect in terms of its function; there is no need for a particular mechanism beyond that function of a transgenerational negative effect. We reinforced this in the discussion by adding the following sentence: “Regarding the nature of the transgenerational effect, our model is agnostic and the mere transmission of any negative effect would be sufficient to exert the function.”

      Even if analytical results are obtained, most of the observations appear derived from simulations as they are currently presented. Also, the choice of parameters for the simulations shown in the paper and how they relate to our biological knowledge are not fully addressed by the authors.

      The long time limit of the system with and without the Lansing effect is based on analytical results later confirmed using numerical simulations. The choice of parameters is explained in the introduction as being the minimum ones for defining a living organism. As for the parameters’ values, our numerical analysis gives a solution for any ib, id, xb and xd on R+, making the choice of initial value a mere random decision.

      Finally, the conclusions of evolvability are insufficiently supported, as the authors do not show if the wider genotypic variability in populations with the ageing trans-generational effect is, in fact, selected.

      We do not show nor claim that evolvability per se is selected for but that the apparent advantage given by this transgenerational effect seems to be mediated by an increased genotypic/phenotypic variability conferred to the lineage that we interpreted as evolvability.

      Recommendations for the authors

      (1) The authors could use the lineage tracing results for the evolvability aspect. Specifically, within subpopulations featuring the Lansing effect, it would be valuable to explore whether individuals with parental age greater than the mortality onset (a > x_d) demonstrate higher fitness compared to individuals with a < x_d. Additionally, an examination of how this variation evolves over time could provide further insights into the dynamics of the proposed model.

      We thank the reviewer for this suggestion. This is an ongoing work in the group, especially in the context of varying environmental conditions.

      (2) In all simulations, I_b = I_d = 1, resulting in total fertility (x_b * I_b) equating to x_b, while x_d is proportional to life expectancy. Considering an exploration of the implications of this parameter setting, the authors could frame x_d as a 'lifespan cost', potentially allowing for the model to be conceptualised in terms of energetic tradeoffs. This might offer additional perspectives on the dynamics of the model and its alignment with biological principles.

      We discuss how the apparent trade-offs given by the model depending on ib and id values can be related to the interpretation of such trade-offs that has been accepted for most of the past century. Our claim here in the discussion is that one does not need such energetic trade-off for the fertility/longevity trade-offs to appear. Such energetic trade-off is not a “biological principle” but merely an accepted interpretation of a fertility/longevity trade-off that is not even a general mechanism.

      (3) Considering the necessity of variation in x_d for the observed patterns, an exploration could be undertaken by the authors to examine a model where x_d is simply variable without inheritance. This could involve centring x_d at some value d with some variance σ_d for all individuals. In such a scenario, it may be observed whether the same convergence of x_b - x_d occurs without requiring x_d to be selected. Furthermore, similar consequences of the Lansing effect could potentially be identified.

      This was done early on during our work and did not show any major changes in the model’s behaviour beyond the time of convergence. We did not include it to the final manuscript because of the low added value to an already long and complex manuscript.

      (4) While it may not be necessary to alter the model itself, it is suggested that the authors consider acknowledging the potential consequences of certain modelling decisions that might be perceived as biologically unrealistic. Notable examples include assumptions such as fertility from birth and zero mortality prior to x_d. These assumptions, such as infertility from birth, could be viewed as distinctive features, and it might be worth mentioning that parental care of offspring could have co-evolved with such features. This is particularly relevant considering the energy tradeoff hypothesis that has been postulated.

      Although inspired from results obtained in Drosophila, mice, nematodes and zebrafish, the model is so far haploid and asexual, thus involving individuals likely more similar to unicellulars. In these conditions, infertility from birth did not seem relevant to us. However, the model and codes are accessible online and we hope that others will use it to address such questions. It is interesting though to notice that ageing appears here without such constraint.

      Additionally, the consideration that all organisms face a non-trivial mortality rate at every age, not solely from physiological causes, reflects the reality within which selection operates.

      We thought this was the best way to reflect an environment with a limited carrying capacity. A more complex model is under construction to take into account the fact that older individuals might be more sensitive to it than younger ones.

      (5) While acknowledging the technical rigour applied by the authors, it is suggested that further attention be given to conducting a comprehensive 'reality check' associated with the chosen parameters, particularly regarding the biological relevance of the results. For instance, the authors argue that offspring of old organisms do not, on average, live similarly to their parents. However, it is noted that studies in the haploid asexual organism yeast, akin to what the authors model (albeit not necessarily yeast), revealed that the average lifespan of yeast progeny born from young or old mothers is very similar.

      We do not claim that the progeny of old parents live less long on average than that of younger parents; we say that this happens in the progeny of physiologically old parents, which represent at most 10% of the population in our numerical simulations.

      The authors cite experimental evolution in Drosophila progeny conceived later in the life of the parent, indicating that the onset of mortality in these progeny occurs late, sometimes even after the end of the fertility period (Burke et al., 2016; Rose et al., 2002). While the authors report their own previous studies with divergent results, independent experiments have suggested an increase of x_d following an artificial increase of x_b (Luckinbill and Clare, 1985; Sgro et al., 2000). A more in-depth consideration of these contrasting observations and their potential implications for the current model could enhance the overall robustness of the study.

      The increase of x_d following an artificial increase of x_b is predicted by our model as discussed. The divergence of observations between studies is alas hard to assess.

      (6) To enhance readability and maintain consistency, it is suggested that the authors homogenise the description of key parameters, specifically x_b and x_d, throughout the text. This could contribute to improved clarity and rigour. One recommendation is to refer to x_b consistently as the 'fertility span' and x_d as the 'mortality onset' for the sake of uniformity in terminology.

      We have modified the text accordingly.

      (7) At various points in the text, the assertion is made that observations have indicated a tradeoff between fertility and longevity. It is recommended that the authors provide references or data to substantiate this claim. This addition would contribute to the empirical grounding of the mentioned tradeoff and strengthen the overall support for the assertions made in the study.

      We added the following references to the discussion: Lemaitre et al., 2015; Kirkwood, 2005; and Rodrigues and Flatt, 2016.

      (8) The statement claiming that the model is 'able to describe all types of ageing observed in the wild' should be moderated. As the authors themselves acknowledge, the model is referred to as a 'toy model,' and it is made clear that it cannot capture, nor is intended to capture, the entire diversity observed in life. Adjusting this statement to reflect the limited scope and purpose of the model would enhance precision and accuracy in the presentation of its capabilities.

      Although a toy model, its possible configurations encompass all the configurations described so far across the diversity of ageing throughout the tree of life, from negligible senescence with no loss of fertility (x_b and x_d >> 0) to menopause-like configurations (x_b >> x_d) through fast mortality increase post reproduction (x_b = x_d). Replacing our current square functions would allow age-dependent decrease or increase of fertility and/or risks of mortality onsets.
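      For readers unfamiliar with the model, the square functions mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the behaviour exactly at age x_b or x_d is an arbitrary choice, and the shapes are inferred from the reviewers' description (fertility from birth, zero mortality before x_d, intensities i_b and i_d).

```python
def fertility_rate(age, x_b, i_b=1.0):
    """Square fertility function: rate i_b from birth up to age x_b, then zero.

    Hypothetical helper; the inclusive boundary at x_b is an assumption.
    """
    return i_b if age <= x_b else 0.0


def mortality_rate(age, x_d, i_d=1.0):
    """Square mortality function: zero before age x_d, rate i_d afterwards.

    Hypothetical helper; the inclusive boundary at x_d is an assumption.
    """
    return i_d if age >= x_d else 0.0
```

      With these shapes, an individual's traits are fully described by the pair (x_b, x_d), and the three regimes listed above correspond to both values being large, to x_b exceeding x_d, and to the two being equal.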

      (9) To bolster the biological relevance of the study, it is strongly recommended that the authors cross-check the results of their simulations with previously published experimental findings. This approach would serve to strengthen the alignment between the model outcomes and observed biological phenomena. Additionally, placing greater emphasis on the biological relevance aspects throughout the text would contribute to a more robust and comprehensive exploration of the study's implications.

      In the present manuscript we have tried to cite a certain number of results from artificial selection experiments on life history traits in order to strengthen the interpretations of our model’s behaviour. There are numerous other studies, going in the same direction or not, but we do not think that it would be relevant to add an exhaustive list of them. Nevertheless, we added Stearns et al., 2000 that adds extrinsic high mortality to the evolution of life history traits.

      (10) For enhanced clarity, it is suggested that the x-axis in Figure 1 be labelled as 'age.' Considering this adjustment could contribute to clearer visual communication of the data.

      We agree with the reviewer and modified the figure accordingly.

      (11) The addition of graphical legends is recommended for Figures 3-5, as well as the supplementary figures. Including these legends would provide essential context and improve the interpretability of the figures for readers.

      We agree with the reviewer and modified the figure accordingly.

      (12) For improved distinction of the ranges indicated by quantiles in Figure 3, it is suggested that the authors consider enhancing visual clarity. One approach could involve making the middle quantile thicker or using a different line type. Additionally, it is recommended to explore the calculation of the highest density 90% intervals rather than the 1-9 deciles. This adjustment could contribute to a clearer representation of the data distribution in the figure.

      We named the different deciles directly on the figure to improve readability.

      (13) It is observed that the mathematical proofs in Annex 1 are not displaying properly in the PDF. Additionally, there seem to be missing and broken references for the Annex. This issue may be related to LaTeX formatting. The authors could consider revisiting the formatting of Annex 1 to ensure the correct display of mathematical proofs and address the referencing concerns, possibly by checking and rectifying any LaTeX-related issues.

      The LaTeX file of the supplementary material was not correctly compiled. It is now corrected.

      (14) There is inconsistency in the text regarding the reference to the Annex, with both 'Annex' and 'Annexe' being used interchangeably. To maintain uniformity, it is suggested that the authors consistently use either 'Annex' or 'Annexe' throughout the text. This adjustment would contribute to a more polished presentation of the supplementary material.

      We corrected them accordingly.

      (15) There appears to be a typographical error in the name of Supplementary Figure 3.

      We corrected it accordingly.

    1. Author Response

      eLife assessment

      The authors present evidence that small extracellular vesicles can be secreted from cells inside larger vesicles that they call amphiectosomes, which then tear to release their small vesicle contents. There are questions and concerns relating to the quality of the data and the in vivo significance of the observations. The findings are potentially important but the data are incomplete and the claims are only partially supported.

      We agree that the in vivo significance and details of the molecular background of amphiectosome release remain to be studied further. However, as Reviewer 2 indicated, our data in this Short Report may have a substantial impact on our understanding of EV biogenesis. Therefore, we considered it important to publish our data as soon as possible because it may significantly impact other EV biogenesis studies.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors' research group had previously demonstrated the release of large multivesicular body-like structures by human colorectal cancer cells. This manuscript expands on their findings, revealing that this phenomenon is not exclusive to colorectal cancer cells but is also observed in various other cell types, including different cultured cell lines, as well as cells in the mouse kidney and liver. Furthermore, the authors argue that these large multivesicular body-like structures originate from intracellular amphisomes, which they term "amphiectosomes." These amphiectosomes release their intraluminal vesicles (ILVs) through a "torn-bag mechanism." Finally, the authors demonstrate that the ILVs of amphiectosomes are either LC3B positive or CD63 positive. This distinction implies that the ILVs either originate from amphisomes or multivesicular bodies, respectively.

      Strengths:

      The manuscript reports a potential origin of extracellular vesicle (EV) biogenesis. The reported observations are intriguing.

      Weaknesses:

      It is essential to note that the manuscript has issues with experimental designs and lacks consistency in the presented data. Here is a list of the major concerns:

      (1) The authors culture the cells in the presence of fetal bovine serum (FBS) in the culture medium. Given that FBS contains a substantial amount of EVs, this raises a significant issue, as it becomes challenging to differentiate between EVs derived from FBS and those released by the cells. This concern extends to all transmission electron microscopy (TEM) images (Figure 1, 2P-S, S5, Figure 4 P-U) and the quantification of EV numbers in Figure 3. The authors need to use an FBS-free cell culture medium.

      (1) Although FBS indeed contains bovine EVs, the presence in FBS of the very large multivesicular EVs (amphiectosomes) that our manuscript focuses on has never been observed or reported. For reported size distributions of EVs in FBS, please find a few relevant references below:

      PMID: 29410778, PMID: 33532042, PMID: 30940830 and PMID: 37298194

      All the above publications show that the number of lEVs > 350-500 nm is negligible in FBS. The average diameter of MV-lEVs (amphiectosomes) described in our manuscript is around 1.00-1.50 micrometre.

      (1) When we demonstrated the TEM of isolated EVs, we consistently used serum-free conditioned medium (Fig2 P-S, Fig2S5 J, O) as described previously (Németh et al 2021, PMID: 34665280).

      (2) Our TEM images show cells captured in the process of budding and scission of large multivesicular EVs, excluding the possibility that these structures could have originated from FBS.

      (3) In addition, in our confocal analysis, we studied Palm-GFP positive, cell-line derived MV-lEVs. Importantly, in these experiments, FBS-derived EVs are non-fluorescent, therefore, the distinction between GFP positive MV-lEVs and FBS-derived EVs was evident.

      (4) In addition, culturing cells in FBS-free medium (serum starvation) significantly affects autophagy. Given that in our study, we focused on autophagy related amphiectosome secretion, we intentionally chose to use FBS supplemented medium.

      (5) Even though the authors of this manuscript are not familiar with the technological details of how FBS is processed before commercialization, it is reasonable to assume that the samples are subjected to sterile filtration (through a 0.22 micron filter), after which MV-lEVs cannot be present in the commercial FBS samples.

      (2) The data presented in Figure 2 is not convincingly supportive of the authors' conclusion. The authors argue that "...CD81 was present in the plasma membrane-derived limiting membrane (Figures 2B, D, F), while CD63 was only found inside the MV-lEVs (Fig. 2A, C, E)." However, in Figure 2G, there is an observable CD63 signal in the limiting membrane (overlapping with the green signals), and in Figure 2J, CD81 also exhibits overlap with MV-lEVs.

      Both CD63 and CD81 are tetraspanins known to be present both in the membrane of sEVs and in the plasma membrane of cells (for references, please see Uniprot subcellular location maps: https://www.uniprot.org/uniprotkb/P08962/entry#subcellular_location https://www.uniprot.org/uniprotkb/P60033/entry#subcellular_location). However, according to the feedback of the reviewer, for clarity, we will delete the implicated sentence from the text.

      (3) Following up on the previous concern, the authors argue that CD81 and CD63 are exclusively located on the limiting membrane and MV-lEVs, respectively (Figure 2A-M). However, in lines 104-106, the authors conclude that "The simultaneous presence of CD63, CD81, TSG101, ALIX, and the autophagosome marker LC3B within the MV-lEVs..." This statement indicates that CD63 and CD81 co-localize to the MV-lEVs. The authors need to address this apparent discrepancy and provide an explanation.

      There must be a misunderstanding because we did not claim or imply in the text that “CD81 and CD63 are exclusively located on the limiting membrane and MV-lEVs”. Here we studied co-localization of the above proteins in the case of intraluminal vesicles (ILVs). In Fig. 2, we did not show any analysis of limiting membrane co-localization.

      (4) The specificity of the antibodies used in Figure 2 should be validated through knockout or knockdown experiments. Several of the antibodies used in this figure detect multiple bands on western blots, raising doubts about their specificity. Verification through additional experimental approaches is essential to ensure the reliability and accuracy of all the immunostaining data in this manuscript.

      We will consider this suggestion during the revision of the manuscript.

      (5) In Figures 2P-R, the morphology of the MV-lEVs does not resemble those shown in Figures 1A, H, and D, indicating a notable inconsistency in the data.

      EM images in Figure 2P-R show sEVs separated from serum-free conditioned media as opposed to MV-lEVs, which were captured in situ in fixed tissue cultures (Fig. 1). Therefore, the two EV populations necessarily have different sizes and structures. Furthermore, Fig. 1 shows images of ultrathin sections, while in Figure 2P-R we used negative-positive contrasting of intact sEVs without embedding and sectioning.

      (6) There are no loading controls provided for any of the western blot data.

      Not even the latest MISEV 2023 guidelines give recommendations for a proper loading control for separated EVs in Western blot (MISEV 2023, DOI: 10.1002/jev2.12404, PMID: 38326288). Here we applied our previously developed method (PMID: 37103858), which, in our opinion, is the most reliable approach to be used for sEV Western blotting. For whole cell lysates, we used actin as loading control (Fig3_S2B).

      Additionally, for Figures 2-S4B, the authors should run the samples from lanes i-iii in a single gel.

      Please note that in Figure 2- S4B, we did run a single gel, and the blot was cut into 4 pieces, which were tested by anti-GFP, anti-RFP, anti-LC3A and anti-LC3B antibodies. Full Western blots are shown in Fig.3_S2 B, and lanes “1”, “2” and “3” correspond to “i”, “ii” and “iii” in Fig.2_S4, respectively.

      (7) In Figure 2-S4, is there co-localization observed between LC3RFP (LC3A?) and other MV-lEV markers? How about LC3B? Does LC3B co-localize with other MV-lEV markers?

      In Supplementary Figure 2-S4, we showed successful generation of the HEK293T-PalmGFP-LC3RFP cell line. In this case we tested the cells, and not the released MV-lEVs. LC3A co-localized with the RFP signal as expected.

      (8) The TEM images presented in Figure 2-S5, specifically F, G, H, and I, do not closely resemble the images in Figure 2-S5 K, L, M, N, and O. Despite this dissimilarity, the authors argue that these images depict the same structures. The authors should provide an explanation for this observed discrepancy to ensure clarity and consistency in the interpretation of the presented data.

      As indicated in Materials and Methods, Fig 2_S5 F, G, H and I are conventional TEM images of samples fixed with 4% glutaraldehyde and 1% OsO4 for 2 h and embedded in Epon resin, with post-contrasting by 3.75% uranyl acetate for 10 min and lead citrate for 12 min. Samples processed this way have very high structure preservation and better image quality; however, they are not suitable for immune detection. In contrast, Fig 2_S5 K, L, M, N show immunogold labelling of in situ fixed samples. In this case we used milder fixation (4% PFA, 0.1% glutaraldehyde, postfixed by 0.5% OsO4 for 30 min) and LR-White hydrophilic resin embedding. This special resin enables immunogold TEM analysis. The sections were exposed to H2O2 and NaBH4 to render the epitopes accessible in the resin. Because of the different techniques applied, the preservation of the structure is not the same. In the case of Fig.2 J, O, separated sEVs were visualised by negative-positive contrast and immunogold labelling as described previously (PMID: 37103858).

      (9) For Figures 3C and 3-S1, the authors should include the images used for EV quantification. Considering the concern regarding potential contamination introduced by FBS (concern 1), it is advisable for the authors to employ an independent method to identify EVs, thereby confirming the reliability of the data presented in these figures.

      In our revised manuscript, we will provide all the images used for EV quantification in Figure 3C. Given that Figures 3C and 3-S1 show MV-lEVs released by HEK293T-PalmGFP cells, the possible interference by FBS-derived non-fluorescent EVs can be excluded.

      (10) Do the amphiectosomes released from other cell types as well as cells in mouse kidneys or liver contain LC3B positive and CD63 positive ILVs?

      Based on our confocal microscopic analysis, in addition to the HEK293T-PalmGFP cells, HT29 and HepG2 cells also release similar LC3B and CD63 positive MV-lEVs. Preliminary evidence shows MV-lEV secretion by additional cell types.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Wang et al investigated the evolution, expression, and function of the X-linked miR-506 miRNA family. They showed that the miR-506 family underwent rapid evolution. They provided evidence that miR-506 appeared to have originated from the MER91C DNA transposons. Human MER91C transposon produced mature miRNAs when expressed in cultured cells. A series of mouse mutants lacking individual clusters, a combination of clusters, and the entire X-linked cluster (all 22 miRNAs) were generated and characterized. The mutant mice lacking four or more miRNA clusters showed reduced reproductive fitness (litter size reduction). They further showed that the sperm from these mutants were less competitive in polyandrous mating tests. RNA-seq revealed the impact of deletion of miR-506 on the testicular transcriptome. Bioinformatic analysis examined the relationship among miR-506 binding, transcriptomic changes, and target sequence conservation. The miR-506-deficient mice showed no apparent defects in sperm production, motility, or morphology. Lack of severe phenotypes is typical for miRNA mutants in other species as well. However, the miR-506-deficient males did exhibit reduced litter size; such an effect would have been quite significant on an evolutionary time scale. The number of mouse mutants and sequencing analysis represent a tour de force. This study is a comprehensive investigation of the X-linked miR-506 miRNA family. It provides important insights into the evolution and function of the miR-506 family.

      The conclusions of this preprint are mostly supported by the data, except as noted below. Some descriptions need to be revised for accuracy.

      L219-L285: The conclusion that X-linked miR-506 family miRNAs are expanded via LINE1 retrotransposition is not supported by the data. LINE1s and SINEs are very abundant, accounting for nearly 30% of the genome. In addition, the LINE1 content of the mammalian X chromosome is twice that of the autosomes. One can easily find flanking LINE1/SINE repeats. Therefore, the analyses in Fig. 2G, Fig. 2H and Fig. S3 are not informative. In order to claim LINE1-mediated retrotransposition, it is necessary to show the hallmarks of LINE1 retrotransposition, which are only possible for new insertions. The X chromosome is known to be enriched for testis-specific multi-copy genes that are expressed in round spermatids (PMID: 18454149). The conclusion on the LINE1-mediated expansion of miR-506 family on the X chromosome is not supported by the data and does not add additional insights. I think that the LINE1 related figure panels and description (L219-L285) need to be deleted. In discussion (L557-558), "...and subsequently underwent sequence divergence via LINE1-mediated retrotransposition during evolution" should also be deleted. This section (L219-L285) needs to deal only with the origin of miR-506 from MER91C DNA transposons, which is both convincing and informative.

      Reply: Agreed, the corresponding sentences were deleted.

      Fig. 3A: can you speculate/discuss why the miR-506 expression in sperm is higher than in round spermatids?

      Reply: RNAs are much less abundant in sperm than in somatic or spermatogenic cells (~1/100). Sperm-borne small RNAs represent a small fraction of total small RNAs expressed in their precursor spermatogenic cells, including spermatocytes and spermatids. Therefore, when the same amount of total/small RNAs are used for quantitative analyses, sperm-borne small RNAs (e.g., miR-506 family miRNAs) would be proportionally enriched in sperm compared to other spermatogenic cells. We discussed this point in the text (Lines 550-556).
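
      The normalization argument in this reply can be illustrated with a minimal Python sketch. All quantities are hypothetical, in arbitrary units, and were chosen only to mirror the ~1/100 ratio quoted above; they are not measurements from the study. The sketch compares the change in miRNA abundance per cell with the change seen when equal amounts of input RNA are assayed.

```python
# Hypothetical per-cell RNA content in arbitrary units (not measured values).
# Assumption: a sperm cell carries about 1/100 of the total RNA of a round spermatid.
round_spermatid = {"total_rna": 100.0, "mirna": 1.0}
sperm = {"total_rna": 1.0, "mirna": 0.05}


def share_of_input(cell):
    """Fraction of an equal-mass RNA input that the miRNA accounts for."""
    return cell["mirna"] / cell["total_rna"]


# Change in absolute miRNA content per cell (sperm relative to round spermatid).
per_cell_fold = sperm["mirna"] / round_spermatid["mirna"]

# Change observed when the same amount of RNA is loaded for both cell types.
equal_input_fold = share_of_input(sperm) / share_of_input(round_spermatid)

print(f"per-cell fold change:    {per_cell_fold:.2f}")
print(f"equal-input fold change: {equal_input_fold:.2f}")
```

      Under these assumed numbers, the miRNA is 20-fold less abundant per cell in sperm yet appears 5-fold enriched per unit of input RNA, which is the kind of proportional enrichment the reply describes.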

      Reviewer #2 (Public Review):

      In this paper, Wang and collaborators characterize the rapid evolution of the X-linked miR-506 cluster in mammals and characterize the functional relevance of depleting a few or most of the miRNAs in the cluster. The authors show that the cluster originated from the MER91C DNA transposon and provide some evidence that it might have expanded through the retrotransposition of adjacent LINE1s. Although the animals depleted of most miRNAs in the cluster show normal sperm parameters, the authors observed a small but significant reduction in litter size. The authors then speculate that the depletion of most miRNAs in the cluster could impair sperm competitiveness in polyandrous mating. Using a successive mating protocol, they show that, indeed, sperm lacking most X-linked miR-506 family members is outcompeted by wild-type sperm. The authors then analyze the evolution of the miR-506 cluster and its predicted targets. They conclude that the main difference between mice and humans is the expansion of the number of target sites per transcript in humans.

      The conclusions of the paper are, in most cases, supported by the data; however, a more precise and in-depth analysis would have helped build a more convincing argument in most cases.

      (1) In the abstracts and throughout the manuscript, the authors claim that "... these X-linked miRNA-506 family miRNA [...] have gained more targets [...] " while comparing the human miRNA-506 family to the mouse. An alternative possibility is that the mouse has lost some targets. A proper analysis would entail determining the number of targets in the mouse and human common ancestor.

      Reply: This question alerted us that we did not describe our conclusion accurately, causing confusion for this reviewer. Our data suggest that although the sheer number of target genes remains the same between humans and mice, the human X-linked miR-506 family targets a greater number of genes than the murine counterpart on a per miRNA basis. In other words, mice never lost any targets compared to humans, but each miR-506 family miRNA tends to target more genes in humans than in mice.

      We revised the text to more accurately report our data. The pertaining text (lines 490-508) now reads: “Furthermore, we analyzed the number of all potential targets of the miR-506 family miRNAs predicted by the aforementioned four algorithms among humans, mice, and rats. The total number of targets for all the X-linked miR-506 family miRNAs among different species did not show significant enrichment in humans (Fig. S9C), suggesting the sheer number of target genes does not increase in humans. We then compared the number of target genes per miRNA. When comparing the number of target genes per miRNA for all the miRNAs (baseline) between humans and mice, we found that on a per miRNA basis, human miRNAs have more targets than murine miRNAs (p<0.05, t-test) (Fig. S9D), consistent with higher biological complexity in humans. This became even more obvious for the X-linked miR-506 family (p<0.05, t-test) (Fig. S9D). In humans, the X-linked miR-506 family, on a per miRNA basis, targets a significantly greater number of genes than the average of all miRNAs combined (p<0.05, t-test) (Fig. S9D). In contrast, in mice, we observed no significant difference in the number of targets per miRNA between X-linked miRNAs and all of the mouse miRNAs combined (mouse baseline) (Fig. S9D). These results suggest that although the sheer number of target genes remains the same between humans and mice, the human X-linked miR-506 family targets a greater number of genes than the murine counterpart on a per miRNA basis.”

      We also changed “have gained” to “have” throughout the text to avoid confusion.
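
      The distinction drawn in this reply, between the total number of target genes and the number of targets per miRNA, can be made concrete with a small Python sketch. The miRNA names (`h1`, `m1`, ...) and gene labels are invented for illustration and do not correspond to any prediction in the manuscript; the statistical test reported in the quoted text is not reproduced here.

```python
from statistics import mean


def summarize(targets_by_mirna):
    """Return (unique target genes across the family, mean targets per miRNA)."""
    all_targets = set().union(*targets_by_mirna.values())
    per_mirna = mean(len(genes) for genes in targets_by_mirna.values())
    return len(all_targets), per_mirna


# Hypothetical toy families: both reach the same four genes in total,
# but each "human" miRNA covers more of them than each "mouse" miRNA.
human = {"h1": {"A", "B", "C"}, "h2": {"B", "C", "D"}}
mouse = {"m1": {"A", "B"}, "m2": {"C"}, "m3": {"D"}}

human_total, human_per_mirna = summarize(human)
mouse_total, mouse_per_mirna = summarize(mouse)

print("total targets:", human_total, "vs", mouse_total)
print("targets per miRNA:", human_per_mirna, "vs", round(mouse_per_mirna, 2))
```

      In this toy case the family-level totals are identical while the per-miRNA averages differ, which is the pattern the revised text attributes to the human and murine families.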

      (2) The authors claim that the miRNA cluster expanded through L1 retrotransposition. However, the possibility of an early expansion of the cluster before the divergence of the species while the MER91C DNA transposon was active was not evaluated. Although L1 likely contributed to the diversity within mammals, the generalization may not apply to all species. For example, SINEs are closer on average than L1s to the miRNAs in the SmiR subcluster in humans and dogs, and the horse SmiR subcluster seems to have expanded by a TE-independent mechanism.

      Reply: Agreed. We deleted the data mentioned by this reviewer.

      (3) Some results are difficult to reconcile and would have benefited from further discussion. The miR-465 sKO has over two thousand differentially expressed transcripts and no apparent phenotype. Also, the authors show a sharp downregulation of CRISP1 at the RNA and protein level in the mouse. However, most miRNAs of the cluster increase the expression of Crisp1 on a reporter assay. The only one with a negative impact has a very mild effect. miRNAs are typically associated with target repression; however, most of the miRNAs analyzed in this study activate transcript expression.

      Reply: Both mRNA and protein levels of Crisp1 were downregulated in KO mice, and these results are consistent with the luciferase data showing overexpression of these miRNAs upregulated the Crisp1 3’UTR luciferase activity. We agree that miRNAs usually repress target gene expression. However, numerous studies have also shown that some miRNAs, such as human miR-369-3, Let-7, and miR-373, mouse miR-34/449 and the miR-506 family, and the synthetic miRNA miRcxcr4, activate gene expression both in vitro (1, 2) and in vivo (3-6). Earlier reports have shown that these miRNAs can upregulate their target gene expression, either by recruiting FXR1, targeting promoters, or sequestering RNA subcellular locations (1, 2, 6). We briefly discussed this in the text (Lines 605-611).

      (4) More information is required to interpret the results of the differential RNA targeting by the murine and human miRNA-506 family. The materials and methods section needs to explain how the authors select their putative targets. In the text, they mention the use of four different prediction programs. Are they considering all sites predicted by any method, all sites predicted simultaneously by all methods, or something in between? Also, what are they considering as a "shared target" between mice and humans? Is it a mRNA that any miR-506 family member is targeting? Is it a mRNA targeted by the same miRNA in both species? Does the targeting need to occur in the same position determined by aligning the different 3'UTRs?

      Reply: Since each prediction method has its merit, we included all putative targets predicted by any of the four methods. The "shared target" refers to a mRNA that any miR-506 family member targets because the miR-506 family is highly divergent among different species. We have added the information to the “Large and small RNA-seq data analysis” section in Materials and Methods (Lines 871-882).
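
      The selection rule stated in this reply (keep every target predicted by any of the four methods, and call a target "shared" when any family member targets it in both species) amounts to a union followed by an intersection. A minimal Python sketch under stated assumptions: the method names, miRNA names, and gene labels are placeholders, and genes are assumed to be already mapped to common ortholog identifiers, a step the reply does not describe.

```python
def family_targets(predictions):
    """Union of target genes over all prediction methods and all family miRNAs.

    `predictions` maps method name -> miRNA name -> set of target gene IDs.
    """
    targets = set()
    for per_mirna in predictions.values():
        for genes in per_mirna.values():
            targets |= genes
    return targets


# Hypothetical predictions; gene IDs are assumed to be ortholog-mapped.
human_predictions = {
    "method_a": {"miR-X1": {"G1", "G2"}},
    "method_b": {"miR-X2": {"G3"}},
}
mouse_predictions = {
    "method_a": {"miR-Y1": {"G2"}},
    "method_c": {"miR-Y2": {"G3", "G4"}},
}

# A shared target is a gene targeted by any family member in both species,
# regardless of which miRNA or which site position is involved.
shared_targets = family_targets(human_predictions) & family_targets(mouse_predictions)
print(sorted(shared_targets))
```

      Defining sharing at the gene level, rather than per miRNA or per aligned site, matches the rationale given in the reply that the family is highly divergent among species.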

      (5) The authors highlight the particular evolution of the cluster derived from a transposable element. Given the tendency of transposable elements to be expressed in germ cells, the family might have originated to repress the expression of the elements while still active but then remained to control the expression of the genes where the element had been inserted. The authors did not evaluate the expression of transcripts containing the transposable element or discuss this possibility. The authors proposed an expansion of the target sites in humans. However, whether this expansion was associated with the expansion of the TE in humans was not discussed either. Clarifying whether the transposable element was still active after the divergence of the mouse and human lineages would have been informative to address this outstanding issue.

      Reply: Agreed. The MER91C DNA transposon is denoted as nonautonomous (7); however, whether it was active during the divergence of mouse and human lineages is unknown. To determine whether the expansion of the target sites in humans was due to the expansion of the MER91C DNA transposon, we analyzed the MER91C DNA transposon-containing transcripts and associated them with our DETs. Of interest, 28 human and 3 mouse mRNAs possess 3’UTRs containing MER91C DNA sequences, and only 3 and 0 out of those 28 and 3 genes belonged to DETs in humans and mice, respectively (Fig. S9E), suggesting a minimal effect of MER91C DNA transposon expansion on the number of target sites. We briefly discussed this in the text (Lines 511-518).

      Post-transcriptional regulation is exceptionally complex in male haploid cells, and the functional relevance of many regulatory pathways remains unclear. This manuscript, together with recent findings on the role of piRNA clusters, starts to clarify the nature of the selective pressure that shapes the evolution of small RNA pathways in the male germ line.

      Reply: Agreed. We appreciate your insightful comments.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors conducted a comprehensive study of the X-linked miR-506 family miRNAs in mice, covering their origin, evolution, expression, and function. They demonstrate that the X-linked miR-506 family, predominantly expressed in the testis, may be derived from MER91C DNA transposons and further expanded by retrotransposition. By genetic deletion of different combinations of 5 major clusters of this miRNA family in mice, they found these miRNAs are not required for spermatogenesis. However, on further examination, the mutant mice show a mild fertility problem and inferior sperm competitiveness. The authors conclude that the X-linked miR-506 miRNAs fine-tune spermatogenesis to enhance sperm competition.

      Strengths:

      This is a comprehensive study with extensive computational and genetic dissection of the X-linked miR-506 family, providing a holistic view of its evolution and function in mice. The finding that the miRNAs of this family could enhance sperm competition is interesting and could explain their roles in fine-tuning germ cell gene expression to regulate reproductive fitness.

      Weaknesses:

      The authors specifically addressed the function of 5 clusters of the X-linked miR-506 family containing 19 miRNAs. There is another small cluster containing 3 miRNAs close to the Fmr1 locus. Would this small cluster act in concert with the 5 clusters to regulate spermatogenesis? In addition, any autosomal miR-506-like miRNAs may compensate for the loss of the X-linked miR-506 family. These possibilities should be discussed.

      Reply: The three FmiRs were not deleted in this study because the SmiRs are much more abundant than the FmiRs in WT mice (Author Response image 1, heatmap version of Fig. 5C). Based on small RNA-seq, some FmiRs, e.g., miR-201 and miR-547, were upregulated in the SmiRs KO mice, suggesting that this small cluster may act in concert with the other 5 clusters and is thus worth further investigation. To the best of our knowledge, all the miR-506 family miRNAs are located on the X chromosome. Although some other miRNAs were upregulated in the KO mice, they do not belong to the miR-506 family. We briefly discussed this point in the text (Lines 635-638).

      Author response image 1.

      sRNA-seq of WT and miR-506 family KO testis samples.

      Direct molecular link to sperm competitiveness defect remains unclear but is difficult to address.

      Reply: In this study, we identified a target of the miR-506 family, i.e. Crisp1. KO of Crisp1 in mice, or inhibition of CRISP1 in human sperm (7, 8), appears to phenocopy the quinKO mice, displaying largely normal sperm motility but compromised ability to penetrate eggs. The detailed mechanism warrants further investigation in the future.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Lines 84-85: "Several cellular events are unique to the male germ cells, e.g., meiosis, genetic recombination, and haploid male germ cell differentiation (also called spermiogenesis)". This statement is not accurate. Please revise. Meiosis and genetic recombination are common to both male and female germ cells. They are highly conserved in both sexes in many species including mouse.

      Reply: Agreed. We have revised the sentence and it now reads: “Several cellular events are unique to the male germ cells, e.g., postnatal formation of the adult male germline stem cells (i.e., spermatogonia stem cells), pubertal onset of meiosis, and haploid male germ cell differentiation (also called spermiogenesis) (9)” (Lines 83-86).

      Lines 163-164: "we found that Slitrk2 and Fmr1 were syntenically linked to autosomes in zebrafish and birds (Fig. 1A), but had migrated onto the X chromosome in most mammals". This description is not accurate. Chr 4 in zebrafish and birds is syntenic to the X chromosome in mammals. The term "migrated" is not appropriate. Suggestion: Slitrk2 and Fmr1 mapped to Chr 4 (syntenic with mammalian X chromosome) in zebrafish and birds but to the X chromosome in most mammals.

      Reply: Agreed. Revised as suggested.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the significance statement, the authors mention that the mutants are "functionally infertile," although the decrease in competitiveness is partial. I suggest referring to them as "functionally sub-fertile."

      Reply: Agreed. Revised as suggested.

      (2) I will urge the authors to explain in more detail how some figures are generated and what they mean. Some critical information needs to be included in various panels.

      (2a) Figure S1. The phastCons track does not seem to align as expected with the rest of the figure. The highest conservation peak is only present in humans, and the sequence conserved in the sea turtle has the lowest phastCons score. I was expecting the opposite from the explanation.

      Reply: The tracks for phyloP and phastCons are the scores for all 100 species, whereas the tracks with the species names on the left are the corresponding sequences aligned to the human genome. We have revised our figure to make it clearer.

      (2b) Figure 2A and Figure S2C. Although all the functional analysis of the manuscript has been done in mice, the alignments showing sequence conservation do not include the murine miRNAs. Please include the mouse miRNAs in these panels.

      Reply: The mouse has Mir-506-P7 with the conserved miRNA-3P seed region, which was included in the lower panel in Figure S2C. However, mice do not have Mir-506-P6, which may have been lost during evolution or become too divergent to be recognized, and thus was not included in Figure 2A and the upper panel in Figure S2C.

      (2c) Figure S7H. The panel could be easier to read.

      Reply: Agreed. We combined all the same groups and turned Figure S7H (now Figure S6H) into a heatmap.

      (2d) The legend of Figure 6G reads, "The number of target sites within individual target mRNAs in both humans and mice." Can the authors explain why the value 1 of the human "Number of target sites" is connected to virtually all the "Number of target sites" values in mice?

      Reply: Sorry for the confusion. For example, for gene 1, we have 1 target site in the human and 1 target site in the mouse; but for gene 2, we have 1 target site in the human and multiple sites in the mouse; therefore, the value 1 is connected to more than one value in the mouse.

      Reviewer #3 (Recommendations For The Authors):

      CRISP1 and EGR1 protein localization in WT and mutant sperm by immunostaining would be helpful.

      Reply: Agreed. We performed immunostaining for CRISP1 on WT sperm, and the new results are presented in Figure S8D. CRISP1 seems mainly expressed in the principal piece and head of sperm.

      The detailed description of the generation of various mutant lines should be included in the Methods.

      Reply: We added more details on the generation of knockout lines in the Materials and Methods (Lines 686-701).

      References:

      (1) S. Vasudevan, Y. Tong, J. A. Steitz, Switching from repression to activation: microRNAs can upregulate translation. Science 318, 1931-1934 (2007).

      (2) R. F. Place, L. C. Li, D. Pookot, E. J. Noonan, R. Dahiya, MicroRNA-373 induces expression of genes with complementary promoter sequences. Proc Natl Acad Sci U S A 105, 1608-1613 (2008).

      (3) Z. Wang et al., X-linked miR-506 family miRNAs promote FMRP expression in mouse spermatogonia. EMBO Rep 21, e49024 (2020).

      (4) S. Yuan et al., Motile cilia of the male reproductive system require miR-34/miR-449 for development and function to generate luminal turbulence. Proc Natl Acad Sci U S A 116, 3584-3593 (2019).

      (5) S. Yuan et al., Oviductal motile cilia are essential for oocyte pickup but dispensable for sperm and embryo transport. Proc Natl Acad Sci U S A 118 (2021).

      (6) M. Guo et al., Uncoupling transcription and translation through miRNA-dependent poly(A) length control in haploid male germ cells. Development 149 (2022).

      (7) V. G. Da Ros et al., Impaired sperm fertilizing ability in mice lacking Cysteine-RIch Secretory Protein 1 (CRISP1). Dev Biol 320, 12-18 (2008).

      (8) J. A. Maldera et al., Human fertilization: epididymal hCRISP1 mediates sperm-zona pellucida binding through its interaction with ZP3. Mol Hum Reprod 20, 341-349 (2014).

      (9) L. Hermo, R. M. Pelletier, D. G. Cyr, C. E. Smith, Surfing the wave, cycle, life history, and genes/proteins expressed by testicular germ cells. Part 1: background to spermatogenesis, spermatogonia, and spermatocytes. Microsc Res Tech 73, 241-278 (2010).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The present study by Berger et al. analyzes to what extent memory formation is dependent on available energy reserves. This has been dealt with extensively in the case of aversive memory formation, but only very sparsely in the case of appetitive memory formation. It has long been known that an appetitive memory in flies can only be formed after starvation. However, the authors here additionally show that the duration of starvation not only plays a role but also determines which form of memory (short- or long-term memory) is formed. The authors demonstrated that internal glycogen stores play a role in this process and that this is achieved through insulin-like signaling in octopaminergic reward neurons that integrates internal energy stores into memory formation. Here, the authors suggest that octopamine plays a role as a negative regulator of different forms of memory.

      The study sheds light on an old question, to what extent the octopaminergic neuronal system plays a role in the formation of appetitive memory, since in recent years only the dopaminergic system has been in focus. Furthermore, the data are an interesting contribution to the ongoing debate whether insulin receptors play a role in neurons themselves or in glial cells. The experiments are very well designed and the authors used a variety of behavioural experiments, genetic tools to manipulate neuronal activity and state-of-the-art imaging techniques. In addition, they not only clearly demonstrated that octopamine is a negative regulator of appetitive memory formation, but also proposed a mechanism by which the insulin receptor in octopaminergic neurons senses the internal energy status and then controls the activity of those neurons. The conclusions are mostly supported by the data, but some aspects related to the experimental design, some explanations and literature references need more clarification and revision.

      (1) Usually, long-term memory (LTM) is tested 24 hours after training. Here, the authors usually refer to LTM as a memory that is tested 6 hours after training. The addition of a control experiment to show that LTM that the authors observe here lasts longer would increase the power of this study immensely.

      We thank the reviewer for this comment, as it helped greatly to clarify the matter.

      We measured the memory of control and mutant flies 24 h after training and included the data in the manuscript (Figure 1B, summarized in a model in Figure 2C). We show that control flies develop an intermediate type of memory that, depending on the length of starvation, is either anesthesia-sensitive or anesthesia-resistant. Mutants lacking octopamine develop either anesthesia-sensitive or anesthesia-resistant long-term memory.

      (2) The authors define here another consolidated memory component as ARM, when they applied a cold-shock 2 hours after training. However, some publications showed that LTM is formed after only one training cycle (Krashes et al 2008, Tempel et al 1983). This makes it difficult to determine, whether appetitive ARM can be formed. Furthermore, one study showed that appetitive ARM is absent after massed training (Colomb et al 2009). Therefore, the conclusion could be also, that different starvation protocols, would lead to different stabilities of LTM. Therefore, additional experiments could help to clarify this opposing explanation. From these results, it can then be concluded either that different stable forms of LTM are formed depending on the starvation state, or that two differently consolidated memory phases (LTM, ARM) are formed, as has already been shown for aversive memory. This is also important for other statements in the manuscript, and therefore the authors should address this. For example, the findings about the insulin receptor (is it two opposing memories or different stabilities of LTM).

      The flies indeed develop different types of memory depending on the length of starvation and the internal energy supply.

      Reviewer #2 (Public Review):

      How organism physiological state modulates establishment and perdurance of memories is a timely question that the authors aimed at addressing by studying the interplay between energy homeostasis and food-related conditioning in Drosophila. Specifically, they studied how starvation modulates the establishment of short-term vs long-term memories and clarified the role of the monoamine Octopamine in food-related conditioning, showing that it is not per se involved in formation of appetitive short-term memories but rather gates memory formation by suppressing LTM when energy levels are high. This work clarifies previously described phenotypes and provides insight about interconnections between energy levels, feeding and formation of short-term and long-term food-related memories. In the absence of population-specific manipulation of octopamine signaling, it however does not reach a circuit-level understanding of how these different processes are integrated.

      Strengths

      • Previous studies have documented the impact of Octopamine on different aspects of food-related behaviors (regulation of energy homeostasis, feeding, sugar sensing, appetitive memory...), but we currently lack a clear understanding of how these different functions are interconnected. The authors have used a variety of experimental approaches to systematically test the impact of internal energy levels in establishment of appetitive memory and the role of Octopamine in this process.

      • The authors have used a range of approaches, performed carefully controlled experiments and produced high quality data.

      Weaknesses

      (1) In the tbh mutant flies, Tyramine -to- Octopamine conversion is inhibited, resulting not only in a lack of Octopamine, but also in elevated levels of Tyramine. If and how elevated levels of Tyramine contributes to the described phenotypes is unclear. In the current version of the manuscript, only one set of experiments (Figure 2) has been performed using Octopamine agonist. This is particularly important in light of recent published data showing that starvation modifies Tyramine levels. (2) Octopamine (and its precursor Tyramine) have been implicated in numerous processes, complicating the analysis of the phenotypes resulting from a general inhibition of tbh.

      We thank the reviewer for raising these points. The observed memory defects of the Tbh mutants can be explained solely by the loss of octopamine. We included models in the manuscript to illustrate this (Figure 2C and Figure 7E).

      To address whether the elevated levels of tyramine observed in Tbh mutants interfere with food consumption, we analyzed the effect of increased levels of tyramine and octopamine on food consumption. We included the data (Figure S2). An increase in tyramine levels did not change food intake; rather, the increase in octopamine levels reduced food intake. Our data show that the reduction of food intake observed in starved Tbh mutants is due to the increased internal energy supply.

      (3) The manuscript explores various aspects of the impact of energy levels on food-related behaviors and the underlying sensing and effector mechanism, both in wild-type and tbh mutants, making it difficult to follow the flow of the results.

      We included models illustrating the results to clarify the content of the manuscript.

      Reviewer #3 (Public Review):

      In this manuscript, Berger et al. study how internal energy storage influences learning and memory. Since octopamine (OA) is involved in the regulation of energy homeostasis in Drosophila melanogaster, they focus on the roles of OA. To do so, they use the tyramine-β-hydroxylase (Tbh) mutant, which lacks the neurotransmitter OA, and study short-term memory (STM), long-term memory (LTM) and anesthesia-resistant memory (ARM). They show that the duration of starvation affects the magnitude of both short- and long-term memory. In addition, they show that OA has a suppressive effect on learning and memory. In terms of energy storage, they show that internal glycogen storage influences how long sucrose is remembered and that high glycogen suppresses memory. Finally, they show that insulin-like signaling in octopaminergic neurons, which is also related to internal energy storage, suppresses learning and memory.

      This is an important study that extends our knowledge on OA activity in learning and memory and the effects the metabolic state has on learning and memory. The authors nicely use the genetic tools available in flies to try and unravel the complex circuitry of metabolic state level, OA activity and learning and memory.

      Nevertheless, I do have some comments that I think require attention:

      (1) The authors use RNAi to reduce the level of glycogen synthase or glycogen phosphorylase. These manipulations are expected to affect the level of glycogen. Using specific drivers the authors attempt to manipulate glycogen level at the muscles and fat bodies and examine how this affects learning and memory. The conclusions of the authors arise solely from the manipulation intended (i.e. the genetics). However, the authors also directly measured glycogen levels at these organs and those do not follow the manipulation intended, i.e. the RNAi had very limited effect on the glycogen level. Nevertheless, these results are ignored.

      We agreed with the reviewer and repeated the experiments. While we could not detect differences in whole animals, we detected differences in tissues enriched for muscles or fat, e.g. thorax or abdomen. We added the data.

      (2) The authors claim in the summary that OA is not required for STM. However, according to one experiment OA is required for STM as Tbh mutants cannot form STM. In another experiment OA is suppressive to STM as wt flies fed with OA cannot form STM. Therefore, it is very difficult to appreciate the actual role of OA on STM.

      During mild starvation, the internal energy supply is greater in Tbh mutants than in control flies. This information is integrated into the reward system via insulin receptor signaling. Therefore, the association between the odorant and sucrose is not meaningful to the mutants and no STM is formed. At the same time there is no release of octopamine and therefore no repression of LTM. In starved animals, octopamine suppresses food intake (we added the data). This is consistent with a function of Octopamine as a signal for the presence of food. Depending on when the signal comes, this might suppress the formation of STM or LTM.

      (3) The authors use t-test and ANOVA for most of the statistics, however, they did not perform normality tests. While I am quite sure that most datasets will pass normality test, nevertheless, this is required.

      Thanks for pointing this out. We have included a description in the “Materials and Methods” section that explains how we tested the data for normal distribution. We corrected the figure legends accordingly.

      “We used the Shapiro-Wilk test (significance level P < 0.05) followed by a QQ-Plot chart analysis to determine whether the data were normally distributed.”

      (4) While it is logical to assume that OA neurons are upstream to R15A04 DA neurons, I am not sure this really arises from the experiment that is presented here. It is well established that without activity in R15A04 DA neurons there is no LTM. Since OA acts to decrease LTM, can one really conclude anything about the location of OA effect when there is no learning?

      Normally, control flies did not form memory 6 h after training; only Tbh mutants did. We wanted to investigate what kind of memory develops in Tbh mutants. Throughout the experiments in the manuscript, we kept the training procedure constant.

      (5) It is unclear how expression of a dominant negative form of insulin receptor (InR) in OA neurons can rescue the lack of OA due to the Tbh mutation. If OA neurons cannot release anything to the presumably downstream DA neurons, how can changing their internal signaling have any effect?

      Expression of the dominant-negative form of the insulin receptor signals no food or low energy levels, whereas activation of the insulin receptor signals that there is enough food. The reward is a source of food, but its energy content is not high enough to fill the energy stores. Insulin receptor activation can trigger at least three different signaling cascades, one of which might regulate octopamine release.

      While I stressed some comments that need to be addressed, the overall take-home message of the manuscript is supported and the authors do show that the metabolic state of the animal affects learning and memory. I do think though, that some more caution is required for some of the conclusions.

      We added additional data to address the points raised.

      Recommendations for the authors:

      We addressed all points raised by the reviewers, clarified the content or added more data.

      Reviewer #1 (Recommendations For The Authors):

      (1) Throughout the manuscript, the full stop of a sentence is always placed before the references.

      We fixed this.

      (2) I find the English in the manuscript not yet sufficient for publication. I suggest that the authors carefully revise the manuscript. I think if the sentences are structured a little more clearly, this paper has enormous potential to be read by your broad community.

      We agree and revised the manuscript. We hope the manuscript is now clearer.

      (3) Sentences l114 to l117 are misleading. The authors imply that they tested the same flies for changes in odor perception or sucrose sensitivity. I assume that the authors meant that they analyzed different groups of animals.

      We clarified the sentence as follows:

      “To ensure that the observed differences in learning and memory were not due to changes in odorant perception, odorant evaluation or sucrose sensitivity, different fly populations of the same genotypes were tested for their odorant acuity, odorant preference and their sucrose responsiveness (Table S1).”

      (4) In the title as well as in the abstract the influence of octopamine on appetitive memory formation is described in more detail, this is also the main focus of this study. However, in the introduction, the influence of the insulin receptor on memory formation is discussed first. Personally, I would describe this later in the manuscript, ideally in the results section. At this point in the manuscript, this leads to an interruption in the flow of reading.

      Thanks for the suggestion. We changed the order in the introduction.

      (5) The authors could consider, since they only used Drosophila melanogaster, changing "Drosophila melanogaster" to "Drosophila" throughout the manuscript.

      We modified the text accordingly.

      (6) All evaluations and statistical tests are state of the art. However, I have one comment. For each statistical test, a correction should be made depending on the number of tests. However, I could not determine whether this was also done for the parametric or non-parametric one-sample t-test. From the results and the methods section, I would guess not. Here I would recommend a Bonferroni correction or even better a Sidak-Holm correction. Furthermore, the authors could also go into more detail about which non-parametric one-sample t-test they used.

      We described the statistics used in more detail in the Materials and Methods section.

      “We used the Shapiro-Wilk test (significance level P < 0.05) followed by a QQ-Plot chart analysis to determine whether the data were normally distributed. For normally distributed data, we used the Student’s t test to compare differences between two groups and the one-way ANOVA with Tukey’s post hoc HSD test for differences between more than two groups. For nonparametric data, we used the Mann-Whitney U test for differences between two groups and for more than two groups the Kruskal-Wallis test with post hoc Dunn analysis and Bonferroni correction. The nonparametric one-sample sign test was used to analyze whether behavior was not based on random choice and differed from zero (P < 0.5). The statistical data analysis was performed using statskingdom (https://www.statskingdom.com).”
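
      The one-sample sign test and the multiple-testing corrections discussed in point (6) can be illustrated with a minimal sketch. This is not the authors' analysis, which was run in statskingdom. The function names and the example performance indices are hypothetical, and the step-down procedure shown is the Holm-Bonferroni variant, which is an assumption about what a Holm-type correction would look like here.

```python
from math import comb


def sign_test_two_sided(values, reference=0.0):
    """Exact two-sided one-sample sign test of median == reference."""
    # Values equal to the reference carry no sign information and are dropped.
    pos = sum(1 for v in values if v > reference)
    neg = sum(1 for v in values if v < reference)
    n = pos + neg
    if n == 0:
        return 1.0
    k = min(pos, neg)
    # Under the null hypothesis the number of positive signs is Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm-Bonferroni step-down adjustment, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        adj = min(1.0, (m - rank) * p_values[idx])
        # Adjusted p-values must not decrease along the sorted order.
        running_max = max(running_max, adj)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical performance indices for one group (illustration only).
example_scores = [0.2, 0.3, 0.1, 0.4, 0.25, 0.35, 0.15, 0.3]
p_single = sign_test_two_sided(example_scores)

# Hypothetical raw p-values from three one-sample tests in one figure panel.
raw = [0.01, 0.04, 0.03]
p_bonferroni = bonferroni(raw)
p_holm = holm(raw)
```

      In this sketch, Holm-adjusted values are never larger than the Bonferroni-adjusted ones, which is why a step-down procedure is often preferred when several groups are each tested against zero.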

      (7) In nearly all figure legends the sentence "The letter "a" marks a significant difference from random choice as determined by a one-sample sign test (*P < 0.05; **P < 0.01)" occurs. This is correctly indexed in the figures. However, I do not understand here what then *P < 0.05; **P < 0.01 means. The significance level should be described here. I would strongly recommend the authors to make the definition clearer.

      We corrected this in the figure legends (see also above).

      (8) In Fig. 1B the labelling is a bit confusing. I interpret the two right groups as the mutants for octopamine, but there is still w[1118] in front.

      We modified Figure 1B.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions

      (1) Assessing the contribution of Tyramine in the observed phenotypes (for example by reducing the levels of Tyramine or its specific receptor) would help understand the contribution of Tyramine in the observed phenotypes.

      See comments above.

      We thank the reviewer for raising these points. The observed memory defects of the Tbh mutants can be explained solely by the loss of octopamine. We included models in the manuscript to illustrate this (Figure 2C and Figure 7E).

      To address whether the elevated levels of tyramine observed in Tbh mutants interfere with food consumption, we analyzed the effect of increased levels of tyramine and octopamine on food consumption. We included the data (Figure S2). An increase in tyramine levels did not change food intake; rather, the increased octopamine levels reduced food intake. Our data show that the reduction of food intake observed in starved Tbh mutants is due to the increased internal energy supply.

      (2) Cell-specific inhibition of octopamine receptors should thus be performed to precisely interpret the observed phenotypes and dissect how interconnected the different phenotypes are, which is the object of this publication.

      We observed that the time point and duration of octopamine application change the behavioral output. The behavior analyzed depends on pulses of octopamine and on differences in the internal energy status. A cell-specific inhibition of octopamine receptors via RNAi knockdown might therefore not clarify the issue.

      (3) Defining of streamline and progressively integrating the different observations into a unifying model would improve the clarity and flow of the manuscript.

      We included models explaining the observed results (Figure 2C and Figure 7E).

      Minor comments

      Line 129: Figure 1B should be mentioned, not 2B.

      Figure 1 legend: E should be replaced by C (after A,B).

      Figure S5: what are the arrows pointing to? Why are the Inr foci visible in A not seen in B? It should be mCD8-GFP and not mCD on top of the images.

      We fixed this.

      Reviewer #3 (Recommendations For The Authors):

      Major:

      (1) Can one really conclude from Figure 2A that OA acts on R15A04 DA neurons? It is well established that without activity in these DA neurons there is no LTM. Since OA acts to decrease learning, how can one conclude anything about the location of the OA effect when there is no learning? With STM the situation was the opposite: OA supported learning and this was abolished when DA neurons were silenced. I think some supporting experiments are required, i.e. how OA affects DA neuron activity, or, alternatively, the writing should be toned down a bit.

      Normally, control flies did not form memory 6 h after training; only Tbh mutants did. We wanted to investigate what kind of memory develops in Tbh mutants. Throughout the experiments in the manuscript, we kept the training procedure constant. The inhibition of dopaminergic neurons blocks the memory of Tbh mutants. Taken together, the duration of the memory, the cold-shock experiments and the inhibition of the dopaminergic neurons indicate that Tbh mutants develop LTM after training. This training does not evoke memory in controls.

      The loss of STM in mildly starved Tbh mutants depends on the integration of the high internal energy levels via InR signaling. Reducing the internal energy levels further by extending starvation results in STM, supporting the view that OA is not directly involved in the formation of STM.

      (2) Figure 4 requires some clarifications. In Supplementary Figure S2 the authors show that they could not manipulate glycogen levels in muscles. However, in Figure 4B they show that "Increasing glycogen levels in the muscles did not change short-term memory in 16 h starved flies, but the reduction in glycogen significantly improved memory strength (Figure 4B)" (lines 231-233). How can this be reconciled?

      While we could not detect differences in whole animals, we detected differences in glycogen content in body parts enriched with muscles or fat, e.g. thorax or abdomen when using UAS-GlyP-RNAi or UAS-GlyS-RNAi under the control of the respective Gal4 drivers.

      We added the data.

      Likewise, the authors write that "Increasing or decreasing glycogen levels in the fat bodies had no effect on memory performance (Figure 4C)" Line (233-234). However, in Figure S2 they show that they can only increase glycogen levels but not decrease them.

      As explained above the conclusion of Figure 4 "Thus, low levels of glycogen in the muscles upon starvation positively influence appetitive short-term memory, while high levels of glycogen in the muscles and fat body reduce short-term memory" lines 245-246, is not supported by the direct measurements of glycogen presented in Figure S2.

      We added data showing that the reduction or increase can be measured when analyzing the specific body parts enriched in muscle tissue or fat tissue.

      (3) In cases where mutant flies do not display learning, a control should be done to see if they ate the sugar (with dye). Especially since the genetic manipulation affects metabolism.

      We analyzed how much sucrose the animals consumed in the behavioral test. Both Tbh mutants and controls fed, and there was no difference in feeding behavior between the mutants and the controls.

      “We next determined whether differences in preferences influence sucrose intake during training. Therefore, we measured the sucrose intake of starved flies in the behavioral set up. We used a food-colored sucrose solution and evaluated the presence of food in the abdomen of the fly after 2 min (Table S1). Flies fed sucrose within 2 min and there was no difference between w1118 and TβhnM18 flies.”

      (4) The use of t-test requires the data to be normally distributed. If I am not mistaken this was not demonstrated for any of the datasets used. I did a quick check on one of the datasets provided in the excel sheet and it is normally distributed. Therefore, please add normality test for all data sets. If some do not pass normality, please use a suitable non-parametric test.

      We added normality tests for all data sets and used non-parametric tests for data that were not normally distributed. We clarify this in the Materials and Methods section and in the figure legends.

      (5) The authors show that OA also suppresses STM. This result is in contradiction to previously published results. This by itself is not a problem. However, this result also seems to me to be in contradiction to the authors' own results. According to Figure 1B, OA is required for STM, as its absence in the tbh mutant results in loss of STM. According to Figure 2C, OA reduces STM, as wt flies fed with OA just prior to learning do not form STM. This appears in other places in the manuscript as well.

      In addition, in the text lines 178-180, the authors write "A short pulse of octopamine before the training inhibits the STM. Thus, octopamine is a negative regulator of appetitive dopaminergic neuron-dependent long-term memory and can block STM." But in the summary they write "Octopamine is not required for short-term memory, since octopamine deficient mutants form appetitive short-term memory to sucrose and to other nutrients depending on the internal energy status." So, the take-home message regarding OA and STM is unclear.

      The authors need to better clarify this point.

      We clarified these points. See comments above. The loss of memory in Tbh mutants is not due to the loss of octopamine, but to increased energy levels that change the reward properties of sucrose.

      (6) The manuscript is very difficult to follow. The authors constantly change between 16 and 40 hours starvation, short term memory, 3 hour memory and 6 hour memory. I think it would have been better to have a more focused manuscript. However, if this is not possible, I recommend preparing a diagram with the different neurons or signaling pathways (i.e. insulin) and how they affect each other. Also, perhaps add to each figure a panel describing exactly the experimental conditions. I think also simplifying the text and adding more conclusions throughout the results section will help the readers to follow. Finally, I think that it would help understanding the conclusions if the authors can add a diagram of the flow that they think occurs. For example, the authors show that glycogen suppresses learning as its reduction increases learning. They also show that InR activity receptor suppresses learning as its KD also increases learning. If I am not mistaken the link between the two is not straight forward (but I may be wrong here). A diagram of the flow would be very helpful.

      We prepared diagrams summarizing and explaining the results.

      Minor

      (1) I may not have understood correctly as I am not sure that I found Table S1.

      Also, there was no legend for Table S1.

      Nevertheless, if I understood correctly, the authors write that "Before the experiments, flies were tested to determine whether they perceived the odorants, preferred one odorant over other and responded to the reward similarly to ensure that the observed differences in behavior were not due to changes in odorant perception or sucrose sensitivity (Table S1)." However, according to the Table that I found it seems that following 40h starvation wt flies show preference to OCT whereas this does not occur for the mutant. Also, it seems that at 16h the mutant has a much higher preference to the odors than after 40h. This is a bit odd. I am also not sure what the balance value refers to. Finally, the mutant shows really low 2M sucrose preference after 40h. In general, this set of experiments requires a bit more explanation.

      I think it is better to show these experiments using graphs and add this to the supplementary figures.

      We clarified the experiments in the Results section as follows and added an explanation to the Materials and Methods section. We tested the odorant acuity and sucrose preference for all genotypes used in the manuscript and added the data to Table S1.

      “The flies of the different genotypes sensed the odorants and evaluated them as similarly salient. This is important to avoid a bias in the situation where flies have to choose between the two odorants after training. They also sensed sucrose. We next determined whether differences in preferences influence sucrose intake during training. Therefore, we measured the sucrose intake of starved flies in the behavioral set up. We used a food-colored sucrose solution and evaluated the presence of food in the abdomen of the fly after 2 min (Table S1). Flies fed sucrose within 2 min and there was no difference between w1118 and TβhnM18 flies.”

      (2) Line 129 should be Figure 1B

      Is corrected.

      (3) Line 133, Figure 1C, how can one explain the negative reinforcement? I can understand no reinforcement, but negative?

      The effect of glucose might be dose-dependent. 0.15 M sucrose is much closer to a realistic concentration found in fruits than 2 M sucrose and might therefore elicit aversion. When animals are starved enough, they might find any food source attractive, even when the concentration of sucrose is unrealistic.

      (4) Figure 1, why are the graphs different between panel B and C?

      Is corrected.

      (5) In Figure S1, do the TβhnM18 groups differ significantly from zero? I think they do, so better to state this somewhere. If not, the claims in lines 134-135 are not supported by the data.

      We added the significance and added the data to Figure 1.

      Figure S1 legend: there is no A panel. Also "below box blots" should be box plots.

      Thanks for pointing that out. We corrected it.

      (6) It is not clear what is the duration of starvation used in Figure 2A. I assume that 16h and sucrose 2M used were used, but I would state that explicitly.

      We added the information to the figure legends.

      (7) Figure 2A is missing a control of flies with both the driver and UAS shibirets at the permissive temperature.

      We added the controls to the supplement (Figure S1).

      (8) It seems to me that Figure 3B, in which the authors state that "Only after 40 h of starvation did TβhnM18 mutants show a similar preference to control sucrose consumption" (line 198), is somewhat in contradiction to Table S1, in which I see a sucrose preference of 0.36 for wt and 0.17 for tbh. I think this comment arises because I did not understand Table S1 correctly, so please explain it better.

      We rewrote this section.

      (9) In Figure 3C, consider not using std as this stands for standard deviation and may be confusing.

      We now use the term “food” instead of “std” and explained in the legend that food means standard fly food.

      We fixed this.

      (10) Please check the Supplementary Figures. I think Figures S2 and S3 are switched.

      We fixed this.

      (11) There is a mistake in Figure S3A. The right column should have another "+" sign.

      Thanks, we fixed this.

      (12) I am somewhat puzzled by Figures 4 and 5. If I understand correctly figure 4B w1118 mef2-G4 is exactly the same experiment as Figure 5A w1118 mef2-G4 and yet in Figure 4B performance index is 0.2 and in Figure 5A about 0.4. According to other comparisons it seems to me that these will be significantly different and yet it is the same experiment.

      They are two independent experiments done at different times. The controls were independently repeated.

      (13) Line 273 should be Figure 5C.

      Is corrected.

      (14) I don't think this is a correct sentence "Virgin females remembered sucrose significantly better than mated females." Line 274.

      Reads now:

      “Virgin females remembered the odorant paired with sucrose significantly better than mated females.”

      (15) Line 340 there is no Figure 1E

      Is fixed (1 C)

      (16) The data excel file is difficult to follow. In Figure 2 there are references to Figure 5. The graphs are pointing to other files. Text is not always in English. It is not clear what W stands for. I recommend making it more accessible.

      We corrected the data excel files.

      (17) The manuscript is difficult to follow. I recommend preparing a diagram with the different neurons or signaling pathways (i.e. insulin) and how they affect each other.

      We improved the data presentation by

      a) adding a model showing the kinetics of memory formation in controls and mutants (Figure 2C)

      b) a model explaining how the internal state is integrated into the formation of memory (Figure 7D).

    1. Author Response

      The following is the authors’ response to the original reviews.

      This study reports important evidence that infants' internal factors guide children's attention and that caregivers respond to infants' attentional shifts during caregiver-infant interactions. The authors analyzed EEG data and multiple types of behaviors using solid methodologies that can guide future studies of neural responses during social interaction in infants. However, the analysis is incomplete, as several methodological choices need more adequate justification.

      Reviewer #1

      Public Review:

      The authors bring together multiple study methods (brain recordings with EEG and behavioral coding of infant and caregiver looking, and caregiver vocal changes) to understand social processes involved in infant attention. They test different hypotheses on whether caregivers scaffold attention by structuring a child's behavior, versus whether the child's attention is guided by internal factors and caregivers then respond to infants' attentional shifts. They conclude that internal processes (as measured by brain activation preceding looking) control infants' attention, and that caregivers rapidly modify their behaviors in response to changes in infant attention.

      The study is meticulously documented, with cutting-edge analytic approaches to testing alternative models; this type of work provides a careful and well-documented guide for how to conduct studies and process and analyze data for researchers in the relatively new area of neural response in infants in social contexts.

      We are very pleased that R1 considers our work an important contribution to this developing field, and we hope that we have now addressed their concerns below.

      Some concerns arise around the use of terms (for example, an infant may "look" at an object, but that does not mean the infant is actually "attending"); the collapsing of different types of looks (to people and objects); and the averaging of data across infants, which may mask some of the individual patterns.

      We thank the reviewer for this feedback and their related comments below, and we feel that our manuscript is much stronger as a result of the changes we have made. Please see below for a detailed description of our rationale for defining and analysing the attention data, as well as the textual changes made in response to the reviewer’s comments.

      Recommendations For The Authors

      This paper is rigorous in method, theoretically grounded, and makes an important contribution to understanding processes of infant attention, brain activity, and the reciprocal temporal features of caregiver-infant interactions. The alternative hypothesis approach sets up the questions well (although authors should temper any wording that suggests attention processes are one or the other. That is, certain bouts of infant attention can be guided by exogenous factors such as social input, and others be endogenous; so averaging across all bouts can actually mask the variation in these patterns). I appreciated the focus on multiple types of behavior (e.g., gaze, vocal fluctuations in maternal speech); the emphasis on contingent responding; and the very clear summaries of takeaways after each section. Furthermore, methods and analyses are well described, details on data processing and so on are very thorough, and visualizations aptly facilitate data interpretation. However, I am not an expert on infant neural responses in EEG and assume that a reviewer with such expertise will weigh in on the treatment and quality of the data; therefore, my comments should be interpreted in light of this lack of knowledge.

      We thank R1 for these very positive and insightful comments on our analyses which are the result of a number of years of methodological and technical developmental work.

      We do agree with R1 that we should more carefully word parts of our argument in the Introduction to make clear the fact that shifts in infant attention could be driven by a combination of interactive and endogenous influences. As a result of this comment, we have made direct changes to parts of the Introduction; removing any wording that suggests that these processes are ‘alternative’ or ‘separate’, and our overall aim states: ‘Here, recording EEG from infants during naturalistic interactions with their caregiver, we examined the (inter)-dependent influences of infants’ endogenous oscillatory neural activity, and inter-dyadic behavioural contingencies in organising infant attention’.

      Examining variability between infant attention episodes in the factors that influence the length and timing of the attention episode is an important area for future investigation. We now include a discussion on this on page 38 of the Discussion section, with suggestions for how this could be examined. Investigating different subtypes of infant attention is methodologically challenging, given the number of infant behaviours that would need to inform such an analysis- all of which are time consuming to code. Developing automated methods for performing these kinds of analyses is an important avenue for future work.

      Here, I review various issues that require revision or elaboration based on my reading of what I consider to otherwise be a solid and important research paper.

      Problem in the use of the term attention scaffolding. Although there may be literature precedent in the use of this term, it is problematic to narrowly define scaffolding as mother-initiated guidance of attention. A mother who responds to infant behaviors, but expands on the topic or supports continued attention, and so on, is scaffolding learning to a higher level. I would think about a different term because it currently implies a caregiver as either scaffolding OR responding contingently. It is not an either-or situation in conceptual meaning. In fact, research on social contingency (or contingent responsiveness), often views the follow-in responding as a way to scaffold learning in an infant.

      Yes, we agree with R1 that the term ‘attention scaffolding’ could be confusing given the use of this term in previous work conducted with children and their caregivers in problem-solving tasks, that emphasise modulations in caregiver behaviour as a function of infant behaviour. As a result of this suggestion, we have made direct edits to the text throughout, replacing the term attentional scaffold with terms such as ‘organise’ and ‘structure’ in relation to the caregiver-leading or ‘didactic’ perspective, and terms such as ‘contingent responding’ and ‘dynamic modulation’ in relation to the caregiver-following perspective. We feel that this has much improved the clarity of the argument in the Introduction and Discussion sections.

      Do individual data support the group average trends? My concern with unobservable (by definition) is that EEG data averages may mask what's going on in individual brain response. Effects appear to be small as well, which occurs in such conditions of averaging across perhaps very variable response patterns. In the interest of full transparency and open science, how many infants show the type of pattern revealed by the average graph (e.g., do neural markers of infant engagement forward predict attention for all babies? Majority?). Non-parametric tests on how many babies show a claimed pattern would offer the litmus test of significance on whether the phenomenon is robust across infants or pulled by a few infants with certain patterns of data. Ditto for all data. This would bolster my confidence in the summaries of what is going on in the infant brain. (The same applies as I suggest to attention bouts. To what extent does the forward-predict or backward-predict pattern work for all bouts, only some bouts, etc.?). I recognize that to obtain power, summaries are needed across infants and bouts, but I want to know if what's being observed is systematic.

      We thank R1 for this comment and understand their concern that the overall pattern of findings reported in relation to the infants’ EEG data might obscure inter-individual variability in the associations between attention and theta power. Averaging across individual participant EEG responses is, however, the gold standard way to perform both the event-locked (Jones et al., 2020) and continuous (Attaheri et al., 2020) methods of EEG analysis that are reported in the current manuscript. EEG data, and in particular naturalistic EEG data, are inherently noisy, and averaging across participants increases the signal-to-noise ratio (i.e. inconsistent, and, therefore, non-task-related activity is averaged out of the response (Cohen, 2014; Noreika et al., 2020)). Examining individual EEG responses is unlikely to tell us anything meaningful, given that, if a response is not found for a particular participant, then it could be that the response is not present for that participant, or that it is present, but the EEG recording for that participant is too noisy to show the effect. Computing group-level effects, as is most common in all neuroimaging analyses, is, therefore, the most appropriate way to examine our main research questions.

      The findings reported in this analysis also replicate previous work conducted by our lab which showed that infant attention to objects significantly forward-predicted increases in infant theta activity during joint table-top play with their caregiver, involving one toy object (compared to our paradigm, which involved three; Wass et al., 2018). More recent work conducted by our lab has also shown continuous and time-locked associations between infant look durations and infant theta activity when infants play with objects on their own (Perapoch Amadó et al., 2023). To reassure readers of the replicability of the current findings, we now reference the Wass et al. (2018) study at the beginning of the Discussion section.

      Could activity artifacts lead to certain reported trends? Babies typically look at an object before they touch or manipulate the object, and so longer bouts of attention likely involve a look and then a touch for lengthier time frames. If active involvement with an object (touching for example) amplifies theta activity, that may explain why attention duration forward predicts theta power. That is, baby looks, then touches, then theta activates, and coding would show visual gaze preceding the theta activation. Careful alignment of infants' touches and other such behaviors with the theta peak might help address this question, again to lend confidence to the robustness of the interpretation.

      Yes, again this is a very important point, and the removal of movement-related artifacts is something we have given careful attention to in the analysis of our naturalistic EEG data (Georgieva et al., 2020; Marriott Haresign et al., 2021). As a result of this comment we have made direct changes to the Results section on page 18 to more clearly direct the reader to our EEG pre-processing section before presenting the results of the cross-correlation analyses.

      As we describe in the Methods section of the main text, movement-related artifacts are removed from the data with ICA decomposition, utilising an automatic-rejection algorithm, specially designed for work with our naturalistic EEG data (Marriott Haresign et al., 2021). Given that ICA rejection does not remove all artifact introduced to the EEG signal, additional analysis steps were taken to reduce the possibility that movement artifacts influenced the results of the reported analyses. As explained in the Methods section, rather than absolute theta power, relative theta was used in all EEG analyses, computed by dividing the power at each theta frequency by the summed power across all frequencies. Eye and head movement-related artifacts most often associate with broadband increases in power in the EEG signal (Cohen, 2014): computing relative theta activity therefore further reduces the potential influence of artifact on the EEG signal.
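
      A minimal sketch of the relative-power normalisation described above, for illustration only. The function name `relative_theta`, the toy spectrum, the 1-16 Hz frequency bins and the final averaging over theta bins are assumptions and are not taken from the manuscript; the 3-6 Hz band follows the definition of theta given elsewhere in this response.

```python
def relative_theta(power, freqs, theta_band=(3.0, 6.0)):
    """Return mean relative power in the theta band for one spectrum.

    power: list of spectral power values, one per frequency bin
    freqs: list of bin frequencies in Hz, same length as power
    """
    total = sum(power)  # summed power across all frequencies
    if total <= 0:
        raise ValueError("total power must be positive")
    low, high = theta_band
    # Divide the power at each theta frequency by the total power.
    rel = [p / total for p, f in zip(power, freqs) if low <= f <= high]
    if not rel:
        raise ValueError("no frequency bins fall inside the theta band")
    return sum(rel) / len(rel)  # averaging over theta bins is an assumption


freqs = list(range(1, 17))               # hypothetical 1-16 Hz bins
spectrum = [1.0 / f for f in freqs]      # toy 1/f-like spectrum
broadband = [2.5 * p for p in spectrum]  # same spectrum with a uniform broadband gain

baseline = relative_theta(spectrum, freqs)
with_artifact = relative_theta(broadband, freqs)
```

      In this toy case `baseline` and `with_artifact` are equal, because a gain applied uniformly to every frequency cancels in the ratio. Real movement artifacts are not perfectly uniform across frequencies, so the normalisation reduces rather than eliminates their influence.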

      It is also important to highlight that previous work examining movement artifacts in controlled paradigms with infants has shown that limb movements actually associate with a decrease in power at theta frequencies, compared to rest (Georgieva et al., 2020). It is therefore unlikely that limb movement artifacts explain the pattern of association observed between theta power and infant attention in the current study.

      That said, examining the association between body movements and fluctuations in EEG activity during naturalistic interactions is an important next step, and something our lab is currently working on. Given that touching an object is most often the end-state of a larger body movement, aligning the EEG signal to the onset of infant touch is not all that informative to understanding how body movements associate with increases and decreases in power in the EEG signal. Our lab is currently working on developing new methods using motion tracking software and arousal composites to understand how data-derived behavioural sub-types associate with differential patterns of EEG activity.

      The term attention may be misleading. The behavior being examined is infant gaze or looks, with the assumption that gaze is a marker of "attention". The authors are aware that gaze can be a blank stare that doesn't reflect underlying true "attention". I recommend substitution of a conservative, more precise term that captures the variable being measured (gaze); it would then be fine to state that, in their interpretation, gaze is taken as a marker for attention or something like that. At minimum, using the term "visual attention" can be a solution if authors do not want to use the precise term gaze. As an example, the sentence "An attention episode was defined as a discrete period of attention towards one of the play objects on the table, or to the partner" should be modified so that an episode is defined as looking at a play object or partner.

      We thank the reviewer for this comment, and we understand their concern with the use of the term ‘attention’ where we are referring to shifts in infant eye gaze. However, the use of this term to describe patterns of infant gaze, irrespective of whether they are ‘actually attending’ or not is used widely in the literature, in both interactive (e.g. Yu et al., 2021) and screen-based experiments examining infant attention (Richards, 2010). We therefore feel that its use in our current manuscript is acceptable and consistent with the reporting of similar interaction findings. On page 39 of the Discussion we now also include a discussion on how future research might further investigate differential subtypes of infant looks to distinguish between moments where infants are attending vs. just looking.

      Why collapse across gaze to object vs. other? Conceptually, it's unclear why the same hypotheses and research questions on neural-attention (i.e., gaze in actuality) links would apply to looks to a mom's face or to an object. Some rationale would be useful to the reader as to why these two distinct behaviors are taken as following the same principles in ordering of brain and behavior. Perhaps I missed something, however, because later in the Discussion the authors state that "fluctuations in neural markers of infants' engagement or interest forward-predict their attentiveness towards objects", which suggests there was an object-focused variable only? Please clarify. (Again, sorry if I missed something).

      This is a really important point, and we agree with R1 that it could have been more clearly expressed in our original submission – for which we apologise. In the cross-correlation analyses conducted in parts 2 and 3, which examine forwards-predictive associations between infant attention durations and infant endogenous oscillatory activity (part 2) and caregiver behaviour (part 3), as R1 describes, we include all infant looks towards objects and their partner. Including all infant look types is necessary to produce a continuous variable to cross-correlate with the other continuous variables (e.g. theta activity, caregiver vocal behaviours), and the analysis, therefore, does not concentrate only on infant attention episodes towards objects.
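
      The idea of cross-correlating two continuous variables at a range of time lags can be sketched as follows. This is a generic illustration under stated assumptions, not the analysis code used for the manuscript: the helper names `pearson` and `cross_correlation`, the toy series and the lag range are all hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_correlation(x, y, max_lag):
    """Correlate x[t] with y[t + lag] for every lag in [-max_lag, max_lag].

    A peak at a positive lag means changes in x tend to precede changes in y.
    """
    n = len(x)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:n - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:n + lag]
        result[lag] = pearson(xs, ys)
    return result


# Toy data: y repeats x two samples later, so x "forward-predicts" y.
x = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(200)]
y = [x[0], x[0]] + x[:-2]

cc = cross_correlation(x, y, max_lag=5)
peak_lag = max(cc, key=cc.get)
```

      Both inputs must be defined at every time point for the lagged pairing to work, which is why a continuous variable (rather than a set of isolated episodes) is needed on each side.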

      We take the reviewer’s point that different attention and neural mechanisms may be associated with looks towards objects vs. the partner, which we now acknowledge directly on page 10 of the Introduction. However, our focus here is on the endogenous and interactive mechanisms that drive fluctuations in infant engagement with the ongoing, free-flowing interaction. Indeed, previous work has shown increases in theta activity during sustained episodes of infant attention to a range of different stimuli, including cartoon videos (Xie et al., 2018), real-life screen-based interactions (Jones et al., 2020), as well as objects (Begus et al., 2016). In the second half of part 2, we go on to address the endogenous processes that support infant attention episodes specifically towards objects.

      As a result of this comment, we have made direct changes to the Introduction on page 10 to more clearly explain the looking behaviours included in the cross-correlation analysis, and the rationale behind the analysis being conducted in this way – which is different to the reactive analyses conducted in the second half of parts one and three, which examine infant object looks only. Direct edits to the text have also been made throughout the Results and Methods sections as a result of this comment, to more clearly specify the types of looks included in each analysis. Now, where we discuss the cross-correlation analyses we refer only to infant ‘attention durations’ or infant ‘attention’, whilst ‘object-directed attention’ and ‘looks towards objects’ are clearly specified in sections discussing the reactive analyses conducted in parts 2 and 3. We have also amended the Discussion on page 31 so that the cross-correlation analyses are interpreted relative to infants’ overall attention, rather than their attention towards objects only.

      Why are mothers' gazes shorter than infants' gazes? This was the flip of what I'd expect, so some interpretation would be useful to understanding the data.

      This is a really interesting observation. Our findings of the looking behaviour of caregivers and infants in our joint play interactions actually correspond to much previous micro-dynamic analysis of caregiver and infant looking behaviour during early table-top interactions (Abney et al., 2017; Perapoch Amadó et al., 2023; Yu & Smith, 2013, 2016). The reason for the shorter look durations in the adult is due to the fact that the caregivers alternate their gaze between their infant and the objects (i.e. they spend a lot of the interaction time monitoring their infants’ behaviours). This can be seen in Figure 2 (see main text) which shows that caregiver looks are divided between looks to their infants and looks towards objects. In comparison, infants spend most of their time focussing on objects (see Figure 2, main text), with relatively infrequent looks to their caregiver. As a result, infant looks are, overall, longer in comparison to their caregivers’.

      Minor points

      Use the term association or relation (relationships is for interpersonal relationships, not in statistics).

      This has now been amended throughout.

      I'm unsure I'd call the interactions "naturalistic" when they occur at a table, with select toys, EEG caps on partners, and so on. The term seems more appropriate for studies with fewer constraints that occur (for example) in a home environment, etc.

      We understand R1’s concern with our use of the term ‘naturalistic’ to refer to the joint play interactions that we analyse in the current study. However, we feel the term is appropriate, given that the interactions are unstructured: the only instruction given to caregivers at the beginning of the interaction is to play with their infants in the way that they might do at home. The interactions, therefore, measure free-flowing caregiver and infant behaviours, where modulations in each individual’s behaviour are the result of the intra- and inter-individual dynamics of the social exchange. This is in comparison to previous work on early infant attention development which has used more structured designs, in which modulations in infant behaviour occur as a result of the parameters of the experimental task.

      Reviewer #2

      Public Review

      Summary:

      This paper acknowledges that most development occurs in social contexts, with other social partners. The authors put forth two main frameworks of how development occurs within a social interaction with a caregiver. The first is that although social interaction with mature partners is somewhat bi-directional, mature social partners exogenously influence infant behaviors and attention through "attentional scaffolding", and that in this case infant attention is reactive to caregiver behavior. The second framework posits that caregivers support and guide infant attention by contingently responding to reorientations in infant behavior, thus caregiver behaviors are reactive to infant behavior. The aim of this paper is to use moment-to-moment analysis techniques to understand the directionality of dyadic interaction. It is difficult to determine whether the authors prove their point as the results are not clearly explained as is the motivation for the chosen methods.

      Strengths

      The question driving this study is interesting and a genuine gap in the literature. Almost all development occurs in the presence of a mature social partner. While it is known that these interactions are critical for development, the directionality of how these interactions unfold in real-time is less known.

      The analyses largely seem to be appropriate for the question at hand, capturing small moment-to-moment dynamics in both infant and child behavior, and their relationships with themselves and each other. Autocorrelations and cross-correlations are powerful tools that can uncover small but meaningful patterns in data that may not be uncovered with other more discretized analyses (i.e. regression).

      We are pleased that R2 finds our work to be an interesting contribution to the field, which utilises appropriate analysis techniques.

      Weaknesses

      The major weakness of this paper is that the reader is assumed to understand why these results lead to their claimed findings. The authors need to describe more carefully their reasoning and justification for their analyses and what they hope to show. While a handful of experts would understand why autocorrelations and cross-correlations should be used, they are by no means basic analyses. It would also be helpful to use simulated data or even a simple figure to help the reader more easily understand what a significant result looks like versus an insignificant result.

      We thank the reviewer for this comment, and we agree that much more detail should be added to the Introduction section. As a result of this comment, we have made direct changes to the Introduction on pages 9-11 to more clearly detail these analysis methods, our rationale for using these methods; and how we expect the results to further our understanding of the drivers of infant attention in naturalistic social interactions.

      We also provide a figure in the SM (Fig. S6) to help the reader more clearly understand the permutation method used in our statistical analyses described in the Methods, on page 51, which depicts significant vs. insignificant patterns of results against their permutation distribution.

      While the overall question is interesting the introduction does not properly set up the rest of the paper. The authors spend a lot of time talking about oscillatory patterns in general but leave very little discussion to the fact they are using EEG to measure these patterns. The justification for using EEG is also not very well developed. Why did the authors single out fronto-temporal channels instead of using whole brain techniques, which are more standard in the field? This is idiosyncratic and not common.

      We very much agree with R2 that the rationale and justification for using EEG to understand the processes that influence infants’ attention patterns is under-developed in the current manuscript. As a result of this comment we have made direct edits to the Introduction section of the main text on pages 7-8 to more clearly describe the rationale for examining the relationship between infant EEG activity and their attention during the play interactions with their caregivers.

      As we describe in the Introduction section, previous behavioural work conducted with infants has suggested that endogenous cognitive processes (i.e. fluctuations in top-down cognitive control) might be important in explaining how infants allocate their attention during free-flowing, naturalistic interactions towards the end of the first year. Oscillatory neural activity occurring at theta frequencies (3-6Hz), which can be measured with EEG, has previously been associated with top-down intrinsically guided attentional processes in both adulthood and infancy (Jones et al., 2020; Orekhova, 1999; Xie et al., 2018). Measuring fluctuations in infant theta activity therefore provides a method to examine how endogenous cognitive processes structure infant attention in naturalistic social interactions which might be otherwise unobservable behaviourally.

      It is important to note that the Introduction distinguishes between two different oscillatory mechanisms that could possibly explain the organisation of infant attention over the course of the interaction. The first refers to oscillatory patterns of attention, that is, consistent attention durations produced by infants that likely reflect automatic, regulatory functions, related to fluctuations in infant arousal. The second mechanism is oscillatory neural activity occurring at theta frequencies, recorded with EEG, which, as mentioned above, is thought to reflect fluctuations in intrinsically guided attention in early infancy. We have amended the Introduction to make the distinction between the two more clear.

      A worrisome weakness is that the figures are not consistently formatted. The y-axes are not consistent within figures making the data difficult to compare and interpret. Labels are also not consistent and very often the text size is way too small making reading the axes difficult. This is a noticeable lack of attention to detail.

      This has now been adjusted throughout, where appropriate.

      No data is provided to reproduce the figures. This does not need to include the original videos but rather the processed and de-identified data used to generate the figures. Providing the data to support reproducibility is increasingly common in the field of developmental science and the authors are greatly encouraged to do so.

      This will be provided with the final manuscript.

      Minor Weaknesses

      Figure 4, how is the pattern in a not significant while in b a very similar pattern with the same magnitude of change is? This seems like a spurious result.

      The statistical analysis conducted for all cross-correlation analyses reported follows a rigorous and stringent permutation-based temporal clustering method which controls for family-wise error rate using a non-parametric Monte Carlo method (see Methods in the main text for more detail). Permutations are created by shuffling data sets between participants and, therefore, patterns of significance identified by the cluster-based permutation analysis will depend on the mean and standard deviation of the cross-correlations in the permutation distribution. Fig. S6 now depicts the cross-correlations against their permutation distributions which should help readers to understand the patterns of significance reported in the main text.
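
      The logic of a cluster-based permutation test over lags can be sketched as below. This is a simplified, one-sided illustration and not the procedure implemented for the manuscript: it assumes the permuted cross-correlation curves (for example, from re-pairing data sets between participants) have already been computed, and the function names, the z threshold of 1.96 and the simulated curves are all assumptions.

```python
import random
from statistics import mean, stdev


def cluster_masses(z, z_crit):
    """Sum of z within each run of adjacent lags where z exceeds z_crit."""
    masses, run = [], []
    for value in z:
        if value > z_crit:
            run.append(value)
        elif run:
            masses.append(sum(run))
            run = []
    if run:
        masses.append(sum(run))
    return masses


def cluster_permutation_p(observed, perm_curves, z_crit=1.96):
    """Monte Carlo p-value for the largest observed cluster (positive side only)."""
    n_lags = len(observed)
    # Per-lag mean and standard deviation of the permutation distribution.
    mu = [mean(p[i] for p in perm_curves) for i in range(n_lags)]
    sd = [stdev(p[i] for p in perm_curves) for i in range(n_lags)]

    def largest_cluster(curve):
        z = [(v - m) / s for v, m, s in zip(curve, mu, sd)]
        return max(cluster_masses(z, z_crit), default=0.0)

    observed_mass = largest_cluster(observed)
    # Null distribution: the largest cluster found in each permuted curve.
    null_masses = [largest_cluster(p) for p in perm_curves]
    exceed = sum(m >= observed_mass for m in null_masses)
    return observed_mass, (exceed + 1) / (len(perm_curves) + 1)


# Simulated example: 200 permuted curves of noise, plus an observed curve
# with a raised stretch at lags 8-12 (all values are made up).
rng = random.Random(0)
n_lags = 21
perm_curves = [[rng.gauss(0.0, 0.05) for _ in range(n_lags)] for _ in range(200)]
observed = [rng.gauss(0.0, 0.05) + (0.3 if 8 <= i <= 12 else 0.0)
            for i in range(n_lags)]

mass, p_value = cluster_permutation_p(observed, perm_curves)
```

      Because each point is judged against the mean and standard deviation of the permutation values at that lag, two curves of similar raw magnitude can differ in significance when their permutation distributions differ in spread.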

      The correlations appear very weak in Figures 3b, 5a, 7e. Despite a linear mixed effects model showing a relationship, it is difficult to believe looking at the data. Both the Spearman and Pearson correlations for these plots should be clearly included in the text, figure, or figure legend.

      We thank the reviewer for this comment, and agree that reporting the correlations for these plots would strengthen the findings of the linear mixed effects models reported in text. As a result, we have added both Spearman and Pearson correlations to the legends of Figures 3b, 5a and 7e, corresponding to the statistically significant relationships examined in the linear mixed effects models. The strength of the relationships are entirely consistent with those documented in other previous research that used similar methods (e.g. Piazza et al., 2018). How strong the relationship looks to the observer is entirely dependent on the graphical representation chosen to represent it. We have chosen to present the data in this way because we feel that it is the most honest way to represent the statistically significant, and very carefully analysed, effects that we have observed in our data.
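
      For readers less familiar with the two coefficients, the difference between Pearson and Spearman correlations can be sketched as follows. The helper names and the toy values are illustrative assumptions and are unrelated to the data in Figures 3b, 5a and 7e.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation: strength of the linear association."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 2 for v in x]  # monotonic but non-linear toy relation

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

      Reporting both is informative because Spearman depends only on rank order, whereas Pearson is sensitive to how linear the association is and to extreme values.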

      Linear mixed effects models need more detail. Why were they built the way they were built? I would have appreciated seeing multiple models in the supplementary methods and a reasoning to have landed on one. There are multiple ways I can see this model being built (especially with the addition of a random intercept). Also, there are methods to test significance between models and aid in selection. That being said, although participant identity is a very common random effect, its use should be clearly stated in the main text.

      We very much agree with R2 that the reporting of the linear mixed effects models needs more detail and this has now been added to the Method section (page 54). Whilst it is true that there are multiple ways in which this model could be built, given the specificity of our research questions, regarding the reactive changes in infant theta activity and caregiver behaviours that occur after infant look onsets towards objects (see pages 9-11 of the Introduction), we take a hypothesis driven approach to building the linear mixed effects models. As a result, random intercepts are specified for participants, as well as uncorrelated by-participant random slopes (Brown, 2021; Gelman & Hill, 2006; Suarez-Rivera et al., 2019). In this way, infant look durations are predicted from caregiver behaviours (or infant theta activity), controlling for between participant variability in look durations, as well as the strength of the effect of caregiver behaviours (or infant theta activity) on infant look durations.
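
      One way to write out a model with participant random intercepts and uncorrelated by-participant random slopes is given below, as a sketch under assumed notation rather than the exact specification reported on page 54 of the Methods. Here $i$ indexes looks, $j$ indexes participants, and $x_{ij}$ stands for the predictor (a caregiver behaviour or infant theta activity); all symbols are illustrative.

```latex
\begin{aligned}
\text{duration}_{ij} &= (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\, x_{ij} + \varepsilon_{ij}, \\
u_{0j} &\sim \mathcal{N}(0, \sigma_0^2), \quad
u_{1j} \sim \mathcal{N}(0, \sigma_1^2), \quad
\operatorname{Cov}(u_{0j}, u_{1j}) = 0, \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

      The term $u_{0j}$ absorbs between-participant differences in typical look duration, and $u_{1j}$ absorbs between-participant differences in the strength of the effect. In R's lme4 formula notation (an assumption about tooling) this structure is commonly written as `duration ~ x + (1 + x || participant)`, where the double bar requests uncorrelated random effects.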

      Some parentheses aren't closed, a more careful re-reading focusing on these minor textual issues is warranted.

      This has now been corrected.

      Analysis of F0 seems unnecessarily complex. Is there a reason for this?

      Computation of the continuous caregiver F0 variable may seem complex but we feel that all analysis steps are necessary to accurately and reliably compute this variable in our naturalistic, noisy and free-flowing interaction data. For example, we place the F0 only into segments of the interaction identified as the mum speaking so that background noises and infant vocalisations are not included in the continuous variable. We then interpolate through unvoiced segments (similar to Räsänen et al., 2018), and compute the derivative in 1000ms intervals as a measure of the rate of change. The steps taken to compute this variable have been both carefully and thoughtfully selected given the many ways in which this continuous rate of change variable could be computed (cf. Piazza et al., 2018; Räsänen et al., 2018).

      The choice of a 20hz filter seems odd when an example of toy clacks is given. Toy clacks are much higher than 20hz, and a 20hz filter probably wouldn't do anything against toy clacks given that the authors already set floor and ceiling parameters of 75-600Hz in their F0 extraction.

      We thank the reviewer for this comment and we can see that this part of the description of the F0 computation is confusing. A 20Hz low pass filter is applied to the data stream after extracting the F0 with floor and ceiling parameters set between 75-600Hz. The 20Hz filter therefore filters modulations in the caregivers’ F0 that occur at a modulation frequency greater than 20Hz. The 20Hz filter does not, therefore, refer to the spectral filtering of the speech signal. The description of this variable has been rephrased on page 48 of the main text.
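
      The distinction between filtering the audio spectrum and filtering modulations of the F0 contour can be illustrated with a toy example. The sketch below smooths an already-extracted F0 contour (values in Hz, sampled over time) and swaps in a simple moving average as a crude stand-in for a low-pass filter; it is not the filter used in the manuscript, and the 200 Hz contour sampling rate, the 5-sample window and the toy contour are assumptions.

```python
import math


def moving_average(values, width):
    """Centred moving average; edges use whatever samples are available."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def amplitude_at(values, freq_hz, fs):
    """Amplitude of the freq_hz modulation via a single-bin Fourier projection."""
    n = len(values)
    centre = sum(values) / n
    re = sum((v - centre) * math.cos(2 * math.pi * freq_hz * i / fs)
             for i, v in enumerate(values))
    im = sum((v - centre) * math.sin(2 * math.pi * freq_hz * i / fs)
             for i, v in enumerate(values))
    return 2 * math.hypot(re, im) / n


fs = 200                             # hypothetical contour sampling rate (Hz)
t = [i / fs for i in range(2 * fs)]  # 2 seconds of contour

# F0 stays around 200 Hz throughout (inside the 75-600 Hz extraction range);
# it drifts slowly at 2 Hz and also jitters rapidly at 40 Hz.
contour = [200.0
           + 20.0 * math.sin(2 * math.pi * 2 * s)
           + 5.0 * math.sin(2 * math.pi * 40 * s)
           for s in t]

smoothed = moving_average(contour, 5)

slow_before = amplitude_at(contour, 2, fs)
slow_after = amplitude_at(smoothed, 2, fs)
fast_before = amplitude_at(contour, 40, fs)
fast_after = amplitude_at(smoothed, 40, fs)
```

      After smoothing, the slow 2 Hz modulation is almost unchanged while the 40 Hz jitter is largely removed, even though the pitch values themselves remain near 200 Hz. The filter therefore acts on how quickly F0 changes, not on which pitch values are allowed.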

      Linear interpolation is a choice I would not have made. Where there is no data, there is no data. It feels inappropriate to assume that the data in between is simply a linear interpolation of surrounding points.

      The choice to interpolate where there was no data was something we considered in a lot of detail, given the many options for dealing with missing data points in this analysis, and the difficulties involved with extracting a continuous F0 variable in our naturalistic data sets. As R2 points out, one option would be to set data points to NaN values where no F0 is detected and/ or the Mum is not vocalising. A second option, however, would be to set the continuous variable to 0s where no F0 is detected and/ or the Mum is not vocalising (where the mum is not producing sound there is no F0 so rather than setting the variable to missing data points, really it makes most objective sense to set to 0).

      Either of these options (setting parts where no F0 is detected to NaN or 0) makes it difficult to then meaningfully compute the rate of change in F0: where NaN values are inserted, this reduces the number of data points in each time window; where 0s are inserted this creates large and unreal changes in F0. Inserting NaN values into the continuous variable also reduces the number of data points included in the cross-correlation and event-locked analyses. It is important to note that, in our naturalistic interactions, caregivers’ vocal patterns are characterised by lots of short vocalisations interspersed by short pauses (Phillips et al., in prep), similar to previous findings in naturalistic settings (Gratier et al., 2015). Interpolation will, therefore, have largely interpolated through the small pauses in the caregiver’s vocalisations.
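
      The trade-off between the three options can be seen in a small made-up example. The contour values, the gap length and the helper names below are hypothetical, and sample-to-sample differences are used here as a simple proxy for rate of change rather than the 1000 ms derivative described in the Methods.

```python
import math


def interpolate_gaps(f0):
    """Linearly interpolate across NaN runs; leading/trailing gaps stay NaN."""
    out = list(f0)
    known = [i for i, v in enumerate(f0) if not math.isnan(v)]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            out[i] = f0[left] + (i - left) * (f0[right] - f0[left]) / span
    return out


def changes(series):
    """Absolute sample-to-sample changes, skipping pairs that involve NaN."""
    return [abs(b - a) for a, b in zip(series, series[1:])
            if not (math.isnan(a) or math.isnan(b))]


nan = float("nan")
f0_nan = [210.0, 214.0, nan, nan, 226.0, 230.0]  # toy contour (Hz), short pause
f0_zero = [0.0 if math.isnan(v) else v for v in f0_nan]
f0_interp = interpolate_gaps(f0_nan)

usable_with_nan = len(changes(f0_nan))   # pairs left when gaps are NaN
largest_with_zero = max(changes(f0_zero))
largest_with_interp = max(changes(f0_interp))
```

      With NaN gaps only two of the five possible differences can be computed; with zeros the largest "change" is a 226 Hz jump that never occurred in the voice; with interpolation all five differences are available and the largest is 4 Hz. Interpolation does assume a smooth trajectory across the pause, which is most defensible when the pauses are short.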

      The only limitation listed was related to the demographics of the sample, namely that it consisted of middle-class moms in east London. Given that the demographics of London, even east London, are quite varied, it's disappointing that their sample does not reflect the community they are in.

      Yes, we very much agree with R2 that the lack of caregivers from wider demographic backgrounds is disappointing, and it is something that is often a problem in developmental research. Our lab is currently working to collect similar data from infants with a family history of ADHD, as part of an ongoing longitudinal project involving families from across the UK and from much more varied demographic backgrounds. We hope that the findings reported here will feed directly into the work conducted as part of this new project.

      That said, a demographic table of the subjects included in this study should be added.

      This is now included in the SM, and referenced in the main text.

      References

      Abney, D. H., Warlaumont, A. S., Oller, D. K., Wallot, S., & Kello, C. T. (2017). Multiple Coordination Patterns in Infant and Adult Vocalizations. Infancy, 22(4), 514–539. https://doi.org/10.1111/infa.12165

      Attaheri, A., Choisdealbha, Á. N., Di Liberto, G. M., Rocha, S., Brusini, P., Mead, N., Olawole-Scott, H., Boutris, P., Gibbon, S., Williams, I., Grey, C., Flanagan, S., & Goswami, U. (2020). Delta- and theta-band cortical tracking and phase-amplitude coupling to sung speech by infants [Preprint]. Neuroscience. https://doi.org/10.1101/2020.10.12.329326

      Begus, K., Gliga, T., & Southgate, V. (2016). Infants’ preferences for native speakers are associated with an expectation of information. Proceedings of the National Academy of Sciences, 113(44), 12397–12402. https://doi.org/10.1073/pnas.1603261113

      Brown, V. A. (2021). An Introduction to Linear Mixed-Effects Modeling in R.

      Cohen, M. X. (2014). Analyzing neural time series data: Theory and practice. The MIT Press.

      Gelman, A., & Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

      Georgieva, S., Lester, S., Noreika, V., Yilmaz, M. N., Wass, S., & Leong, V. (2020). Toward the Understanding of Topographical and Spectral Signatures of Infant Movement Artifacts in Naturalistic EEG. Frontiers in Neuroscience, 14, 352. https://doi.org/10.3389/fnins.2020.00352

      Gratier, M., Devouche, E., Guellai, B., Infanti, R., Yilmaz, E., & Parlato-Oliveira, E. (2015). Early development of turn-taking in vocal interaction between mothers and infants. Frontiers in Psychology, 6. https://doi.org/10.3389/fpsyg.2015.01167

      Jones, E. J. H., Goodwin, A., Orekhova, E., Charman, T., Dawson, G., Webb, S. J., & Johnson, M. H. (2020). Infant EEG theta modulation predicts childhood intelligence. Scientific Reports, 10(1), 11232. https://doi.org/10.1038/s41598-020-67687-y

      Marriott Haresign, I., Phillips, E., Whitehorn, M., Noreika, V., Jones, E. J. H., Leong, V., & Wass, S. V. (2021). Automatic classification of ICA components from infant EEG using MARA. Developmental Cognitive Neuroscience, 52, 101024. https://doi.org/10.1016/j.dcn.2021.101024

      Noreika, V., Georgieva, S., Wass, S., & Leong, V. (2020). 14 challenges and their solutions for conducting social neuroscience and longitudinal EEG research with infants. Infant Behavior and Development, 58, 101393. https://doi.org/10.1016/j.infbeh.2019.101393

      Orekhova, E. (1999). Theta synchronization during sustained anticipatory attention in infants over the second half of the first year of life. International Journal of Psychophysiology, 32(2), 151–172. https://doi.org/10.1016/S0167-8760(99)00011-2

      Perapoch Amadó, M., Greenwood, E., James, Labendzki, P., Haresign, I. M., Northrop, T., Phillips, E., Viswanathan, N., Whitehorn, M., Jones, E. J. H., & Wass, S. (2023). Naturalistic attention transitions from subcortical to cortical control during infancy. [Preprint]. Open Science Framework. https://doi.org/10.31219/osf.io/6z27a

      Piazza, E. A., Hasenfratz, L., Hasson, U., & Lew-Williams, C. (2018). Infant and adult brains are coupled to the dynamics of natural communication [Preprint]. Neuroscience. https://doi.org/10.1101/359810

      Räsänen, O., Kakouros, S., & Soderstrom, M. (2018). Is infant-directed speech interesting because it is surprising? – Linking properties of IDS to statistical learning and attention at the prosodic level. Cognition, 178, 193–206. https://doi.org/10.1016/j.cognition.2018.05.015

      Richards, J. E. (2010). The development of attention to simple and complex visual stimuli in infants: Behavioral and psychophysiological measures. Developmental Review, 30(2), 203–219. https://doi.org/10.1016/j.dr.2010.03.005

      Suarez-Rivera, C., Smith, L. B., & Yu, C. (2019). Multimodal parent behaviors within joint attention support sustained attention in infants. Developmental Psychology, 55(1), 96–109. https://doi.org/10.1037/dev0000628

      Wass, S. V., Noreika, V., Georgieva, S., Clackson, K., Brightman, L., Nutbrown, R., Covarrubias, L. S., & Leong, V. (2018). Parental neural responsivity to infants’ visual attention: How mature brains influence immature brains during social interaction. PLOS Biology, 16(12), e2006328. https://doi.org/10.1371/journal.pbio.2006328

      Xie, W., Mallin, B. M., & Richards, J. E. (2018). Development of infant sustained attention and its relation to EEG oscillations: An EEG and cortical source analysis study. Developmental Science, 21(3), e12562. https://doi.org/10.1111/desc.12562

      Yu, C., & Smith, L. B. (2013). Joint Attention without Gaze Following: Human Infants and Their Parents Coordinate Visual Attention to Objects through Eye-Hand Coordination. PLoS ONE, 8(11), e79659. https://doi.org/10.1371/journal.pone.0079659

      Yu, C., & Smith, L. B. (2016). The Social Origins of Sustained Attention in One-Year-Old Human Infants. Current Biology, 26(9), 1235–1240. https://doi.org/10.1016/j.cub.2016.03.026

      Yu, C., Zhang, Y., Slone, L. K., & Smith, L. B. (2021). The infant’s view redefines the problem of referential uncertainty in early word learning. Proceedings of the National Academy of Sciences, 118(52), e2107019118. https://doi.org/10.1073/pnas.2107019118

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful to the reviewers for their appreciation of our study and their thoughtful comments. In response to the main concern raised by all reviewers regarding the potential influence of external noise factors on intuitive inference, such as external disturbances or imperfect observations, we have conducted three new experiments suggested by the reviewers. These experiments were designed to: (1) assess the influence of external forces on humans’ judgments by implementing a wall to block wind disturbances from one direction, (2) examine human accuracy in predicting the landing position of a falling ball when its trajectory is obscured, and (3) evaluate the effect of object geometry on human judgment of stability. The findings from these experiments consistently support our proposal of a stochastic world model on gravity embedded in the human mind. We have also addressed the reviewers’ remaining comments point by point.

      Reviewer #1 (Recommendations For The Authors):

      As mentioned in the public review, I did not find it entirely convincing that the study shows evidence for a Gaussian understanding of gravity. There are two studies that would bolster this claim: 1. Replicate experiment 1, but also ask people to infer whether there was a hidden force. If people are truly representing gravity as proposed in the paper, you should get no force inferences. However, if the reason the Gaussian gravity model works is that people infer unseen forces, this should come out clearly in this study.

      Author response image 1.

      Wall experiment to test the impact of external forces on the measurement of stochastic gravity. (a) Experimental setting. We replicated the original setup with the addition of a wall implemented on one side. Left: the overall experimental scene; Right, the scene shown to participants. (b) Human behaviors. Three participants conducted this experiment, and their responses consistently showed normal distributions without any skewness, suggesting that their judgments were not affected by the presence of the wall. These results support our claim that humans’ judgments on stability were not affected by potential concerns regarding external forces.

      R1: We thank the reviewer for this suggestion. To directly test whether participants’ judgments were influenced by their implicit assumptions about external forces, we duplicated the original experimental setup with the addition of a wall implemented on one side (Supplementary Figure 4A). Before the start of the experiment, we explicitly informed the participants that the wall was designed to block wind, ensuring that any potential wind forces from the direction of the wall would not influence the collapse. If participants’ judgments were affected by external noise, we would expect to observe a skewed angle distribution. Contrary to this prediction, our results showed a normal distribution across all three participants tested (1 female; ages: 24-30), similar to the experiment without the wall (Supplementary Figure 4B). Therefore, the stochastic nature of intuitive inference on objects’ stability is embedded in the mind, not shaped by external forces or explicit instructions.

      This new experiment has been added to the revised manuscript:

      Line 166-168: “…, and remained unchanged with the addition of a wall on one side to block potential external disturbances from wind (Supplementary Figure 4).”
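
      The skewness criterion used in this response can be made concrete with a minimal sketch. The angle samples below are invented for illustration and are not the participants' data; the function computes the standard moment-based (population) skewness, which is near zero for a symmetric distribution and positive when the right tail is heavier.

```python
import statistics


def sample_skewness(xs):
    """Population skewness: mean of cubed z-scores."""
    mean = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - mean) / sd) ** 3 for x in xs) / len(xs)


# Hypothetical angle samples in degrees (not data from the study).
symmetric_angles = [-20, -10, -5, 0, 5, 10, 20]
skewed_angles = [0, 1, 2, 3, 5, 10, 30]

skew_symmetric = sample_skewness(symmetric_angles)
skew_skewed = sample_skewness(skewed_angles)
```

      Under the external-perturbation account, blocking one direction would be expected to push the statistic away from zero, as in the second toy sample; the response reports no such shift.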

      (2) Similarly, you can imagine a simple study where you drop an object behind a floating occluder and you check where people produce an anticipatory fixation (i.e., where do they think the object will come out?). If people have a stochastic representation of gravity, this should be reflected in their fixations. But my guess is that everyone will look straight down.

      Author response image 2.

      Trajectory experiment to test the stochastic nature of gravity represented in the mind. (a) Experiment design. In this experiment, participants were required to use a mouse to determine the landing point of a parabolic trajectory (marked by the green dot), obscured by a grey rectangle. Note that the parabolic trajectory was determined only by gravity, and no external disturbances were introduced. The parameters used in this experiment are detailed in the upper right corner. (b) Predictive errors from three participants. The predictive errors from all three participants conform to Gaussian distributions with non-negligible variances. These results suggest the notion of an inherent stochastic property of gravity represented in the mind.

      R2: We thank the reviewer for suggesting this thought experiment. However, when predicting the landing point of a falling object, participants may rely more on learned knowledge that an unimpeded object continues to fall in a straight line, rather than drawing on their intuitive physics. To avoid this potential confounding factor, we designed a similar experiment where participants were asked to predict the landing point of a parabolic trajectory, obscured by an occluder (Author response image 2A). In each trial, participants used a mouse (clicking the left button) to predict the landing point of each parabolic trajectory, and there were 100 trials in total. This design not only limits the impact of direct visual cues but also actively engages the mental simulation of intuitive physics. All three participants (1 female; ages: 24-30) were unable to accurately predict the landing points of the trajectories, and the predictive errors conformed to Gaussian distributions with different variances (Author response image 2B). Therefore, this new experiment confirms the stochastic nature of intuitive physics.

      (3) I believe the correct alternative model should be the one that has uncertainty over unseen forces, which better captures current proposals in the field, and controls for the amount of uncertainty in the models.

      R3: We thank the reviewers for the above-mentioned suggestions, and the findings from these two new experiments reinforce our proposal regarding the inherent stochastic characteristic of how the mind represents gravity.

      (4) I was not convinced that the RL framework was set up correctly to tackle the questions it claims to tackle. What this shows is that you can evolve a world model with Gaussian gravity in a setup that has no external perturbations. That does not imply that that is how humans evolved their intuitive physics, particularly when creatures have evolved in a world full of external perturbations. Showing that when (1) there are hidden perturbations, and (2) these perturbations are learnable, but (3) the model nonetheless just learns stochastic gravity, would be a more convincing result.

      R4: We completely agree with the reviewer that the RL framework serves primarily as a theoretical model to explain the stochastic nature of the world model on gravity, rather than as a demonstration of the developmental origins of intuitive physics abilities. The genesis of such abilities is multifaceted and unlikely to be fully replicated through a simple simulation like RL. Therefore, the purpose of incorporating the RL framework in our study is to demonstrate that external perturbations are not necessary for the development of a stochastic representation of gravity. In fact, introducing additional external noise into the RL framework likely heightens the uncertainty in learning gravity’s direction, potentially amplifying, rather than diminishing, the stochastic nature of mental gravity.

      In revision, we have clarified the role of the RL framework:

      Line 265-277: “While the cognitive impenetrability and the self-consistency observed in this study, without resorting to an external perturbation, favor the stochastic model over the deterministic one, the origin of this stochastic feature of the world model is unclear.

      Here we used a reinforcement learning (RL) framework to unveil this origin, because our intelligence emerges and evolves under the constraints of the physical world. Therefore, the stochastic feature may emerge as a biological agent interacts with the environment, where the mismatches between external feedback from the environment and internal expectations from the world model are in turn used to fine-tune the world model (Friston et al., 2021; MacKay, 1956; Matsuo et al., 2022). Note that a key aspect of the framework is determining whether the stochastic nature of the world model on gravity emerges through this interaction, even in the absence of external noise.”

      (5) Some comments on the writing:

      The word 'normality' is used to refer to people's judgments about whether a tower collapsed looked 'normal'. I was a bit confused by this because normality can also mean 'Gaussian' and the experiments are also sampling from Gaussian distributions. There were several points where it took me a second to figure out which sense of 'normality' the paper was using. I would recommend using a different term.

      R5: We are sorry for the confusion. In revision, the term “normality” has been replaced with “confidence level about normal trajectory”.

      (6) One small comment is that Newton's laws are not a faithful replica of the "physical laws of the world" they are a useful simplification that only works at certain timescales. I believe some people propose Newtonian physics as a model of intuitive physics in part because it is a rapid and useful approximation of complex physical systems, and not because it is an untested assumption of perfect correspondence.

      R6: We are sorry for the inaccurate expression. We have revised the statement in the manuscript. Line 15-16: “We found that the world model on gravity was not a faithful replica of the physical laws, but instead encoded gravity’s vertical direction as a Gaussian distribution.”

      (7) Line 49-50: Based on Fig 1d, lower bound of possible configurations for 10 blocks is ~17 in log-space, which is about 2.5e7. But the line here says it's 3.72e19, which is much larger. Sorry if I am missing something.

      R7: We thank the reviewer for pointing out this error. We re-calculated the number of possible configurations using formula (3) in the appendix, and the number of configurations with 10 blocks is:

      Thus,

      This estimated number is much larger than that in our previous calculation, which has been corrected in the revised text.

      Line 827-829: “d) The lower bound of configurations’ possible number and the number of blocks in a stack followed an exponential relationship with a base of 10. The procedure can create at least 1.14×10^50 configurations for stacks consisting of 10 blocks.”

      Line 49-50: “… but the universal cardinality of possible configurations is at least 1.14×10^50 (Supplementary Figure 1), …”

      Line 1017-1018: “… the number of configurations can be estimated with formula (9), which is 1.14×10^50.”

      (8) Lines 77-78: "A widely adopted but not rigorously tested assumption is that the world model in the brain is a faithful replica of the physical laws of the world." This risks sounding like you are asserting that colleagues in the field do not rigorously test their models. I think you meant to say that they did not 'directly test', rather than 'rigorously test'. If you meant rigorous, you might want to say more to justify why you think past work was not rigorous.

      R8: We apologize for the inappropriate wording. The sentence has been revised, and we now explain the motivation more comprehensively in the revised text:

      Line 76-92: “A prevailing theory suggests that the world model in the brain accurately mirrors the physical laws of the world (Allen et al., 2020; Battaglia et al., 2013; Zhou et al., 2022). For example, the direction of gravity encoded in the world model, a critical factor in stability inference, is assumed to be straight downward, aligning with its manifestation in the physical world. To explain the phenomenon that tall and thin objects are subjectively perceived as more unstable compared to short and fat ones (Supplementary Figure 2), external noise, such as imperfect perception and assumed external forces, is introduced to influence the output of the model. However, when the brain actively transforms sensory data into cognitive understanding, these data can become distorted (Kriegeskorte and Douglas, 2019; Naselaris et al., 2011), hereby introducing uncertainty into the representation of gravity’s direction. In this scenario, the world model inherently incorporates uncertainty, eliminating the need for additional external noise to explain the inconsistency between subjective perceptions of stability and the actual stability of objects. Note that this distinction of these two theories is nontrivial: the former model implies a deterministic representation of the external world, while the latter suggests a stochastic approach.”

      (9) Lines 79-84 States that past models encode gravity downward. It then says that alternatively there is consensus that the brain uses data from sensory organs and adds meaning to them. I think there might be a grammatical error here because I did not follow why saying there is 'consensus' on something is a theoretical alternative. I also had trouble following why those two statements are in opposition. Is any work on physics engines claiming the brain does not take data from sensory organs and add meaning to them?

      R9: We are sorry for the confusion. Here we intend to contrast the deterministic model (i.e., the uncertainty comes from outside the model) with the stochastic model (i.e., the uncertainty is inherently built into the model). In revision, we have clarified the intention. For details, please see R8.

      (10) Lines 85-88: Following on the sentence above, you then conclude that the representation of the world may therefore not be the same as reality. I did not understand why this followed. It seems you are saying that, because the brain takes data from sensory organs, therefore its representations may differ from reality.

      R10: Again, we are sorry about the confusion. Please see the revised text in R8.

      (11) Lines 190-191: I had trouble understanding this sentence. I believe you are missing an adjective to clarify that participants were more inclined to judge taller stacks as more likely to collapse.

      R11: We are sorry for the confusion. What we intended to state here is that participants’ judgment was biased, showing a tendency to predict a collapse for stacks regardless of their actual stability. We have revised this confusing sentence in the revision. Line 202–204: “However, the participants showed an obvious bias towards predicting a collapse for stacks regardless of their actual stability, as the dots in Fig 2b are more concentrated on the lower side of the diagonal line.”

      (12) Line 201: I don't think it's accurate to say that MGS "perfectly captured participants' judgments" unless the results are actually perfect.

      R12: We agree, and in revision we have toned down the statement. Line 213–214: “…, the MGS, in contrast to the NGS, more precisely reflected participants’ judgments of stability …”

      Reviewer #2 (Recommendations For The Authors):

      I think this is an impressive set of experiments and modeling work. The paper is nicely written and I appreciate the poetic license the authors took at places in the manuscript. I only have clarification points and suggest a simple experiment that could lend further support to their conclusions. 1. In my opinion, the impact of this work is twofold. First, the suggestion that gravity is represented as a distribution of the world and not a result of (inferred) external perturbations. Second, that the distribution is advantageous as it balances speed and accuracy, and lessens computational processing demands (i.e., number of simulations). The second point here is contingent on the first point, which is really only supported by the RL model and potentially the inverted scene condition. I am somewhat surprised that the RL model does not converge on a width much smaller than ~20 degrees after 100,000 simulations. From my understanding, it was provided feedback with collapses based on natural gravity (deterministically downward). Why is learning so slow and the width so large? Could it be the density of the simulated world model distribution? If the model distribution of Qs was too dense, then Q-learning would take forever. If the model distribution was too sparse, then its final estimate would hit a floor of precision. Could the authors provide more details on the distribution of the Qs for the RL model?

      Author response image 3.

      RL learning curves as a function of θ angle with different sampling densities and learning rates. Learning rates were adjusted to low (a), intermediate (b) and high (c) settings, while sampling densities were chosen at four levels: 5x5, 11x11, 31x31, and 61x61 shown from the left to the right. Two key observations emerged from the simulations as the reviewer predicted. First, higher learning rates resulted in a more rapid decline in learning curves but introduced larger variances. Second, increased sampling density necessitated more iterations for convergence. Note that in all simulations, we limited the iterations to 1,000 times (as opposed to 100,000 times reported in the manuscript) to demonstrate the trend without excessive computational demands.

      R1: To illustrate the distribution of the Q-values for the RL model, we re-ran the RL model with various learning rates and sampling densities (Author response image 3). These results support the reviewer’s prediction that higher learning rates resulted in a more rapid decline in learning curves but introduced larger variances, and increased sampling density requires more iterations for convergence.

      This simulation also explains the slower learning observed in the experiment described in the text, where the force sphere was divided into 61x61 angle pairs and the learning rate was set to 0.15. This set of parameters ensured convergence within a reasonably brief timeframe while maintaining high-resolution force assessments.

      In addition, the width of the Gaussian distribution is mainly determined by the complexity of the stacks. As shown in Figure 3c and Supplementary Figure 9, stacks with fewer blocks (i.e., less complex) produced a larger width, whereas those with more blocks resulted in a narrower spread. In the study, we used a collection of stacks varying from 2 to 15 blocks to simulate the range of stacks humans typically encounter in daily life.

      In revision, we have incorporated these insights suggested by the reviewer to clarify the performance of the RL framework:

      Line 634-639: “The angle density and learning rate are two factors that affect the learning speed. A larger angle density prolongs the time to reach convergence but enables a more detailed force space; a higher learning rate accelerates convergence but incurs larger variance during training. To balance speed and convergence, we utilized 100,000 configurations for the training.”

      Line 618-619: “…, separately divided them into 61 sampling angles across the spherical force space (i.e., the angle density).”
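
      The two effects described here can be sketched analytically for a generic tabular delta-rule update, Q ← Q + α(r − Q). This is an assumption made for illustration, not the authors' RL implementation: it supposes that angle pairs are sampled uniformly and that feedback for a cell is an independent 0/1 outcome with probability p. The function names and the comparison learning rate of 0.05 are hypothetical; the grid sizes, the learning rate of 0.15, and the iteration counts are taken from the response.

```python
def n_cells(angles_per_axis):
    """Number of angle pairs on an angles_per_axis x angles_per_axis grid."""
    return angles_per_axis ** 2


def updates_per_cell(total_iterations, angles_per_axis):
    """Average updates each cell receives under uniform sampling (assumption)."""
    return total_iterations / n_cells(angles_per_axis)


def expected_q(p, learning_rate, n_updates, q0=0.0):
    """Expected Q after n_updates of Q <- Q + lr * (r - Q) with E[r] = p."""
    return p + (q0 - p) * (1.0 - learning_rate) ** n_updates


def steady_state_variance(p, learning_rate):
    """Long-run variance of Q for i.i.d. 0/1 feedback with probability p."""
    return learning_rate / (2.0 - learning_rate) * p * (1.0 - p)


# Denser grids spread a fixed training budget over more cells.
for density in (5, 11, 31, 61):
    print(density, n_cells(density),
          round(updates_per_cell(1_000, density), 2),
          round(updates_per_cell(100_000, density), 2))

# Higher learning rate: closer to the target after 10 updates, but noisier.
for lr in (0.05, 0.15):
    print(lr, round(expected_q(0.5, lr, 10), 3),
          round(steady_state_variance(0.5, lr), 4))
```

      In this simplified picture a 61x61 grid has 3,721 cells, so 1,000 iterations give each cell well under one update on average, whereas 100,000 iterations give roughly 27, which is consistent with the direction of the trend reported above.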

      (2) Along similar lines, the authors discuss the results of the inverted scene condition as reflecting cognitive impenetrability. However, do they also interpret it as support for an intrinsically noisy distribution of gravity? I would be more convinced if they created a different scene that could have the possibility of affecting the direction of an (inferred) external perturbation - a previously held explanation of the noisy world model. For example, a relatively simple experiment would be to have a wall on one side of the scene such that an external perturbation would be unlikely to be inferred from that direction. In the external perturbation account, phi would then be affected resulting in a skewed distribution of angle pairs. However, in the authors' stochastic world model phi would remain unaffected resulting in the same uniform distribution of phi the authors observed. In my opinion, this would provide more compelling evidence for the stochastic world model.

      Author response image 4.

      Wall experiment to test the impact of external forces on the measurement of stochastic gravity. (a) Experimental setting. We replicated the original setup with the addition of a wall implemented on one side. Left: the overall experimental scene; Right, the scene shown to participants. (b) Human behaviors. Three participants conducted this experiment, and their responses consistently showed normal distributions without any skewness, suggesting that their judgments were not affected by the presence of the wall. These results support our claim that humans’ judgments on stability were not affected by potential concerns regarding external forces.

      R2: We thank the reviewer for this suggestion. To address this concern, we repeated the experiment with the addition of a wall on one side (Supplementary figure 4A). Before the start of the experiment, we explicitly informed the participants that the wall was designed to block wind, ensuring that no potential wind forces from the direction of the wall would influence the collapse trajectory of the configurations. Participants were asked to judge whether the trajectory was normal. If participants’ judgments were influenced by external noise, we would expect to observe a skewed angle distribution. However, our results still showed a normal distribution across all participants tested, consistent with the experiment without the wall (Supplementary figure 4B). This experiment suggests that the stochastic nature of intuitive inference on objects’ stability is embedded in the mind, rather than shaped by external forces or explicit instructions.

      We have revised the original manuscript to add this new experiment:

      Line 166-168: “…, and remained unchanged with the addition of a wall on one side to block potential external disturbances from wind (Supplementary Figure 4).”

      (3) I didn't completely follow the authors' explanation for the taller objects illusion. On lines 229-232, the authors state that deviations from gravity's veridical direction are likely to accumulate with the height of the objects. Is this because, in the stochastic world model account, each block gets its own gravity vector that is sampled from the distribution? The authors should clarify this more explicitly. If this is indeed the author's claim, then it would seem that it could be manipulated by varying the dimensions of the blocks (or whatever constitutes an object).

      R3: We are sorry for the confusion caused by the use of the term ‘accumulate’. In the study, there is only one gravity vector sampled from the distribution for the entire structure, rather than each block having a unique gravity vector. The height illusion is attributed to the fact that the center of gravity in taller objects is more susceptible to influence when gravity deviates slightly from a strictly downward direction. This is especially true for objects consisting of multiple blocks stacked atop one another. In revision, we have removed the confusing term ‘accumulate’ for clarification.

      Line 242-244: “…, because the center of gravity in taller objects is more susceptible to influence when gravity deviates slightly from a strictly downward direction during humans’ internal simulations.”
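
      The geometric point can be illustrated with a minimal sketch for a single uniform rectangular block, which is a simplification of the multi-block stacks used in the study. It assumes a two-dimensional scene, enough friction that the block pivots rather than slides, and a gravity tilt drawn once from a zero-mean Gaussian; the 10-degree standard deviation and the function names are hypothetical. Under these assumptions the block tips when the tilt exceeds atan(width / height), so the tipping probability is erfc(atan(width / height) / (σ√2)).

```python
import math


def critical_tilt(width, height):
    """Tilt of gravity from vertical (radians) beyond which the block tips.

    The centre of mass sits at (width / 2, height / 2) above the pivot edge,
    so the gravity line leaves the base once tan(tilt) > width / height.
    """
    return math.atan2(width, height)


def tip_probability(width, height, sigma_rad):
    """P(|tilt| > critical tilt) for tilt ~ Normal(0, sigma_rad**2)."""
    z = critical_tilt(width, height) / (sigma_rad * math.sqrt(2.0))
    return math.erfc(z)


sigma = math.radians(10.0)  # hypothetical spread of the gravity direction

# One gravity vector per structure; only the block's aspect ratio changes.
for height in (1, 2, 5, 10):
    p = tip_probability(width=1.0, height=height, sigma_rad=sigma)
    print(f"height {height:>2}: P(tip) = {p:.4f}")
```

      The probability rises monotonically with height for a fixed base, which matches the qualitative claim that taller objects are more susceptible to small deviations of gravity from the vertical, without requiring a separate gravity vector per block.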

      (4) The authors refer to the RL simulations as agent-environment interactions, but in reality, the RL model does not interact with the blocks. Would experience-dependent or observation be more apropos?

      R4: We completely agree. Indeed, the RL model did not manipulate stacks; rather, it updated its knowledge of natural gravity based on the discrepancies between the RL model’s predictions and observed outcomes. In revision, we have removed the confusing term ‘agent-environment interactions’ and clarified its intended meaning.

      Line 19-22: “Furthermore, a computational model with reinforcement learning revealed that the stochastic characteristic likely originated from experience-dependent comparisons between predictions formed by internal simulations and the realities observed in the external world, …”

      Reviewer #3 (Public Review):

      (1) In spite of the fact that the Mental Gravity Simulation (MGS) seems to predict the data of the two experiments, it is an untenable hypothesis. I give the main reason for this conclusion by illustrating a simple thought experiment. Suppose you ask subjects to determine whether a single block (like those used in the simulations) is about to fall. We can think of blocks of varying heights. No matter how tall a block is, if it is standing on a horizontal surface it will not fall until some external perturbation disturbs its equilibrium. I am confident that most human observers would predict this outcome as well. However, the MGS simulation would not produce this outcome. Instead, it would predict a non-zero probability that the block tips over. A gravitational field that is not perpendicular to the base has the equivalent effect of a horizontal force applied on the block at the height corresponding to the vertical position of the center of gravity. Depending on the friction determined by the contact between the base of the block and the surface where it stands, there is a critical height where any horizontal force being applied would cause the block to fall while pivoting about one of the edges at the base (the one opposite to where the force has been applied). This critical height depends on both the size of the base and the friction coefficient. For short objects this critical height is larger than the height of the object, so that object would not fall. But for taller blocks, this is not the case. Indeed, the taller the block, the smaller the deviation from a vertical gravitational field needed for a fall to be expected. The discrepancy between this prediction and the most likely outcome of the simple experiment I have just outlined makes the MGS model implausible. Note also that a gravitational field that is not perpendicular to the ground surface is equivalent to the force field experienced by the block while standing on an inclined plane. For small friction values, the block is expected to slide down the incline; therefore another prediction of this MGS model is that when we observe an object on a surface exerting negligible friction (think of a puck on ice) we should expect that object to spontaneously move. But of course we don't, just as we do not expect tall objects that are standing to suddenly fall if left unperturbed. In summary, a stochastic world model cannot explain these simple observations.
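The thought experiment above can be made concrete with a minimal sketch. It assumes a uniform rectangular block (center of gravity at half height) and a gravity direction whose tilt, in one plane, is drawn from a zero-mean Gaussian. The function names and the 10-degree standard deviation used in the usage note are illustrative assumptions, not values taken from the manuscript.

```python
import math

def critical_tilt_deg(base_width: float, height: float) -> float:
    """Tilt of gravity beyond which a uniform block tips about a base edge.

    The block tips once the gravity vector through the center of mass
    (at height/2) passes outside the base edge (at base_width/2), i.e.
    when tan(tilt) > base_width / height.
    """
    return math.degrees(math.atan2(base_width, height))

def tip_probability(base_width: float, height: float, sigma_deg: float) -> float:
    """P(|tilt| > critical tilt) for tilt ~ Normal(0, sigma_deg).

    sigma_deg is a hypothetical spread of the represented gravity direction.
    """
    z = critical_tilt_deg(base_width, height) / sigma_deg
    return math.erfc(z / math.sqrt(2.0))

def slides_before_tipping(base_width: float, height: float, mu: float) -> bool:
    """True if the block would slide (tan(tilt) > mu) before it tips."""
    return mu < base_width / height
```

Under these assumptions, comparing `tip_probability(1.0, 4.0, 10.0)` with `tip_probability(1.0, 1.0, 10.0)` shows the reviewer's point: the taller block has the smaller critical tilt and therefore the larger predicted probability of falling.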

      Author response image 5.

      Differentiating Subjectivity from Objectivity. In both Experiment 1 (a) and Experiment 2 (b), participants were instructed to determine which shape appeared most stable. Objectively, in the absence of external forces, all shapes possess equal stability. Yet, participants typically perceived the shape on the left as the most stable because of its larger base area. The discrepancy between objective realities and subjective feelings, as we propose, is attributed to the human mind representing gravity’s direction as a Gaussian distribution, rather than as a singular value pointing directly downward.

      R1: We agree with the reviewer that objects will remain stable until disturbed by external forces. However, in many cases, there is a clear discrepancy between objective realities and subjective feelings. For example, electromagnetic waves associated with purple and red colors are the farthest apart in the electromagnetic spectrum, yet purple and red are the closest colors in the color space. Similarly, as shown in Supplementary Figure 4, in reality all shapes possess equal stability in the absence of external forces. Yet, humans typically perceive the shape on the left as more stable because of its larger base area. In this study, we tried to explore the mechanism underlying this discrepancy by proposing that the human mind represents gravity’s direction as a Gaussian distribution, rather than as a singular value pointing directly downward.

      In revision, we have clarified the rationale of this study:

      Line 76-98: “A prevailing theory suggests that the world model in the brain accurately mirrors the physical laws of the world (Allen et al., 2020; Battaglia et al., 2013; Zhou et al., 2022). For example, the direction of gravity encoded in the world model, a critical factor in stability inference, is assumed to be straight downward, aligning with its manifestation in the physical world. To explain the phenomenon that tall and thin objects are subjectively perceived as more unstable compared to short and fat ones (Supplementary Figure 2), external noise, such as imperfect perception and assumed external forces, is introduced to influence the output of the model. However, when the brain actively transforms sensory data into cognitive understanding, these data can become distorted (Kriegeskorte and Douglas, 2019; Naselaris et al., 2011), thereby introducing uncertainty into the representation of gravity’s direction. In this scenario, the world model inherently incorporates uncertainty, eliminating the need for additional external noise to explain the inconsistency between subjective perceptions of stability and the actual stability of objects. Note that the distinction between these two theories is nontrivial: the former model implies a deterministic representation of the external world, while the latter suggests a stochastic approach. Here, we investigated these two alternative hypotheses regarding the construction of the world model in the brain by examining how gravity’s direction is represented in the world model when participants judged object stability.”

      (2) The question remains as to how we can interpret the empirical data from the two experiments and their agreement with the predictions of the stochastic world model if we assume that the brain has internalized a vertical gravitational field. First, we need to look more closely at the questions posed to the subjects in the two experiments. In the first experiment, subjects are asked about how "normal" a fall of a block construction looks. Subjects seem to accept a fall as normal 50% of the time when the gravitational field is about 20 deg away from the vertical direction. The authors conclude that according to the brain, such an unusual gravitational field is possible. However, there are alternative explanations for these findings that do not require a perceptual error in the estimation of the direction of gravity. There are several aspects of the scene that may be misjudged by the observer. First, the 3D interpretation of the scene and the 3D motion of the objects can be inaccurate. Indeed, the simulation of a normal fall uploaded by the authors seems to show objects falling in a much weaker gravitational field than the one on Earth since the blocks seem to fall in "slow motion". This is probably because the perceived height of the structure is much smaller than the simulated height. In general, there are even more severe biases affecting the perception of 3D structures that depend on many factors, for instance, the viewpoint.

      R2: We thank the reviewer for highlighting several potential confounding factors in our study. We address each of these concerns point-by-point:

      (a) Misinterpretation of the 3D scene and motion. In Author response image 5 shown above, there is no 3D structure, yet participants’ judgments of stability still deviated from objective reality. In addition, the introduction of 3D motion was to aid in understanding the stacks’ 3D structure. Previous studies without 3D motion have reported similar findings (Allen et al., 2020). Therefore, regardless of whether objects are presented in 2D or 3D, or in static or moving formats, humans’ judgments of object stability appear consistent.

      (b) Errors in perceived height. While there might be discrepancies between perceived and simulated heights, such errors are systematic across all conditions. Therefore, they may affect the width of the Gaussian distribution but do not fundamentally alter its existence.

      (c) The viewpoint. In one experiment, we inverted gravity’s direction to point upward, diverging from common daily experience. Despite this change in viewpoint, the Gaussian distribution was still observed. That is, the viewpoint does not appear to be a key factor in how gravity’s direction is represented as a Gaussian distribution in our mental world.

      In summary, both our study and previous studies (Allen et al., 2020; Battaglia et al., 2013) agree that humans’ subjective assessments of objects’ stability deviate from actual stability due to noise in mental simulation. Unlike previous studies, we suggest that this noise is intrinsic, rather than stemming from external forces or imperfect observations.

      (3) Second, the distribution of weight among the objects and the friction coefficients acting between the surfaces are also unknown parameters. In other words, there are several parameters that depend on the viewing conditions and material composition of the blocks that are unknown and need to be estimated. The authors assume that these parameters are derived accurately and only that assumption allows them to attribute the observed biases to an error in the estimate of the gravitational field. Of course, if the direction of gravity is the only parameter allowed to vary freely then it is no surprise that it explains the results. Instead, a simulation with a tilted angle of gravity may give rise to a display that is interpreted as rendering a vertical gravitational field while other parameters are misperceived. Moreover, there is an additional factor that is intentionally dismissed by the authors that is a possible cause of the fall of a stack of cubes: an external force. Stacks that are initially standing should not fall all of a sudden unless some unwanted force is applied to the construction. For instance, a sudden gust of wind would create a force field on a stack that is equivalent to that produced by a tilted gravitational field. Such an explanation would easily apply to the findings of the second experiment. In that experiment subjects are explicitly asked if a stack of blocks looks "stable". This is an ambiguous question because the stability of a structure is always judged by imagining what would happen to the structure if an external perturbation is applied. The right question should be: "do you think this structure would fall if unperturbed". However, if stability is judged in the face of possible external perturbations then a tall structure would certainly be judged as less stable than a short structure occupying the same ground area. This is what the authors find. What they consider as a bias (tall structures are perceived as less stable than short structures) is instead a wrong interpretation of the mental process that determines stability. If subjects are asked the question "Is it going to fall?" then tall stacks of sound structure would be judged as stable as short stacks, just more precarious.

      R3: Indeed, the external forces suggested by the reviewer certainly influence judgments of objects’ stability. The critical question, however, is whether humans’ judgments on objects’ stability accurately mirror the actual stability of objects in the absence of external forces. To address this question, we designed two new experiments.

      Experiment 1: we duplicated the original experimental setup with the addition of a wall implemented on one side (Supplementary Figure 4A). We explicitly informed the participants that the wall could block wind, ensuring that no potential wind from the direction of the wall could influence the configuration. If participants’ judgments were affected by external noise, we would expect to observe a skewed angle distribution. Contrary to this prediction, our results showed a normal distribution across all three participants (Age: 25-30, two females), which is similar to the experiment without the wall (Supplementary Figure 4B).

      Author response image 6.

      Wall experiment to test the impact of external forces on the measurement of stochastic gravity. (a) Experimental setting. We replicated the original setup with the addition of a wall implemented on one side. Left: the overall experimental scene; Right, the scene shown to participants. (b) Human behaviors. Three participants conducted this experiment, and their responses consistently showed normal distributions without any skewness, suggesting that their judgments were not affected by the presence of the wall. These results support our claim that humans’ judgments on stability were not affected by potential concerns regarding external forces.

      Experiment 2: The second experiment adopted another paradigm to test the hypothesis of stochastic mental simulation. Participants inferred the landing point of a parabolic trajectory that was obscured by an occluder (Author response image 7a). Stochastic mental simulation predicts that humans’ behavior follows a Gaussian distribution, whereas if humans’ judgments were influenced by external noise, the landing points would not be Gaussian. The experiment consisted of 100 trials in total, and in each trial participants predicted the landing point of the trajectory by clicking the left mouse button. All three participants (1 female; ages: 24-30) were unable to accurately predict the landing points of the trajectories, and the predictive errors conformed to Gaussian distributions with different variances (Author response image 7b). Therefore, this new experiment confirms the stochastic nature of intuitive physics.

      Author response image 7.

      Trajectory experiment to test the stochastic nature of gravity represented in the mind. (a) Experiment design. In this experiment, participants were required to use a mouse to determine the landing point of a parabolic trajectory (marked by the green dot), obscured by a grey rectangle. Note that the parabolic trajectory was determined only by gravity, and no external disturbances were introduced. The parameters used in this experiment are detailed in the upper right corner. (b) Predictive errors from three participants. The predictive errors from all three participants conform to Gaussian distributions with non-negligible variances. These results suggest the notion of an inherent stochastic property of gravity represented in the mind.
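One way to picture how a noisy gravity direction would spread predicted landing points is the sketch below. It is an illustration under stated assumptions, not the authors' model: gravity keeps a fixed magnitude `g`, its direction is tilted by a small angle drawn from a zero-mean Gaussian, and the launch velocity, launch height and noise level are hypothetical parameters.

```python
import math
import random

def landing_x(vx, vy, tilt_rad=0.0, g=9.81, y0=0.0):
    """Horizontal landing position of a projectile launched from (0, y0).

    Gravity has components (g*sin(tilt), -g*cos(tilt)); tilt must stay
    well below 90 degrees so that the vertical component is positive.
    """
    gx = g * math.sin(tilt_rad)
    gy = g * math.cos(tilt_rad)
    # Positive root of y0 + vy*t - 0.5*gy*t**2 = 0
    t_land = (vy + math.sqrt(vy * vy + 2.0 * gy * y0)) / gy
    return vx * t_land + 0.5 * gx * t_land ** 2

def simulated_errors(vx, vy, sigma_rad, n=1000, seed=0):
    """Landing-point errors when the tilt is sampled from Normal(0, sigma_rad)."""
    rng = random.Random(seed)
    true_x = landing_x(vx, vy)
    return [landing_x(vx, vy, rng.gauss(0.0, sigma_rad)) - true_x
            for _ in range(n)]
```

For small `sigma_rad` the landing position varies almost linearly with the tilt, so the simulated errors are approximately Gaussian; a histogram of `simulated_errors(3.0, 4.0, 0.05)` can be used to inspect this.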

      (4) The RL model used as a proof of concept for how the brain may build a stochastic prior for the direction of gravity is based on very strong and unverified assumptions. The first assumption is that the brain already knows about the force of gravity, but it lacks knowledge of the direction of this force of gravity. The second assumption is that before learning the brain knows the effect of a gravitational field on a stack of blocks. How can the brain simulate the effect of a non-vertical gravitational field on a structure if it has never observed such an event?

      R4: We agree with the reviewer that the RL framework serves primarily as a theoretical model to explain the stochastic nature of the world model on gravity, rather than as a demonstration of the developmental origins of intuitive physics abilities. The genesis of such abilities is multifaceted and unlikely to be fully replicated through a simple simulation like RL. Therefore, the purpose of incorporating the RL framework in our study is to demonstrate that external perturbations are not necessary for the development of a stochastic representation of gravity.

      In revision, we have clarified the role of the RL framework:

      Line 265-277: “While the cognitive impenetrability and the self-consistency observed in this study, without resorting to an external perturbation, favor the stochastic model over the deterministic one, the origin of this stochastic feature of the world model is unclear.

      Here we used a reinforcement learning (RL) framework to unveil this origin, because our intelligence emerges and evolves under the constraints of the physical world. Therefore, the stochastic feature may emerge as a biological agent interacts with the environment, where the mismatches between external feedback from the environment and internal expectations from the world model are in turn used to fine-tune the world model (Friston et al., 2021; MacKay, 1956; Matsuo et al., 2022). Note that a key aspect of the framework is determining whether the stochastic nature of the world model on gravity emerges through this interaction, even in the absence of external noise.”

      (5) The third assumption is that from the visual input, the brain is able to figure out the exact 3D coordinates of the blocks. This has been proven to be untrue in a large number of studies. Given these assumptions and the fact that the only parameters the RL model modifies through learning specify the direction of gravity, I am not surprised that the model produces the desired results.

      Author response image 8.

      Perceptual uncertainty in 3D stack structures. (a) Experimental design. Two stacks with similar placements of blocks were presented sequentially to participants, who were instructed to judge whether the stacks were identical and to rate their confidence in this judgment. Each stack was presented on the screen for 2 seconds. (b) Behavioral performance. Three participants (2 males, age range: 24-30) were recruited for the experiment. The confidence in determining whether a pair of stacks remained unchanged rapidly decreased when each block had a very small displacement, suggesting humans could keenly perceive trivial changes in configurations. The x-axis denotes the difference in block placement between stacks, with the maximum value (0.4) corresponding to the length of a block’s short side. The y-axis denotes humans’ confidence in reporting no change. The red curve illustrates the average confidence level across 4 runs, while the yellow curves show the confidence level of each run.

      R5: Indeed, uncertainty is inevitable when perceiving the external world, because our perception is not a faithful replica of external reality. A more critical question pertains to the accuracy of our perception in representing the 3D coordinates of a stack’s blocks. To address this question, we designed a straightforward experiment (Author response image 8a), where participants were instructed to determine whether a pair of stacks were identical. The position of each block was randomly changed horizontally. We found that all participants were able to accurately identify even minor positional variations in the 3D structure of the stacks (Author response image 8b). This level of perceptual precision is adequate for locating the difference between predictions from mental simulations and actual observations of the external world.

      (6) Finally, the argument that the MGS is more efficient than the NGS model is based on an incorrect analysis of the results of the simulation. It is true that 80% accuracy is reached faster by the MGS model than the 95% accuracy level is reached by the NGS model. But the question is: how fast does the NGS model reach 80% accuracy (before reaching the plateau)?

      R6: Yes. The NGS model achieved 80% accuracy as rapidly as the MGS model. However, the NGS model required a significantly longer period to reach the plateau crucial for decision-making. In revision, this information is now included.

      Line 348-350: “…, while the initial growth rates of both models were comparable, the MGS reached the plateau crucial for decision-making sooner than the NGS.”

      We greatly appreciate the thorough and insightful review provided by all three reviewers, which has considerably improved our manuscript, especially in terms of clarity in the presentation of the approach and further validation of the robustness implications of our results.

      References: Allen KR, Smith KA, Tenenbaum JB. 2020. Rapid trial-and-error learning with simulation supports flexible tool use and physical reasoning. Proceedings of the National Academy of Sciences 117:29302–29310.

      Battaglia PW, Hamrick JB, Tenenbaum JB. 2013. Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences 110:18327–18332.

      Friston K, Moran RJ, Nagai Y, Taniguchi T, Gomi H, Tenenbaum J. 2021. World model learning and inference. Neural Networks 144:573–590.

      Kriegeskorte N, Douglas PK. 2019. Interpreting encoding and decoding models. Current opinion in neurobiology 55:167–179.

      MacKay DM. 1956. The epistemological problem for automata. Automata Studies (AM-34), Volume 34. Princeton University Press. pp. 235–252.

      Matsuo Y, LeCun Y, Sahani M, Precup D, Silver D, Sugiyama M, Uchibe E, Morimoto J. 2022. Deep learning, reinforcement learning, and world models. Neural Networks.

      Naselaris T, Kay KN, Nishimoto S, Gallant JL. 2011. Encoding and decoding in fMRI. Neuroimage 56:400–410.

      Zhou L, Smith K, Tenenbaum J, Gerstenberg T. 2022. Mental Jenga: A counterfactual simulation model of physical support.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This important study combines a comparative approach in different synapses with experiments that show how synaptic vesicle endocytosis in nerve terminals regulates short-term plasticity. The data presented support the conclusions and make a convincing case for fast endocytosis as necessary for rapid vesicle recruitment to active zones. Some aspects of the description of the data and analysis are however incomplete and would benefit from a more rigorous approach. With more discussion of methods and analysis, this paper would be of great interest to neurobiologists and biophysicists working on synaptic vesicle recycling and short-term plasticity mechanisms.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study examines the role of release site clearance in synaptic transmission during repetitive activity under physiological conditions in two types of central synapses, calyx of Held and hippocampal CA1 synapses. After the acute block of endocytosis by pharmacology, deeper synaptic depression or less facilitation was observed in two types of synapses. Acute block of CDC42 and actin polymerization, which possibly inhibits the activity of Intersectin, affected synaptic depression at the calyx synapse, but not at CA1 synapses. The data suggest an unexpected, fast role of the site clearance in counteracting synaptic depression.

      Strengths:

      The study uses an acute block of the molecular targets with pharmacology together with precise electrophysiology. The experimental results are clear-cut and convincing. The study also examines the physiological roles of the site clearance using action potential-evoked transmission at physiological Ca and physiological temperature in mature animals. This condition has not been examined previously.

      Weaknesses:

      Pharmacology may have some off-target effects, though acute manipulation should be appreciated. Although this is a hard question and difficult to address experimentally, reagents may affect synaptic vesicle mobilization to the release sites directly in addition to blocking endocytosis.

      To acutely block vesicle endocytosis, we utilized two different pharmacological tools, Dynasore and Pitstop-2, after testing their blocking spectra and potencies at the calyx presynaptic terminals, and collected data on their common effects on target functions. Since the recovery from STD was faster at the calyx synapses in the presence of both endocytic blockers in physiological 1.3 mM [Ca2+] (Figure 2B), but not in 2.0 mM [Ca2+] (Figure S4), they might facilitate vesicle mobilization under physiological conditions.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Mahapatra and Takahashi report on the physiological consequences of pharmacologically blocking either clathrin and dynamin function during compensatory endocytosis or the cortical actin scaffold, both in the calyx of Held synapse and in hippocampal boutons in acute slice preparations.

      Strengths:

      Although many aspects of these pharmacological interventions have been studied in detail during the past decades, this is a nice comprehensive and comparative study, which reveals some interesting differences between a fast synapse (Calyx of Held) tuned to reliably transmit at several 100 Hz and a slower hippocampal CA1 synapse. In particular, the authors find that acute disturbance of the synaptic actin network leads to a marked frequency-dependent enhancement of synaptic depression in the Calyx, but not in the hippocampal synapse. This striking difference between both preparations is the most interesting and novel finding.

      Weaknesses:

      Unfortunately, however, these findings concerning the different consequences of actin depolymerization are not sufficiently discussed in comparison to the literature. My only criticism concerns the interpretation of the ML 141 and Lat B data. With respect to the Calyx data, I am missing a detailed discussion of the effects observed here in light of the different RRP subpools SRP and FRP. This is very important since Lee et al. (2012, PNAS 109 (13) E765-E774) showed earlier that disruption of actin inhibits the rapid transition of SRP SVs to the FRP at the AZ. The whole literature on this important concept is missing. Likewise, the role of actin for the replacement pool at a cerebellar synapse (Miki et al., 2016) is only mentioned in half a sentence. There is quite some evidence that actin is important both at the AZ (SRP to FRP transition, activation of replacement pool) and at the peri-active zone for compensatory endocytosis and release site clearance. Both possible underlying mechanisms (SRP to FRP transition or release site clearance) should be better dissected.

      The concepts of FRP and SRP are derived from voltage-clamp step-depolarization experiments at calyces of Held in pre-hearing rodents at RT, and these pools cannot be directly dissected from data on action-potential-evoked EPSCs at post-hearing calyces under physiological conditions. However, we dissected them as far as possible by referring to the related literature in new paragraphs in the Results section (p9-10), particularly regarding the different effects of Latrunculin application and experimental conditions, and by adding a new supplementary Figure (now S5). Regarding the role of F-actin in vesicle replenishment at cerebellar synapses, we added sentences to the Discussion section (p14, last paragraph).

      Reviewer #3 (Public Review):

      General comments:

      (1) While Dynasore and Pitstop-2 may impede release site clearance due to an arrest of membrane retrieval, neither Latrunculin-B nor ML-141 specifically acts on AZ scaffold proteins. Interference with actin polymerization may have a number of consequences many of which may be unrelated to release site clearance. Therefore, neither Latrunculin-B nor ML-141 can be considered suitable tools for specifically identifying the role of AZ scaffold proteins (i.e. ELKS family proteins, Piccolo, Bassoon, α-liprin, Unc13, RIM, RBP, etc) in release site clearance which was defined as one of the principal aims of this study.

      In this study, we focused our analysis on the downstream activity of the scaffold protein intersectin by comparing the common inhibitory effects of blocking CDC42 and actin polymerization, using ML141 and Latrunculin B, respectively, on vesicle endocytosis and synaptic depression/facilitation, without addressing diverse individual drug effects. To avoid confusion, we removed “AZ” from scaffold protein.

      (2) Initial EPSC amplitudes more than doubled in the presence of Dynasore at hippocampal SC->CA1 synapses (Figure S2). This unexpected result raises doubts about the specificity of Dynasore as a tool to selectively block SV endocytosis.

      It is possible that Dynasore might have unknown or off-target effects. However, the main conclusion is backed up by Pitstop-2.

      (3) In this study, the application of Dynasore and Pitstop-2 strongly decreases 100 Hz steady-state release at calyx synapses while - quite unexpectedly - strongly accelerates recovery from depression. A previous study found that genetic ablation of dynamin-1 actually enhanced 300 Hz steady-state release while only little affecting recovery from depression (Mahapatra et al., 2016). A similar scenario holds for the Latrunculin-B effects: In this study, Latrunculin-B strongly increased steady-state depression while in Babu et al. (2020), Latrunculin-B did not affect steady-state depression. In Mahapatra et al. (2016), Latrunculin-B marginally enhanced steady-state depression. The authors need to make a serious attempt to explain all these seemingly contradicting results.

      The Latrunculin effect on STD can vary according to the conditions of application and external [Ca2+], which we show in a new supplemental Figure S5. The Latrunculin effect on the recovery from STD also varies with temperature, [Ca2+], and animal age, which affect the Ca2+-dependent fast component of recovery from depression. We added paragraphs on this issue in the Results section (p9-10).

      (4) The experimental conditions need to be better specified. It is not clear which recordings were obtained in 1.3 mM and which (if any?) in 2 mM external Ca. It is also unclear whether 'pooled data' are presented (obtained from control recordings and from separate recordings after pre-incubation with the respective drugs), or whether the data actually represent 'before'/'after' comparisons obtained from the same synapses after washing in the respective drugs. The exact protocol of drug application (duration of application/pre-incubation?, measurements after wash-out or in the continuous presence of the drugs?) needs to be clearly described in the methods and needs to be briefly mentioned in Results and/or Figure legends.

      We added methodological explanations and reworded sentences in the text to make clear that the pharmacological data were derived from non-sequential, separate experiments.

      (5) The authors compare results obtained in calyx with those obtained in SC->CA1 synapses which they considered examples for 'fast' and 'slow' synapses, respectively. There is little information given to help readers understand why these two synapse types were chosen, what the attributes 'fast' and 'slow' refer to, and how that may matter for the questions studied here. I assume the authors refer to the maximum frequency these two synapse types are able to transmit rather than to EPSC kinetics?

      Yes, the “fast and slow” naming refers to the maximum frequency at which these synapses can transmit. We reworded “fast and slow” to “fast-signaling and slow-plastic” and added an explanation in the text.

      (6) Strong presynaptic stimuli such as those illustrated in Figures 1B and C induce massive exocytosis. The illustrated Cm increase of 2 to 2.5 pF represents a fusion of 25,000 to 30,000 SVs (assuming a single SV capacitance of 80 aF) corresponding to a 12 to 15% increase in whole terminal membrane surface (assuming a mean terminal capacitance of ~16 pF). Capacitance measurements can only be considered reliable in the absence of marked changes in series and membrane conductance. Since the data shown in Figs. 1 and 3 are central to the argumentation, illustration of the corresponding conductance traces is mandatory. Merely mentioning that the first 450 ms after stimulation were skipped during analysis is insufficient.
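The reviewer's estimate can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the values quoted in the comment (80 aF per vesicle, ~16 pF mean terminal capacitance); the function names are illustrative.

```python
SV_CAPACITANCE_F = 80e-18        # single synaptic vesicle, 80 aF (value quoted above)
TERMINAL_CAPACITANCE_F = 16e-12  # mean calyx terminal, ~16 pF (value quoted above)

def vesicles_fused(delta_cm_f: float) -> float:
    """Number of vesicles whose fusion accounts for a capacitance jump."""
    return delta_cm_f / SV_CAPACITANCE_F

def fractional_membrane_increase(delta_cm_f: float) -> float:
    """Capacitance jump as a fraction of resting terminal capacitance."""
    return delta_cm_f / TERMINAL_CAPACITANCE_F

for delta_pf in (2.0, 2.5):
    delta_f = delta_pf * 1e-12
    print(delta_pf, round(vesicles_fused(delta_f)),
          round(100 * fractional_membrane_increase(delta_f), 1))
```

With these inputs the jump of 2 pF corresponds to 25,000 vesicles and 12.5% of the terminal capacitance; 2.5 pF gives a little over 31,000 vesicles and about 15.6%, in line with the rounded figures in the comment.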

      A conductance trace is shown together with a trace of the capacitance change induced by a square pulse in our previous paper (Yamashita et al, 2005 Science).

      (7) It is essential for this study to preclude a contamination of the results with postsynaptic effects (AMPAR saturation and desensitization). AMPAR saturation limits the amplitudes of initial responses in EPSC trains and hastens the recovery from depression due to a 'ceiling effect'. AMPAR desensitization occludes paired-pulse facilitation and reduces steady-state responses during EPSC trains while accelerating the initial recovery from depression. The use of, for example, 1 mM kynurenic acid in the bath is a well-established strategy to attenuate postsynaptic effects at calyx synapses. All calyx EPSC recordings should have been performed under such conditions. Otherwise, recovery time courses and STP parameters are likely contaminated by postsynaptic effects. Since the effects of AMPAR saturation on EPSC_1 and desensitization on EPSC_ss may partially cancel each other, an unchanged relative STD in the presence of kynurenic acid is not necessarily a reliable indicator for the absence of postsynaptic effects. The use of kynurenic acid in the bath would have had the beneficial side effect of massively improving voltage-clamp conditions. For the typical values given in this MS (10 nA EPSC, 3 MOhm Rs) the expected voltage escape is ~30 mV corresponding to a change in driving force of 30 mV/80 mV=38%, i.e. initial EPSCs in trains are likely underestimated by 38%. Such large voltage escape usually results in unclamped INa(V) which was suppressed in this study by routinely including 2 mM QX-314 in the pipette solution. That approach does, however, not reduce the voltage escape.
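The voltage-escape figure in this comment follows from Ohm's law. A minimal sketch, using the reviewer's typical values (10 nA EPSC, 3 MOhm series resistance, 80 mV driving force) and hypothetical function names:

```python
def voltage_escape_mv(epsc_na: float, rs_mohm: float) -> float:
    """Voltage drop across the series resistance; nA * MOhm gives mV."""
    return epsc_na * rs_mohm

def driving_force_loss(escape_mv: float, driving_force_mv: float = 80.0) -> float:
    """Fraction of the nominal driving force lost to the voltage escape."""
    return escape_mv / driving_force_mv

escape = voltage_escape_mv(10.0, 3.0)   # 30 mV
loss = driving_force_loss(escape)       # 0.375, i.e. about 38%
```

The same two functions show how the error shrinks if the EPSC is attenuated pharmacologically, since the escape scales linearly with current amplitude.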

      Glutamate released during AP-evoked EPSCs does not saturate or desensitize postsynaptic receptors at post-hearing calyces of Held (Ishikawa et al, 2002; Yamashita et al, 2003) although it does in pre-hearing calyces (Yamashita et al, 2009). In fact, as shown in Figure S3, our results are essentially the same with or without kynurenate.

      (8) In the Results section (pages 7 and 8), the authors analyze the time course into STD during 100 Hz trains in the absence and presence of drugs. In the presence of drugs, an additional fast component is observed which is absent from control recordings. Based on this observation, the authors conclude that '... the mechanisms operate predominantly at the beginning of synaptic depression'. However, the consequences of blocking or slowing site clearing are expected to be strongly release-dependent. Assuming a probability of <20% that a fusion event occurs at a given release site, >80% of the sites cannot be affected at the arrival of the second AP even by a total arrest of site clearance simply because no fusion has yet occurred. That number decreases during a train according to (1-0.2)^n, where n is the number of the AP, such that after 10 APs, ~90% of the sites have been used and may potentially be unavailable for new rounds of release after slowing site clearance. Perhaps, the faster time course into STD in the presence of the drugs isn't related to site clearance?
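The release-site argument above can be checked numerically. The sketch assumes, as the reviewer does, an independent release probability of 0.2 per site per action potential; the function name is illustrative.

```python
def fraction_sites_used(p_release: float, n_aps: int) -> float:
    """Fraction of sites that have undergone at least one fusion after n APs.

    Assumes each site releases independently with probability p_release
    per action potential, so the unused fraction is (1 - p_release)**n_aps.
    """
    return 1.0 - (1.0 - p_release) ** n_aps

for n in (1, 2, 5, 10):
    print(n, round(fraction_sites_used(0.2, n), 3))
```

After one action potential only 20% of sites have been used, whereas after ten the fraction is about 89%, which is the roughly 90% quoted in the comment.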

      Enhanced depression at the beginning of stimulation indicates block of the rapid SV replenishment mechanisms, which include endocytosis-dependent site clearance and scaffold-dependent vesicle translocation to release sites.
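
      The reviewer's estimate in point (8) of how many release sites have been used during a train follows from the expression (1-0.2)^n. A minimal sketch, assuming each site fuses independently with a fixed probability per AP and ignoring any recovery of used sites (the function name is hypothetical):

```python
def fraction_sites_used(p_fusion: float, n_aps: int) -> float:
    """Fraction of release sites that have fused at least once after n_aps APs.

    Assumes independent sites and a constant per-AP fusion probability.
    """
    return 1.0 - (1.0 - p_fusion) ** n_aps


# p = 0.2 is the upper bound on fusion probability quoted by the reviewer.
for n in (1, 2, 5, 10):
    print(n, round(fraction_sites_used(0.2, n), 3))
```

      Under these assumptions, 20% of the sites have been used after the first AP and roughly 89% after ten APs, in line with the reviewer's figure of ~90%.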

      (9) In the Discussion (page 10), the authors present a calculation that is supposed to explain the reduced size of the second calyx EPSC in a 100 Hz train in the presence of Dynasore or Pitstop-2. Does this calculation assume that all endocytosed SVs are immediately available for release within 10 ms? Please elaborate.

      We do not assume that endocytosed vesicles are reused within 10 ms, because glutamate refilling takes much longer (7 s at PT; Hori & Takahashi, 2012). Instead, already-filled reserve vesicles can rapidly replenish release sites if the sites are clear and the scaffold works properly. Results shown in Figure S6 also indicate that blocking vesicular transmitter refilling has no immediate effect on synaptic responses.

      (10) It is not clear, why the bafilomycin/folimycin data is presented in Fig. S5. The data is also not mentioned in the Discussion. Either explain the purpose of these experiments or remove the data.

      These v-ATPase blockers, which block vesicular transmitter refilling, are reported to enhance EPSC depression at hippocampal synapses at RT and 2 mM [Ca2+], presumably because of a lack of filled vesicles undergoing rapid vesicle recycling (e.g., kiss-and-run). We thought it important to determine whether these data have physiological relevance, since such a mechanism might also regulate synaptic strength during repetitive transmission. However, our results did not support its physiological relevance. Since these results are outside our main questions, the negative results are shown in Supplementary Figure 6 and explained in the last paragraph of the Results section (p11), but are not discussed further in the Discussion section.

      (11) The scheme in Figure 7 is not very helpful.

      We updated the scheme to summarize our conclusion that vesicle replenishment through endocytosis-dependent site clearance and a scaffold-dependent mechanism independently co-operate to strengthen synaptic efficacy during repetitive transmission at fast-signaling calyx synapses. However, endocytic site clearance is solely required to support facilitation at slow-plastic hippocampal SC-CA1 synapses.

      Recommendations for the authors:

      First, my deep apologies for the long delay in reviewing your paper. All reviewers are now in agreement that the paper has valuable new information, but some methods are not described well and some results appear to be incompatible with previous results in the literature. The discussion of previous literature is also incomplete and not well-balanced. With a fuller description of the methods and a strengthened discussion of the literature, this paper would be of great interest to neurobiologists and biophysicists working on synaptic vesicle recycling and short-term plasticity mechanisms. We ask that you address the comments and revise your paper before we can fully recommend the paper as being an important contribution with compelling evidence and a strong data set that supports the conclusions.

      We have described the methods more explicitly. The apparent incompatibility with previous results is now explained and discussed with new supplementary data.

      Major:

      (1) In this study, the application of Dynasore and Pitstop-2 strongly decreased 100 Hz steady-state release at calyx synapses while - quite unexpectedly - it strongly accelerated recovery from depression. A previous study found that genetic ablation of dynamin-1 actually enhanced 300 Hz steady-state release while only little affecting recovery from depression (Mahapatra et al., 2016). A similar scenario holds for the Latrunculin-B effects: In this study, Latrunculin-B strongly increased steady-state depression while in Babu et al. (2020), Latrunculin-B did not affect steady-state depression. In Mahapatra et al. (2016), Latrunculin-B marginally enhanced steady-state depression. The authors need to make a serious attempt to explain all these seemingly contradicting results.

      The lack of change in recovery from depression in dynamin-1 knockout mice reported by Mahapatra et al (2016) is consistent with our results in Figure S4 in 2 mM [Ca2+], whereas the accelerated recovery with Dynasore (Figure 2B2) is observed in 1.3 mM [Ca2+], suggesting that the effect is masked in 2 mM [Ca2+] but revealed in physiological [Ca2+] (p7, top paragraph). In both cases, however, recovery from STD is not prolonged, unlike in Hosoi et al (2009).

      The latrunculin issues are discussed in the Results section with the newly added Supplementary Figure S5 (p9-10).

      (2) The experimental conditions need to be better specified. It is not clear which recordings were obtained in 1.3 mM and which (if any?) in 2 mM external Ca. It is also unclear whether 'pooled data' are presented (obtained from control recordings and from separate recordings after pre-incubation with the respective drugs), or whether the data actually represent 'before'/'after' comparisons obtained from the same synapses after washing in the respective drugs. The exact protocol of drug application (duration of application/pre-incubation?, measurements after wash-out or in the continuous presence of the drugs?) needs to be clearly described in the methods and needs to be briefly mentioned in Results and/or Figure legends.

      We have made these points clearer in the Methods and Results sections.

      (3) Please cite and discuss briefly previous papers that have shown fast endocytosis in the calyx of Held with membrane capacitance measurements like Renden and von Gersdorff, J Neurophysiology, 98:3349, 2007 and Taschenberger et al., Neuron, 2002. These papers first showed exocytosis and endocytosis kinetics in more mature (hearing) mice calyx of Held and at higher physiological temperatures.

      One of these papers, which is relevant to the present study, is cited on p4.

      (4) The findings concerning the different consequences of actin depolymerization are not sufficiently discussed in comparison to the literature. My only criticism concerns the interpretation of the ML 141 and Lat B data. With respect to the Calyx data, I am missing a detailed discussion of the effects observed here in light of the different RRP subpools SRP and FRP. This is very important since Lee et al. (2012, PNAS 109 (13) E765-E774) showed earlier that disruption of actin inhibits the rapid transition of SRP SVs to the FRP at the AZ. The whole literature on this important concept is missing. Likewise, the role of actin for the replacement pool at a cerebellar synapse (Miki et al., 2016) is only mentioned in half a sentence. There is quite some evidence that actin is important both at the AZ (SRP to FRP transition, activation of replacement pool) and at the peri-active zone for compensatory endocytosis and release site clearance. Both possible underlying mechanisms (SRP to FRP transition or release site clearance) should be better dissected.

      We added discussion of the latrunculin issue to the Results section, citing previous literature (p9-10). Since there is no direct evidence (from vesicle imaging) for the presence of FRP and SRP, these definitions, derived from voltage-clamp step-depolarization studies, are difficult to incorporate into the dissection of synaptic depression under physiological conditions.

      Reviewer #1 (Recommendations For The Authors):

      I have no major comments, but the following issues may be addressed.

      (1) The term "fast and slow" synapses may be relative and a bit confusing. I do not think hippocampal synapses are slow synapses.

      We have replaced “fast and slow” with “fast-signaling and slow-plastic” to represent their functions and added an explanation in the text.

      (2) Off-target effects of pharmacological effects may be discussed. In this respect, bafilomycin experiments can be used to argue against the slow effects of vesicle cycling such as endocytosis, and vesicle mobilization. However, the effects on rapid vesicle mobilization cannot be excluded entirely. Because I cannot exclude the absence of off-target effects either (can be addressed by looking at single vesicle imaging at nano-scale, which is hard to do or looking at EM level quantitatively?), I feel this is a matter of discussion.

      It is possible that Dynasore might have unknown or off-target effects. However, the main conclusion is backed up by Pitstop-2.

      (3) Fig2 A2, B2 and Fig 4 A2 and B2. It is easier to plot the recovery only normalized to the initial value. Subtracting steady-state is somewhat confusing because the recovery looks faster after deeper depression, but this may be just apparent.

      We have given values for both types of plots in Table 2, which indicates no essential difference in the recovery parameters.

      Reviewer #2 (Recommendations For The Authors):

      Line 51: Rajappa et al. (2016) investigated clearance deficits in synaptophysin KO mice (not synaptobrevin).

      Corrected.

      Line 54: intersectin is introduced as AZ scaffold protein, although in most of the literature, it is referred to as an endocytic scaffold protein (also in the cited one, e.g. Sakaba et al. 2013). At least, this should be discussed.

      Since blockers of the activity of proteins downstream of intersectin have no effect on vesicle endocytosis (Figure 3 and Sakaba et al, 2013), we called it a (presynaptic) scaffold protein instead of an endocytic scaffold protein.

      Reviewer #3 (Recommendations For The Authors):

      Minor comments

      Page 1, Title: I don't think the presented data address the role of the presynaptic scaffold in SV replenishment. In addition, 'SV replenishment' and 'site clearance' should not be used synonymously as it seems to be implied here.

      In this study our focus was on the activity downstream of the scaffold protein intersectin. Since blocking the activity of its downstream effectors CDC42 and actin does not obstruct endocytosis (Fig 3, and Sakaba et al., 2013), we adopted “presynaptic scaffold protein” instead of “endocytic scaffold protein”.

      We have corrected it in the text.

      Page 2, Abstract: Clarify 'physiologically optimized condition' here and elsewhere in the manuscript.

      Abstract: in physiologically optimized condition → in physiological temperature and Ca2+.

      Page 3, line 62: I don't think 'the site-clearance hypothesis is widely accepted'. There are very few models that implement such a mechanism. Examples would be Pan & Zucker (2009) Neuron and Lin, Taschenberger & Neher 2022 (PNAS) which could be cited.

      62: the site-clearance hypothesis is “widely accepted”→ “well supported”

      Page 3, line 77: Please clarify 'fast synapses'.

      77: fast synapses→fast-signaling synapses, added clarification in the text.

      Page 4, line 100: Please clarify 'in the maximal rate'.

      100: in the maximal rate→reached during 1-Hz stimulation.

      Page 6, line 136: Please clarify 'to reduce the gap'.

      136: To reduce the gap between these different results→To explore the reason for these different results

      Page 7, line 157: I don't consider ML141 and Latrunculin-B 'scaffold protein inhibitors'.

      157: scaffold protein inhibitors had no effect on→ reworded as “none of these inhibitors affected fast or slow endocytosis”.  

      Page 7, line 162: P-value missing.

      162: p < 0.001 added.

      Page 8, line 184: "Since both endocytic blockers and scaffold inhibitors enhanced synaptic depression with a similar time course" consider rephrasing. Sounds like you refer to the time course by which these drugs exert their effect after being applied.

      184: Since both endocytic blockers and scaffold inhibitors enhance synaptic depression with a similar time course→Since the enhancement of synaptic depression by endocytic blockers or scaffold inhibitor occurred mostly at the early phase of synaptic depression.

      Same on page 11, line 250: "At the calyx of Held, scaffold protein inhibitors significantly enhanced synaptic depression with a time course closely matching to that enhanced by endocytic blocker" Please consider rephrasing.

      At the calyx of Held, scaffold protein inhibitors significantly enhanced synaptic depression with a time course closely matching to that enhanced by endocytic blocker →the early phase of synaptic depression like endocytic blockers

      Page 13, line 318: Please clearly state which experiments were performed at 1.3 mM and which at 2 mM external Ca if two different concentrations were used during recordings.

      320: Added text “Unless otherwise noted, EPSCs were recorded in 1.3 mM [Ca2+] aCSF at 37°C” in the methods.

      Page 15: line 346: Reference in the wrong format.

      346: (25) → (Yamashita et al, 2005)

      Page 15: line 351: Do you mean to say every 10 s and every 20 s? Please clarify.

      No, averaged at 10 ms and 20 ms, respectively as written.

      Page 16, line 369: 1 mM kyn was present in only very few experiments shown in the supplemental figures. Please clarify.

      368: In some experiments, 1 mM kyn was present to test whether it changes the enhancement of STD following endocytic block. However, as shown in Figure S3, our results are essentially the same with or without kynurenate, suggesting that glutamate released during AP-evoked EPSCs does not saturate or desensitize postsynaptic receptors at post-hearing calyces of Held (Ishikawa et al, 2002; Yamashita et al, 2003), unlike in pre-hearing calyces (Yamashita et al, 2009).

      Page 16, line 387: You cannot simply use multiple t-tests to compare a single control to multiple test conditions which seems to be the scenario here. Please correct or clarify.

      Experimental protocols are clarified in Methods as “Experiments were designed as population study using different cells from separate brain slices under control and drug treatment, rather than on a same cell before and after the drug exposure.”

      Table S1: 'Endo decay rate'. It's either the 'Endo rate' or the 'Decay rate of delta Cm'. Please correct.

      Corrected as Endocytosis rate (Endo rate).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Major change:

      All three of our reviewers raised the possibility that changes in movement during the time spent at the center ports could have contributed to changes in SWR rates. Analyses to address this possibility, based on the examination of trials with high and low speeds, were originally included in the supplement but we did not sufficiently highlight and explain these results. To rectify this, we have moved these results into a new main Figure 3 and now include a paragraph describing our interpretation of these results (page 9). We also include a more detailed description of the subjects’ behavior during port times – namely, that all subjects must remain quite stationary while at the reward ports in order to keep their nose in a specific position which keeps the port triggered. As a result, all subjects maintain head speeds well below our typical speed threshold for immobility while at the ports. This leads us to predict that any feedback based on periods of immobility alone (as requested by Reviewer 3) would show results very similar to our Control cohort and would not alter SWR rates seen during neurofeedback trials.

      Minor changes:

      (1) Reviewer 1 observed that our reported statistics appeared to be missing an interaction term showing that neurofeedback differentially affected the SWR rate/count pre- and post-reward. We apologize for a lack of clarity here: we fit pre- and post-reward times with separate linear mixed effects models, so this interaction term is neither expected nor defined in our model. We have added a sentence clarifying this aspect of our LME approach in the Methods section: “Each model is designed to compare samples from all trials of the control group to samples from neurofeedback and delay trials from the neurofeedback cohort for a specific time period (for instance, pre-reward-delivery at the center ports).” Combining both times in the same model would require adding an additional hierarchical level in order to preserve the pairing of the pre- and post-reward time period for each trial, which we are concerned would complicate the formulation and interpretation of the model. However, the reviewer raises a good point that the comparison between these two time periods reveals an additional difference between the trial types: SWR rate remains relatively consistent between the pre- and post-reward periods during neurofeedback trials, while delay and control trials show a clear increase in SWR rate between the two time periods. To visualize and quantify this effect, we calculated the difference in SWR rates between the two time periods and now include this plot as Supplementary Figure 2F, which is referenced on page 8 of the main text.

      (2) Reviewer 2 found our original title, “Neurofeedback training can modulate task-relevant memory replay in rats”, to be misleading and suggestive of a manipulation of memory content. We agree completely with the Reviewer that our manipulation does not alter replay content, so, to be more specific and accurate, we have changed our title to their suggestion: “Neurofeedback training can modulate task-relevant memory replay rate in rats”.

      (3) Reviewer 2 also requested that we include analyses quantifying baseline SWR rates for each of our experimental subjects. Although we initially considered reporting our results in measures of change relative to each individual animal’s baseline, we decided against this approach for several reasons.

      First, it is important to clarify that we extensively train the animals on the task prior to implant, so we do not have access to a truly naïve, pre-behavior baseline SWR rate for any of our subjects. However, because the pre-implant training is conducted consistently between our neurofeedback and our control cohort, we have no reason to believe that the behavioral training prior to implant would introduce differences in SWR rate between the cohorts. Indeed, we find no difference in post-reward SWR rate (or SWR rate at the home well) when we quantify the first 250 trials of post-implant behavior for each subject (see panel A below). Note that we cannot compare the pre-reward SWR rate at this point, because it is influenced by the task structure which guarantees at least one SWR in each neurofeedback trial pre-reward.

      Further, we do find that SWR rate is quite consistent over many days of task performance in the control cohort (shown for the post-reward period in panel B below). This suggests that comparing the post-neurofeedback training SWR rates for the neurofeedback cohort to SWR rates throughout the training for the control cohort is not likely to be confounded by differing amounts of training experience. This is supported by our analyses in Figure 2 which show no differences in SWR rate between the two cohorts when considering pre- and post-reward times combined.

      Author response image 1.

      (A) SWR rate calculated during the post-reward period at the center port for the first 250 trials of post-implant behavior for each animal. Trials of all types are included (i.e., both neurofeedback trials and delay trials for the manipulation cohort). Groupwise comparison p=0.192. (B) Mean SWR rate during the post-reward period at the center port for each behavioral training epoch shows no systematic change over time across subjects within the control cohort.

      Finally, within each cohort, we found the overall SWR rates to be quite consistent across animals. If each subject in the neurofeedback cohort had shown dramatically different SWR rates at the beginning of neurofeedback training, we would have needed to express the effect of neurofeedback training relative to baseline for each animal. However, since the ranges of SWR rates were highly comparable, we felt that expressing our results as simple SWR rates, rather than as measures of relative change, was more accessible and made it easier to place our results within the context of the literature. Within the neurofeedback cohort, comparing neurofeedback to delay trials is inherently matched for baseline SWR rate since these comparisons are made within the same animal.

      (4) Finally, Reviewer 2 raises the possibility that older animals or those with cognitive deficits might respond to neurofeedback differently. We entirely agree with this possibility, and note this in our Discussion section: “Since the neurofeedback paradigm depends on the occurrence of at least a low endogenous rate of SWR occurrence, it would be important to implement neurofeedback training as a relatively early interventional strategy prior to extensive neurodegeneration, and training may take longer in aged or impaired subjects.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) The authors' primary research question revolves around the inquiry of "how far in advance semantic information might become available from parafoveal preview." In contrast to prior studies, the current research seeks to achieve a breakthrough in terms of timing by employing innovative technology. They mention in the manuscript that "most of these studies have been limited to measuring parafoveal preview from fixations to an immediately adjacent word... We tackle these core issues using a new technique that combines the use of frequency tagging and the measurement of magnetoencephalography (MEG)-based signals." However, the argumentation for how this new technology constitutes a breakthrough is not sufficiently substantiated. Specifically, there are two aspects that require further clarification. Firstly, the authors should clarify the importance of investigating the timing of semantic integration in their research question. They need to justify why previous studies focusing on the preview effect during fixations to an immediately adjacent word cannot address their specific inquiry about "how far in advance semantic information might become available from parafoveal preview," which requires examining parafoveal processing (POF). Secondly, in terms of the research methodology, the authors should provide a more comprehensive explanation of the advantages offered by MEG technology in the observation of the timing of semantic integration compared to the techniques employed in prior research. Indeed, the authors have overlooked some rather significant studies in this area. For instance, the research conducted by Antúnez, Milligan, Hernández-Cabrera, Barber, & Schotter in 2022 addresses the same research question mentioned in the current study and employs a similar experimental design. Importantly, they utilize a natural reading paradigm with synchronized ERP and eye-tracking recordings. Collectively, these studies, along with the series of prior research studies employing ERP techniques and RSVP paradigms discussed by the authors in their manuscript, provide ample evidence that semantic information becomes available and integrated from words before fixation occurs. Therefore, the authors should provide a more comprehensive citation of relevant research and delve deeper into explaining the potential contributions of their chosen technology to this field.

      We express our gratitude to the reviewer for providing insightful comments. Firstly, we clarify the advantages of the RIFT technique. The revised paragraph is on Page 4 with tracked changes and is copied as follows:

      “…… The RIFT technique provides a notable advantage by generating a signal — the tagging response signal — specifically yoked to just the tagged word. This ensures a clear separation in processing the tagged word from the ongoing processing of other words, addressing a challenge faced by eye tracking and ERP/FRP approaches. Moreover, RIFT enables us to monitor the entire dynamics of attentional engagement with the tagged word, which may begin a few words before the tagged word is fixated.”

      We have also rephrased our research questions in the introduction section on Page 5 with tracked changes:

      “This paradigm allows us to address three questions. First, we aimed to measure when in the course of reading people begin to direct attention to parafoveal words. Second, we sought to ascertain when semantic information obtained through parafoveal preview is integrated into the sentence context. Modulations of pre-target RIFT responses by the contextual congruity of target words would serve as evidence that parafoveal semantic information has not only been extracted and integrated into the sentence context but that it is affecting how readers allocate attention across the text. Third, we explored whether these parafoveal semantic attention effects have any relationship to reading speed.”

      Secondly, we would like to elucidate the significance of investigating the timing of semantic integration and why this complements existing findings of parafoveal processing (POF) during reading. Our manuscript has been revised accordingly, with specific modifications highlighted on Page 2. The revised passage reads as follows:

      “…… eye tracking-based evidence for the extraction of parafoveal semantic information …… was eventually extended into English …… For example, Schotter and Jia (2016) showed preview benefits on early gaze measures for plausible compared to implausible words, even for plausible words that were unrelated to the target. These results demonstrate that semantic information can indeed be extracted from parafoveal words. However, due to the limitations of the boundary paradigm, which only assesses effects after target words have been fixated, it is challenging to precisely determine when and how parafoveal semantic processing takes place. Furthermore, it is generally hard to distinguish between the effects of cross-saccade integration (e.g., mismatch between the preview and the word fixated) and the effects of how differing words fit into the context itself (Veldre and Andrews, 2016a, 2016b).”

      Thirdly, we now better highlight the contributions of the Antúnez et al. paper, as they have provided important evidence for parafoveal semantic processing during natural reading. The relevant modifications are highlighted on Page 3. The revised passage is as follows: “Although many of these effects have been measured in the context of unnatural reading paradigms (e.g., the “RSVP flanker paradigm”), similar effects obtain during natural reading. Using the stimuli and procedures from Schotter and Jia (2016), Antúnez et al. (2022) showed that N400 responses, measured relative to the fixation before the target words (i.e., before the boundary change while the manipulated words were in parafoveal preview), were sensitive to the contextual plausibility of these previewed words. These studies suggest that semantic information is available from words before they are fixated, even if that information does not always have an impact on eye fixation patterns.”

      References:

      Schotter ER, Jia A. 2016. Semantic and plausibility preview benefit effects in English: Evidence from eye movements. J Exp Psychol Learn Mem Cogn 42:1839–1866. doi:10.1037/xlm0000281

      Veldre A, Andrews S. 2016a. Is Semantic Preview Benefit Due to Relatedness or Plausibility? J Exp Psychol Hum Percept Perform 42:939–952. doi:10.1037/xhp0000200

      Veldre A, Andrews S. 2016b. Semantic preview benefit in English: Individual differences in the extraction and use of parafoveal semantic information. J Exp Psychol Learn Mem Cogn 42:837–854. doi:10.1037/xlm0000212

      Antúnez M, Milligan S, Andrés Hernández-Cabrera J, Barber HA, Schotter ER. 2022. Semantic parafoveal processing in natural reading: Insight from fixation-related potentials & eye movements. Psychophysiology 59:e13986. doi:10.1111/PSYP.13986

      (2) Further, the authors emphasize semantic integration in their observed results but overlook the intricate relationship between access, priming, and integration. This assertion appears overly confident. Despite using low-constraint sentences and low-predicted targets (lines 439-441), differences between congruent and incongruent conditions may be influenced by word-level factors. For instance, in the first coherent sentence, such as "Last night, my lazy brother came to the party one minute before it was over" (line 1049), replacing the keyword "brother" with an incongruent word could create an incoherent sentence, possibly due to semantic violation, relation mismatch with "lazy," or prediction error related to animate objects. A similar consideration applies to the second example sentence, "Lily says this blue jacket will be a big fashion trend this fall" (line 1050), where the effect might result from a discrepancy between "blue" and an incongruent word. However, the authors do not provide incongruent sentences to substantiate their claims. I recommend that the authors discuss alternative explanations and potentially control for confounding factors before asserting that their results unequivocally reflect semantic integration. My intention is not to dispute the semantic integration interpretation but to stress the necessity for stronger evidence to support this assertion.

      We agree with the reviewer that stimulus control is very critical for this kind of work and apologize for the lack of clarity in the original manuscript.

      (1) We fully agree that word-level factors can be an important confound, which is why we carefully controlled word-level factors in the experimental design. As detailed in the Appendix of the original manuscript, each pair of target words has been strategically embedded into two sentences, allowing for the creation of both congruent and incongruent sentence pairs through the interchange of these words. We now have explicitly specified this design in all sentences, as reflected in the edited manuscript on Page 38. For example, considering the exemplar pair of “brother/jacket”,

      “Last night, my lazy brother/jacket came to the party one minute before it was over.

      Lily says this blue jacket/brother will be a big fashion trend this fall.”

      In this design, the pair of target words is presented in both congruent and incongruent sentences. Participant A reads “lazy brother” and “blue jacket”, while Participant B reads “lazy jacket” and “blue brother”. This approach ensures that the same target words appear in both congruent and incongruent conditions across participants, serving as an effective control for word-level factors.

      (2) We acknowledge that the consideration of word-level information is crucial when making claims about contextual integration in the current study. However, we don’t think there are many cases in the stimulus set where a single feature like animacy is enough to create the mismatch. Instead, the stimuli were written so that it is not possible to strongly predict any word or even a specific semantic feature, so that appreciating the mismatch requires the comprehender to integrate the word into the context (and especially to integrate the word with the immediately preceding one). However, this more local modifier/noun plausibility may behave differently from a more global contextual plausibility, which is a limitation of the stimulus set and has been discussed in the revised manuscript, as indicated by the tracked changes on Page 16, as copied below:

      “Two noteworthy limitations exist in the current study. Firstly, the construction of pretarget–target word pairs consistently follows an adjective–noun phrase structure, potentially leading to semantic violations arising from immediate local incongruence rather than a broader incongruence derived from the entire sentential context. While the context preceding target words was deliberately minimized to ensure a pure effect of bottom-up parafoveal processing rather than the confounding impact of top-down prediction, it is essential to recognize that information from both local and global contexts can exert distinct effects on word processing during natural reading (Wong et al., 2022). Future investigations should incorporate more information-rich contexts to explore the extent to which the parafoveal semantic integration effect observed in this study can be generalized.”
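
      The counterbalancing described in point (1) of this response, in which the same pair of target words is swapped between two sentence frames across participants, can be sketched as follows. This is an illustrative sketch only: the `build_list` helper and the two-list layout are assumptions, not the authors' stimulus-generation code, and only the "brother/jacket" pair quoted above is used.

```python
# Each entry pairs a sentence frame with the target word that is congruent in it.
FRAMES = (
    ("Last night, my lazy {} came to the party one minute before it was over.", "brother"),
    ("Lily says this blue {} will be a big fashion trend this fall.", "jacket"),
)


def build_list(frames, swap_targets):
    """Return (sentence, condition) pairs for one participant list.

    With swap_targets=True the two target words exchange frames, so the
    same words appear in the incongruent condition.
    """
    (frame_1, word_1), (frame_2, word_2) = frames
    condition = "congruent"
    if swap_targets:
        word_1, word_2 = word_2, word_1
        condition = "incongruent"
    return [(frame_1.format(word_1), condition), (frame_2.format(word_2), condition)]


list_a = build_list(FRAMES, swap_targets=False)  # "lazy brother", "blue jacket"
list_b = build_list(FRAMES, swap_targets=True)   # "lazy jacket", "blue brother"
for sentence, condition in list_a + list_b:
    print(f"{condition:>11}: {sentence}")
```

      Because both lists contain exactly the same two target words, word-level properties such as length or frequency are matched between the congruent and incongruent conditions across participants.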

      References:

      Wong R, Veldre A, Andrews S. 2022. Are There Independent Effects of Constraint and Predictability on Eye Movements During Reading? J Exp Psychol Learn Mem Cogn. doi:10.1037/XLM0001206

      Reviewer #2 (Public Review):

      This MEG study used co-registered eye-tracking and Rapid Invisible Frequency Tagging (RIFT) to track the effects of semantic parafoveal preview during natural sentence reading. Unpredictable target words could either be congruent or incongruent with sentence context. This modulated the RIFT response already while participants were fixating on the preceding word. This indicates that the semantic congruency of the upcoming word modulates visual attention demands already in parafoveal preview.

      The quest for semantic parafoveal preview in natural reading has attracted a lot of attention in recent years, especially with the development of co-registered EEG and MEG. Evidence from dynamic neuroimaging methods using innovative paradigms as in this study is important for this debate.

      We express our gratitude to the reviewer for recognizing the significance of our research question in the domain of natural reading.

      Major points:

      (1) The authors frame their study in terms of "congruency with sentence context". However, it is the congruency between adjective-noun pairs that determines congruency (e.g. "blue brother" vs "blue jacket", and examples p. 16 and appendix). This is confirmed by Suppl Figure 1, which shows a significantly larger likelihood of refixations to the pre-target word for incongruent sentences, probably because the pre-target word is most diagnostic for the congruency of the target word. The authors discuss some possibilities as to why there is variability in parafoveal preview effects in the literature. It is more likely to see effects for this simple and local congruency, rather than congruency that requires an integration and comprehension of the full sentence. I'm not sure whether the authors really needed to present their stimuli in a full-sentence context to obtain these effects. This should be explicitly discussed and also mentioned in the introduction (or even the abstract).

      We have addressed this limitation of the study explicitly in the revised manuscript. The modifications can be found in the tracked changes on Page 16 and are copied below:

      “Two noteworthy limitations exist in the current study. Firstly, the construction of pretarget–target word pairs consistently follows an adjective–noun phrase structure, potentially leading to semantic violations arising from immediate local incongruence rather than a broader incongruence derived from the entire sentential context. While the context preceding target words was deliberately minimized to ensure a pure effect of bottom-up parafoveal processing rather than the confounding impact of top-down prediction, it is essential to recognize that information from both local and global contexts can exert distinct effects on word processing during natural reading (Wong et al., 2022). Future investigations should incorporate more information-rich contexts to explore the extent to which the parafoveal semantic integration effect observed in this study can be generalized.”

      References:

      Wong R, Veldre A, Andrews S. 2022. Are There Independent Effects of Constraint and Predictability on Eye Movements During Reading? J Exp Psychol Learn Mem Cogn. doi:10.1037/XLM0001206

      (2) The authors used MEG and provided a source estimate for the tagging response (Figure 2), which unsurprisingly is in the visual cortex. The most important results are presented at the sensor level. This does not add information about the brain sources of the congruency effect, as the RIFT response probably reflects top-down effects on visual attention etc. Was it necessary to use MEG? Would EEG have produced the same results? In terms of sensitivity, EEG is better than MEG as it is more sensitive to radial and deeper sources. This should be mentioned in the discussion and/or methods section.

      Source estimation was exclusively provided for the tagging response rather than the congruency effect because we posit that this conditional contrast would emanate from the same brain regions exhibiting the tagging responses in general. As depicted in the following figure, source localization for the congruency effect was identified in the left association cortex (Brodmann area 18), the same area as the source localization for the tagging response (the negative cluster observed here is due to the incongruent minus congruent contrast). While we agree with the Reviewer that the RIFT result might indicate a top-down effect on visual attention, it is important to note that, due to the low-pass filter property of synapses, observing a tagging response at a high frequency beyond the visual cortex is challenging.

      Author response image 1.

      We discuss the necessity of using MEG in the revised manuscript (tracked changes on Page 20); the passage is copied below:

      “While the current study was conducted using MEG, these procedures might also work with EEG. If so, this would make our approach accessible to more laboratories as EEG is less expensive. However, there are currently no studies directly comparing the RIFT response in EEG versus MEG. Therefore, it would be of great interest to investigate if the current findings can be replicated using EEG.”

      (3) The earliest semantic preview effects occurred around 100 ms after fixating the pre-target word (discussed around l. 323). This means that at this stage the brain must have processed the pre-target and the target word and integrated their meanings (at some level). Even in the single-word literature, semantic effects at 100 ms are provocatively early. Even studies that tried to determine the earliest semantic effects arrived at around 200 ms (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3382728/, https://psycnet.apa.org/record/2013-17451-002). The present results need to be discussed in a bit more detail in the context of the visual word recognition literature.

      We have incorporated this valuable suggestion into the discussion section to enhance the clarity of our key result regarding the timing of parafoveal semantic integration. The revised manuscript with tracked changes can be found on Page 14, and the relevant passage is provided below:

      “Our results also provide information about the time course of semantic integration …… by as early as within 100 ms after fixating on the pre-target word. The timing of this parafoveal semantic effect appears remarkably early, considering that typical semantic access for a single word occurs no earlier than around 200 ms, as demonstrated in the visual word recognition literature (Carreiras et al., 2014). For instance, in a Go/NoGo paradigm, the earliest distinguishable brain activity related to category-related semantic information of a word occurs at 160 ms (Amsel et al., 2013; Hauk et al., 2012). Therefore, the RIFT results presented here suggest that natural reading involves parallel processing that spans multiple words. The level of (covert) attention allocated to the target word, as indexed by the significant difference in RIFT responses compared to the baseline interval, was observed even three words in advance (see Figure 2C). This initial increase in RIFT coincided with the target entering the perceptual span (McConkie and Rayner, 1975; Rayner, 1975; Underwood and McConkie, 1985), likely aligning with the initial extraction of lower-level perceptual information about the target. The emerging sensitivity of the RIFT signal to target plausibility, detected around 100 ms after the fixation on the pre-target word, suggests that readers at that time had accumulated sufficient semantic information about the target words and integrated that information with the evolving sentence context. Therefore, it is plausible that the initial semantic processing of the target word commenced even before the pre-target fixation and was distributed across multiple words. This parallel processing of multiple words facilitates rapid and fluent reading.”

      References:

      Carreiras M, Armstrong BC, Perea M, Frost R. 2014. The what, when, where, and how of visual word recognition. Trends Cogn Sci 18:90–98. doi:10.1016/j.tics.2013.11.005

      Amsel BD, Urbach TP, Kutas M. 2013. Alive and grasping: Stable and rapid semantic access to an object category but not object graspability. Neuroimage 77:1–13. doi:10.1016/J.NEUROIMAGE.2013.03.058

      Hauk O, Coutout C, Holden A, Chen Y. 2012. The time-course of single-word reading: Evidence from fast behavioral and brain responses. Neuroimage 60:1462. doi:10.1016/J.NEUROIMAGE.2012.01.061

      McConkie GW, Rayner K. 1975. The span of the effective stimulus during a fixation in reading. Percept Psychophys 17:578–586. doi:10.3758/BF03203972

      Rayner K. 1975. The perceptual span and peripheral cues in reading. Cogn Psychol 7:65–81.

      Underwood NR, McConkie GW. 1985. Perceptual Span for Letter Distinctions during Reading. Read Res Q 20:153. doi:10.2307/747752

      (4) As in previous EEG/MEG studies, the authors found a neural but no behavioural preview effect. As before, this raises the question of whether the observed effect is really "critical" for sentence comprehension. The authors provide a correlation analysis with reading speed, but this does not allow causal conclusions: Some people may simply read slowly and therefore pay more attention and get a larger preview response. Some readers may hurry and therefore not pay attention and not get a preview response. In order to address this, one would have to control for reading speed and show an effect of RIFT response on comprehension performance (or vice versa, with a task that is not close to ceiling performance). The last sentence of the discussion is currently not justified by the results.

      We acknowledge that the correlation analysis between the RIFT effect and reading speed on the group level lacks causality, making it less ideal for addressing this question. We have incorporated this acknowledgment as one of the limitations of the current study in the revised manuscript on Page 16, as indicated by the tracked changes, and the relevant passage is provided below:

      “Two noteworthy limitations exist in the current study. …… Secondly, the correlation analysis between the pre-target RIFT effect and individual reading speed (Figure 5) does not establish a causal relationship between parafoveal semantic integration and reading performance. Given that the comprehension questions in the current study were designed primarily to maintain readers’ attention and the behavioural performance reached a ceiling level, employing more intricate comprehension questions in future studies would be ideal to accurately measure reading comprehension and reveal the impact of semantic parafoveal processing on it.”

      We reformulated the last sentence:

      “These results support the idea that words are processed in parallel and suggest that early and deep parafoveal processing may be important for fluent reading.”

      (5) L. 577f.: ICA components were selected by visual inspection. I would strongly recommend including EOG in future recordings when the control of eye movements is critical.

      We appreciate the reviewer for providing this valuable suggestion. We acknowledge that EOG recordings were not included in the current study due to restrictions on MEG data collection from the University of Birmingham during the COVID-19 pandemic. In our future studies, we will follow the reviewer's suggestion to incorporate EOG recordings in data collection. This addition will facilitate optimal eye movement-related artifact rejection through ICA, as recommended by Dimigen in his methodological paper:

      Dimigen, O. (2020). Optimizing the ICA-based removal of ocular EEG artifacts from free viewing experiments. NeuroImage, 207, 116117.

      (6) The authors mention "saccade planning" a few times. I would suggest looking at the SWIFT model of eye movement control, which is less mechanistic than the dominant EZ-Reader model (https://psycnet.apa.org/record/2005-13637-003). It may be useful for the framing of the study and interpretation of the results (e.g. second paragraph of discussion).

      In the revised manuscript, we have provided a more comprehensive explanation of eye movements and saccade planning, aligning it with the SWIFT model. Please refer to Page 15 with tracked changes; the updated passage is provided below:

      “The results of the present study are aligned with the SWIFT model of eye movement control in natural reading (Engbert et al., 2005), wherein the activation field linked to a given word is hypothesized to be both temporally and spatially distributed. Indeed, we found that the initial increase in covert attention to the target word occurred as early as three words before, as measured by RIFT responses (Figure 2C). These covert processes enable the detection of semantic incongruity (Figure 3B and Figure 3C). However, it may occur at the non-labile stage of saccade programming, preventing its manifestation in fixation measures of the currently fixated pre-target word (Figure 1B). Therefore, the RIFT technique’s capacity to yoke patterns to a specific word offers a unique opportunity to track the activation field of word processing during natural reading.”

      References:

      Engbert R, Nuthmann A, Richter EM, Kliegl R. 2005. Swift: A dynamical model of saccade generation during reading. Psychol Rev 112:777–813. doi:10.1037/0033-295X.112.4.777

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      While the manuscript is well-written and presents a structured analysis of the data, it requires further clarification and substantiation regarding the originality of the research questions, the advantages of the proposed methodology, and the interpretation of the results related to semantic integration. Additional references and a more thorough discussion of related research are needed to strengthen the manuscript's contribution to the field.

      We appreciate the reviewer's kind words about this manuscript and the insightful comments and suggestions provided. In the revised manuscript, we have now placed additional emphasis on the importance of investigating semantic integration within the realm of parafoveal processing in natural reading. We have clarified the advantages of employing MEG and RIFT and expanded upon our results in the context of Antúnez et al.'s 2022 paper, as suggested by the reviewer.

      Reviewer #2 (Recommendations For The Authors):

      (1) L. 59: The "N400" has been linked to much more than "semantic access". I think it is widely accepted that "access" happens (or at least begins) earlier, and that the N400 reflects high-level integration processes etc.

      Earlier debates about whether the N400 is more linked to access or integration have resolved in favour of an access account, but with a growing appreciation of the blurred boundaries between constructs like access, priming, and integration, as Reviewer 1 also pointed out in comment #2.

      (2) L. 177: I wasn't sure about the selection of sensors. Were the same sensors used for all participants (whether they had a tagging response or not)?

      We appreciate the reviewer for highlighting the confusion regarding the sensor selection procedure in the study. In response, we have added further clarifications about this procedure in the Method section of the revised manuscript. The relevant changes can be found on Page 25 with tracked changes, and the modified passage is reproduced below:

      "Please note that the tagging response sensors may vary in number across participants (7.9 ± 4.5 sensors per participant, M ± SD). Additionally, they may have a different but overlapping spatial layout, primarily over the visual cortex. For the topography of all tagging response sensors, please refer to Figure 2A."

      (3) Ll. 247ff.: I don't understand the idea of a "spill-over effect". The future cannot spill into the past. Or does this refer to possible artefacts or technical problems?

      In the revised manuscript, we have rephrased this passage with tracked changes on Page 11, and the updated version is provided below:

      “We conducted a similar analysis of the coherence measured when participants fixated the target word and found no significant modulations related to the contextual congruity of that target word. …… Thus, the parafoveal semantic integration effect identified during the pre-target intervals cannot be attributed to signal contamination from fixations on the target word induced by the temporal smoothing of filters.”

      (4) I struggled to follow the "internal attention" explanation for the paradoxical RIFT effect (p. 11/12).

      We appreciate the reviewer for pointing out the confusion, and we have rephrased the passage in the revised manuscript with tracked changes on Page 13. The revised version is provided below:

      "Previous work has demonstrated that tagging responses decrease as attention shifts from an external task (e.g., counting visual targets) to an internal task (e.g., counting heartbeats) (Kritzman et al., 2022). Similarly, in a reading scenario, visually perceiving the flickering word constitutes an external task, while the internal task involves the semantic integration of previewed information into the context. If more attentional resources are internally directed when faced with the challenge of integrating a contextually incongruent word, fewer attentional resources would remain for processing the flickering word. This may be the kind of shift reflected in the reduction in RIFT responses."

      References:

      Kritzman L, Eidelman-Rothman M, Keil A, Freche D, Sheppes G, Levit-Binnun N. 2022. Steady-state visual evoked potentials differentiate between internally and externally directed attention. Neuroimage 254:119133.

      (5) L. 572: Why was detrending necessary on top of a 0.5 Hz high-pass filter? Was detrending applied to the continuous raw data, or to epochs? Was it just the linear trend or other polynomial terms?

      We agree with the Reviewer that, given the prior application of a 0.5 Hz high-pass filter to the data, detrending does not alter the data. Nonetheless, we included this procedure in the manuscript for the sake of completeness. In the revised manuscript, we have provided additional clarification on this point, as indicated by the tracked changes on Page 23. The modified passage is presented below:

      "Subsequently, detrending was applied individually to each channel of the continuous raw data to factor out the linear trend."
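      As an illustration of the per-channel linear detrending described in the quoted passage, the following is a minimal sketch that fits and subtracts a least-squares straight line from each channel. It is not the authors' pipeline: the function names and the toy two-channel array are hypothetical, and real MEG data would normally be handled by a dedicated toolbox.

```python
def detrend_linear(samples):
    """Subtract the least-squares straight line from one channel's samples."""
    n = len(samples)
    if n < 2:
        # With fewer than two samples only a constant can be fitted,
        # so removing it leaves zeros.
        return [0.0] * n
    t_mean = (n - 1) / 2.0          # mean of the sample indices 0..n-1
    y_mean = sum(samples) / n
    # slope = cov(t, y) / var(t)
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(samples))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return [y - (slope * t + intercept) for t, y in enumerate(samples)]


def detrend_channels(data):
    """Apply the linear detrend independently to every channel (row)."""
    return [detrend_linear(channel) for channel in data]


# Hypothetical toy data: channel 0 is a pure ramp, channel 1 is roughly flat.
raw = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 5.5, 4.5, 5.0],
]
clean = detrend_channels(raw)
```

      In this sketch a channel that is a pure ramp is reduced to zeros, and every detrended channel has zero mean, which is what one expects after removing a fitted line with an intercept.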

      (6) Source analysis, p. 25f.: How was the beamformer regularized?

      This information was already included in the original manuscript on Page 26. The original text is provided below for reference:

      “No regularisation was performed to the CSD matrices (lambda = 0).”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Zhu, et al present a genome-wide histone modification analysis comparing patients with schizophrenia (on or off antipsychotics) to non-psychiatric controls. The authors performed analyses across the dorsolateral prefrontal cortex and tested for enrichment of nearby genes and pathways. The authors performed an analysis measuring the effect of age on the epigenomic landscape as well. While this paper provides a unique resource around SCZ and its epigenetic correlates, and some potentially intriguing findings in the antipsychotic response dataset there were some potential missed opportunities - related to the integration of outside datasets and genotypes that could have strengthened the results and novelty of the paper.

      Major Comments

      (1) Is there genotype data available for this cohort of donors or can it be generated? This would open several novel avenues of investigation for the authors. First the authors can test for enrichment of heritability for SCZ or even highly comorbid disorders such as bipolar. Second, it would allow the authors to directly measure the genetic regulation of histone markers by calculating QTLs (in this case histone hQTLs). The authors assert that although interesting, ATACseq approach does not provide the same chromatin state information as histone mods mapped by ChIP. Why do the authors not test this? There are several ATACseq datasets available for SCZ [https://pubmed.ncbi.nlm.nih.gov/30087329/] and an additional genomic overlap could help tease apart genetic regulation of the changes observed.

      As detailed in our Methods section, the brain samples come with a previous medical diagnosis, treatment record, and toxicological screening. Unfortunately, no genotype information is available for our brain sample collection. However, we examined overlap of differential enhancer and promoter peaks with genetic variants using linkage disequilibrium score regression (Fig. S10). Additionally, to assess agreement with the literature, we compared DEGs identified in our study with a previous snRNA-seq study in postmortem prefrontal cortex of schizophrenics and controls (Table S7).

      Repressive histone marks tend to provide different information than ATAC-seq data. However, we examined only activating marks in this study. Thus, the sentence in the Introduction mentioning that “ATAC-seq approach does not provide the same chromatin state information as histone modifications mapped by chromatin immunoprecipitation sequencing (ChIP-seq) assays do” has been removed.

      (2) Can the authors theorize why their analysis found significant effects for H3K27Ac for antipsychotic use when a recent epigenomic study of SCZ using a larger cohort of samples and including the same histone modifications did not [https://pubmed.ncbi.nlm.nih.gov/30038276/]? Given the lower n and lower number of cells in this group, it would be helpful if the authors could speculate on why they see this. Do the authors know if there is any overlap with the Girdhar study donors or if there are other phenotypic differences that could account for this?

      As mentioned in the Methods section, three strengths of this brain bank include i) inclusion of samples of schizophrenia subjects with antemortem diagnosis (i.e., based on clinical histories) and not with postmortem diagnosis (i.e., based on interviews with relatives and friends – a diagnostic approach used by many brain banks worldwide but with important limitations, see here: PMID: 15607306), ii) inclusion of control subjects individually matched by sex, age and PMD, and iii) our possibility to test the presence or absence of antipsychotic medications in blood samples as an independent experimental variable. This allowed us to obtain novel and statistically valid conclusions related to cell-type epigenetic alterations in the frontal cortex of schizophrenia subjects, and the impact of age and antipsychotic treatment on chromatin organization.

      There is no overlap with Girdhar study donors.

      (3) The reviewer is concerned about the low concordance between bulk nuclei RNA-seq and single-cell RNA-seq for SCZ (236 of 802 DEGs in NeuN+ and 63 of 1043 NeuN-). While it is not surprising for different cohorts to have different sets of DEGs these seem to be vastly different. Was there a particular cell type(s) that enriched for the authors' DEGs in the single-cell dataset? Do the authors know if any donors overlapped between these cohorts?

      This degree of overlap is acceptable given that the two datasets originate from entirely distinct cohorts of postmortem human brain samples.

      (4) Functional enrichment analyses: details are not provided by the authors and should be added. The authors need to consider a) providing a gene universe, ie only considering the sets of genes with nearby H3K4me3/ H3K27ac levels, to such pathway tools, and b) should take into account the fact that some genes have many more peaks with data. There are known biases in seemingly just using the best p-value per gene in other epigenetic analysis (ie. DNA methylation data) and software is available to run correct analyses: https://pubmed.ncbi.nlm.nih.gov/23732277.

      GREAT was used to map differential peak loci to target genes using the whole genome as the background set and default basal extension as per Nord et al. http://dx.doi.org/10.1016/j.cell.2013.11.033. We argue that it is more biologically relevant than comparing against an artificially selected background. These gene sets were then passed to Panther for Gene Ontology enrichment analysis as per Liu et al. 10.1186/s12940-015-0052-5.

      Additional details are provided in Materials and Methods section:

      ChIP-seq annotation and functional enrichment

      GREAT analysis (http://great.stanford.edu) was performed on differential peaks using the whole genome as background and default basal extension from 5 kb upstream to 1 kb downstream of the TSS.

      Significantly enriched Gene Ontology biological processes were identified using the Panther Classification tools using a hypergeometric test.
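      To make the hypergeometric enrichment test mentioned above concrete, here is a minimal sketch of the one-sided (over-representation) p-value. It is not Panther's implementation: the function name, argument names, and the example numbers are hypothetical, and it assumes the whole-genome background described in the response.

```python
from math import comb


def hypergeom_enrichment_p(universe, annotated, selected, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe  : number of genes in the background set
    annotated : background genes carrying the GO term
    selected  : genes in the test list (e.g. genes mapped from peaks)
    overlap   : test-list genes that carry the GO term
    """
    total = comb(universe, selected)
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 background genes, 3 annotated with the term,
# a test list of 2 genes, both of which carry the term.
p_value = hypergeom_enrichment_p(universe=10, annotated=3, selected=2, overlap=2)
```

      Changing `universe` from the whole genome to only genes with nearby peaks alters the resulting p-values, which is the point at issue between the reviewer's suggestion and the authors' choice of background.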

      Reviewer #2 (Public Review):

      The manuscript by Zhu has generated ChIP-seq and RNA-seq data from sizeable cohorts of SCZ patient samples and controls. The samples include 15 AF-SCZ samples and 15 controls, as well as 14 AT-SCZ samples and 14 controls. The genomics data was generated using techniques optimized for low-input samples: MOWChIP-seq and SMART-seq2 for histone profiles and transcriptome, respectively. The study has generated a significant data resource for the investigation of epigenomic alterations in SCZ. I am not convinced that the hierarchical pairwise design - first comparing AF-SCZ and AT-SCZ with their corresponding controls and secondarily contrasting the two comparisons is fully justified. The authors should repeat the statistical analysis by modeling all three groups simultaneously with an interaction effect for treatment or directly compare AF-SCZ to AT-SCZ groups and evaluate if the main conclusions remain supported.

      Major comments

      (1) The manuscript did not discuss (mention) the quality control of RNA-seq data shown in Fig. 1B. The color scheme choice for the heatmap visualization did not provide a quantitative presentation of the specificity of the RNA-seq data. I would recommend using bar plots to present the results more quantitatively.

      QC of raw RNA-seq data including per sequence GC and adapter content was assessed with FastQC. Reads underwent soft-clipping during STAR alignment with on average 73.8% (+/- 0.08%) reads for neurons and 69.0% (+/- 0.99%) reads for glia being uniquely mapped. A new supplementary figure (Figure S5) has been included to show four bar plots representing the expression values more quantitatively.

      These details are now provided in the RNA-seq data processing part of the Materials and Methods section:

      RNA-seq data processing

      The human genome (GRCh38) and comprehensive gene annotation were obtained from GENCODE (v29). Quality control of RNA-seq reads including per sequence GC and adapter content was assessed with FastQC. Reads were mapped with STAR (2.7.0f) with soft-clipping (average of 73.8% (+/- 0.08%) reads uniquely mapped for neurons and 69.0% (+/- 0.99%) reads for glia) and quantified with featureCounts (v2.0.1) using the default parameters.

      (2) How does the specificity of this RNA-seq dataset compare to previous studies using a similar NeuN sorting strategy?

      As mentioned in the Results section, highly significant (median p-value = 6 × 10⁻⁷) pairwise differences in molecular marker expression were observed for all markers ranging from mature, functional and synaptic neuron markers to astrocyte, oligodendrocyte and microglial markers (Figure 1B; Figures S4 and S5; Table S5). This confirms neuronal and non-neuronal cell-type identities in the NeuN+ and NeuN- nuclei samples, respectively.

      (3) I appreciate the effort to assess the ChIP-seq data quality using phantompeakqualtools. However, prior knowledge/experience with this tool is required to fully understand the QC results. The authors should additionally provide browser shots at different scales for key neuronal/glial genes, so readers can have a more direct assessment of data quality, such as the enrichment of H3K4me3 at promoters (but not elsewhere), and H3K27ac at promoters and enhancers. Existing browser views, such as Fig. 2B are too zoomed out for assessing the data quality.

      A new Fig 2B has been generated with a magnified view for clearer examination.

      (4) The pairwise regression model should be explicitly reported in methods.

      Additional details are included in the Methods section:

      Differential analysis for RNA-seq data

      We analyzed the bulk RNA-seq data of 29 schizophrenia subjects and 29 controls. The initial step involved filtering out genes with low read counts (fewer than 20 reads in over 50% of samples). The analysis then employed a two-step method to estimate the technical and biological noise. The first step was identifying the top 10 principal components (PCs) of the dataset. Subsequently, the correlation between each PC and various experimental (alignment rate, unique rate, exon percentage, number of unique mapped reads) and demographic (sex, age at death, PMD, antemortem diagnosis) factors was calculated. Covariates with high correlation to the PCs were included in the analysis to minimize their impact. The analysis was conducted using the 'DESeq2' software package, and genes with a false discovery rate (FDR) below 0.05 were identified as differentially expressed.
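      The low-count filter in the first step can be sketched as follows. This is only an illustration of the stated rule (drop a gene when it has fewer than 20 reads in more than 50% of samples); the function name, parameter names, and the gene labels and counts are hypothetical and are not taken from the study.

```python
def filter_low_count_genes(counts, min_reads=20, max_low_fraction=0.5):
    """Keep genes that are NOT low-count in more than `max_low_fraction` of samples.

    counts maps a gene identifier to its list of per-sample read counts.
    """
    kept = {}
    for gene, reads in counts.items():
        low_samples = sum(1 for r in reads if r < min_reads)
        # A gene is removed only when strictly more than half of the
        # samples fall below the read threshold.
        if low_samples / len(reads) <= max_low_fraction:
            kept[gene] = reads
    return kept


# Hypothetical counts for three genes across four samples.
counts = {
    "GENE_A": [0, 3, 25, 1],     # low in 3 of 4 samples: removed
    "GENE_B": [40, 18, 55, 60],  # low in 1 of 4 samples: kept
    "GENE_C": [5, 10, 30, 45],   # low in exactly half, not over 50%: kept
}
filtered = filter_low_count_genes(counts)
```

      The remaining steps (principal components, covariate correlation, and the DESeq2 model) are not reproduced here, since their exact settings are not given in the response.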

      (5) The statistical strategy to compare AF-SCZ and AT-SCZ to their corresponding control groups was unjustified. Why not model all three groups simultaneously with an interaction effect for treatment or directly compare AF-SCZ to AT-SCZ groups? If the manuscript argues that the antipsychotic effect is the main novelty, why not directly compare AF-SCZ and AT-SCZ?

      This is an important point. As mentioned above, one of the main strengths of our experimental design is that schizophrenia subjects and controls were individually matched by sex and age and (if possible) postmortem delay and freezing storage time. Our study is also among the first to report the potential impact of antipsychotic treatment on chromatin organization using postmortem human brain samples. Because of this individual matching method, we only compared schizophrenia subjects (either antipsychotic-free or antipsychotic-treated) with their respective individually matched controls. This experimental design is supported by our previous publications with postmortem human brain samples (PMID: 36100039; PMID: 28783139; PMID: 26758213; PMID: 23129762; PMID: 22864611; PMID: 18297054). The rationale behind this experimental design – as well as potential limitations particularly related to the division of the schizophrenia group in antipsychotic-free and antipsychotic-treated – is mentioned in the Discussion:

      Related to the effect of antipsychotic treatment, frontal cortex samples of schizophrenia subjects were divided into AF and AT based on postmortem toxicological analysis in both blood and when possible brain samples, which provides information about a longer retrospective drug-free period due to the high liposolubility of antipsychotic medications (Voicu and Radulescu, 2009). However, we cannot fully exclude the possibility of previous exposure to antipsychotic medications in the AF-schizophrenia group, and hence that the epigenetic alterations observed exclusively in the AF-schizophrenia group are a consequence of a potential period of decompensation, which typically occurs following voluntary treatment discontinuation (Liu-Seifert et al., 2005).

      It is also worth mentioning here that data were analyzed both at the cohort level and at the individual level (schizophrenia/control pairs). This is mentioned in the manuscript:

      It should be noted that in the differential analyses here, the schizophrenia subjects (whether AF or AT) and their controls were compared at the cohort level, while matched schizophrenia/control pairs were examined individually in the TF-based analyses.

      (6) The method of pairwise comparison to corresponding control groups, then further comparing the pairwise results opens the study to a number of statistical vulnerabilities. For example, on page 12, the studies identified 166 DEGs between AF and control, and 1273 DEGs between AT and control. Instead of implicating a greater amount of difference between AT and control, such a result can often be driven by differences in between-group variance, rather than between-group means, that is, are the SCZ-AF and SCZ-treated effect size magnitudes and directionalities similar (but the treated group has lower variance) or are the two groups truly different in terms of means? The result in Fig. 5A suggests effect sizes for the two comparisons (AF-Ctrl and AT-Ctrl) are similar but have lower variability in the treated group.

      For a discussion regarding our approach, which involves a pairwise comparison, see above.
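
      The concern raised in point (6) can be illustrated with a toy calculation. The sketch below uses invented numbers, not data from the study: two case groups share the same mean difference from a control group, but the group with the lower within-group variance yields a much larger Welch t statistic and would therefore cross a significance threshold more often. The function name and all values are hypothetical.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

# Hypothetical expression values for one gene (arbitrary units).
control = [10.0, 10.5, 9.5, 10.2, 9.8]
# Both case groups have the same mean shift (+1.0) but a different spread.
cases_high_var = [8.0, 14.0, 9.0, 13.0, 11.0]
cases_low_var = [10.8, 11.2, 10.9, 11.1, 11.0]

diff_high = statistics.mean(cases_high_var) - statistics.mean(control)
diff_low = statistics.mean(cases_low_var) - statistics.mean(control)
t_high = welch_t(cases_high_var, control)
t_low = welch_t(cases_low_var, control)

print(f"mean difference: {diff_high:.2f} vs {diff_low:.2f}")
print(f"Welch t:         {t_high:.2f} vs {t_low:.2f}")
```

      With identical effect sizes, only the low-variance comparison produces a large test statistic, which is why counts of differentially expressed genes alone cannot separate a difference in means from a difference in variance.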

      (7) The pairwise comparison further raised the possibility the results were driven by the difference in the two control cohorts rather than the two SCZ cohorts.

      We clearly show that age is an important independent factor (Fig. 7). Because controls are individually matched by sex and age, and the two cohorts include subjects of different ages (see Tables S1 and S2), a direct comparison between the two cohorts would be of limited validity.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor Comments

      (1) Why not mention what histone modifications you measured by ChIP-seq in the abstract? Certainly a minor point, but I felt I read for quite a while before I got to that point in the intro.

      The two histone marks are now mentioned in the abstract.

      (2) There are several places in the introduction where improper grammar is utilized and this should be edited.

      The Introduction has been edited.

      (3) Related to major comments, how many donors overlapped with the PsychENCODE, CommonMind papers?

      Our datasets were generated from an entirely distinct cohort of postmortem human brain samples. Our postmortem sample collection does not overlap with postmortem samples included in PsychENCODE and/or CommonMind publications.

      (4) Since studies have already measured H3K4me3 and H3K27ac in the SCZ prefrontal cortex, why didn't the authors consider measuring changes in a related repressive marker? This is not to suggest the authors should do that now, but additional comments about other markers would help provide context for this analysis and point toward potential future studies.

      This is an interesting question and will be the goal of our future investigation.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Why does stimulation at 0.15 Hz show a third harmonic signal (Figure 5A) but 0.25 Hz does not show a second harmonic signal?

      Second and third harmonic signals were sometimes observed with 0.15 Hz stimulation and also with 0.25 Hz and other stimulation frequencies. The second harmonic signal is easier to understand, as vasomotion may be reacting to both directions of the oscillating stimuli. The reason for the emergence of the third harmonic is unknown. These harmonic signals were not always observed, and their magnitude was variable. The frequency-locked signal was robust; thus, in this manuscript, we decided to describe only this signal. These observations are mentioned in the revised manuscript (Results, page 9, paragraph 2).
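
      As a minimal sketch of how a frequency-locked signal and its harmonics appear in an amplitude spectrum, the code below builds a synthetic trace (not data from the study) with a 0.25 Hz fundamental plus weaker second and third harmonics, and evaluates the discrete Fourier amplitude at selected frequencies. The sampling rate, trace length, amplitudes, and function name are assumptions chosen for illustration.

```python
import cmath
import math

def amplitude_at(signal, fs, freq):
    """Single-frequency DFT amplitude of a real signal sampled at fs (Hz)."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * k / fs)
              for k, x in enumerate(signal))
    return 2.0 * abs(acc) / n

fs = 10.0          # assumed sampling rate (Hz)
duration = 120.0   # assumed trace length (s); a whole number of 0.25 Hz cycles
f0 = 0.25          # stimulus frequency (Hz)
t = [k / fs for k in range(int(duration * fs))]

# Synthetic trace: fundamental plus weaker 2nd and 3rd harmonics.
trace = [1.0 * math.sin(2 * math.pi * f0 * s)
         + 0.3 * math.sin(2 * math.pi * 2 * f0 * s)
         + 0.1 * math.sin(2 * math.pi * 3 * f0 * s)
         for s in t]

# Amplitude at the fundamental and at its 2nd, 3rd, and 4th multiples.
amps = {h: amplitude_at(trace, fs, h * f0) for h in (1, 2, 3, 4)}
for h, a in amps.items():
    print(f"{h * f0:.2f} Hz: amplitude {a:.3f}")
```

      In this idealized trace, the amplitudes at 0.25, 0.50, and 0.75 Hz recover the values used to build the signal, and the fourth multiple, which was not included, is essentially zero. In real recordings the harmonics would be variable, as described above.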

      References for the windows are missing. Closed craniotomy: (Morii, Ngai, and Winn 1986). Thinned skull: (Drew et al. 2010).

      These references were incorporated into the revised manuscript.

      An explanation of, or at least a discussion on, why a flavoprotein or other intrinsic signal from the parenchyma might follow vasomotion with high fidelity would be most helpful.

      We spend a large part of the Results describing that any fluorescence signal from the brain parenchyma follows the vasomotion because the blood vessels largely lack fluorescence signals within the filter band that we observe. This is described as “shadow imaging”. What was rather puzzling was that flavoprotein or other intrinsic signals were phase-shifted in time. This suggests that these autofluorescence signals have an anti-phase “shadow imaging” component and another component that is phase-shifted in time. This is described in the manuscript as the following.

      (Results, page 13, paragraph 2)

      “Production and degradation of flavin and other metabolites may be induced by the fluctuation in the blood vessel diameter with a fixed delay time. The phase shift in the autofluorescence could be due to the additive effect of “shadow” imaging of the vessel and to the concentration fluctuation of the autofluorescent metabolite”

      Glucose and oxygen are likely to be abundantly delivered during the vasodilation phase compared to the vasoconstriction phase of vasomotion. These molecules will trigger cell metabolism and endogenous fluorescent molecules such as NADH, NADPH, and FAD may increase or decrease with a certain delay, which is required for the chemical reactions to occur. Therefore, the concentration fluctuation of these metabolites could lag in time to the changes in the blood flow. These discussions are added in the revised manuscript (Discussions, page 19, paragraph 2).

      Reviewer #2 (Recommendations For The Authors):

      Minor corrections to the text and figures:

      (1) Figures 1 and 2- The single line slice basal and dilated traces are larger in Figure 2 (intact skull) than in Figure 1 (thinned skull)- have these been mixed up, as the authors state in the text that larger dilations are detected in the thinned skull preparation?

      The example vessel described for the thinned skull (Figure 1) happened to be larger than that shown for the intact skull (Figure 2). We did not describe that larger dilations are observed in the thinned skull preparation. What was described was that the vessel profiles were shallower in the intact skull. This is because the presence of the intact skull blurs the fluorescence image.

      (2) Figure 3- I think the lower panel of the amplitude spectrums from 3 individual animals included in D would benefit from being in its own panel within this Figure (i.e. E). The peak ratio is also used in this figure, but the equation to calculate this is not displayed until Figure 4.

      We thank the reviewer for recommending making the figure more comprehensible. We have divided panel D into D and E and shifted the panel character accordingly. The manuscript text was also updated.

      As the reviewer describes, the peak ratio of 0.25 Hz is used in Figure 3E (original). However, the equation to calculate this figure is described in the appropriate location within the main text of the manuscript (Results, page 10, paragraph 2) as well as in the figure legend.

      (3) Figure 5- In the visual stimulation traces displayed in C you have included a 10-degree scale bar, which looks similar in amplitude to the trace but the text states these are 17-degree amplitude traces.

      We thank the reviewer for noticing this mistake of labeling in the figure. We have corrected the error in the revised figure.

      (4) Figure 6- For the Texas red fluorescence traces and image scales displayed in F, you have shown the responding traces on the right and non-responding on the left, but the figure legend states the amplitude is strong on the left and weak on the right.

      We thank the reviewer for noticing the error in the figure legend text. We have corrected the error in the revised manuscript.

      (5) Figure 6- It would be helpful for the reader if the r value was displayed on the graph in G.

      We thank the reviewer for the suggestion. We have indicated the r value in Figure 6G as the reviewer recommended.

      Reviewer #3 (Recommendations For The Authors):

      Major

      It is unclear to me if the authors are studying vasomotion per se. Vasomotion is an intrinsic, natural rhythm of blood vessel diameter oscillation that is entrained by endogenous rhythmic neural activity. Importantly, if you take neural activity away, the blood vessel (with flow and pressure) should still be capable of oscillating due to an intrinsic mechanism within the vessel wall. In contrast, if one increases neural activity by way of sensory stimulation and blood flow increases, this is the basis of functional hyperemia. If one stimulates the brain over and over again at a particular frequency, it is expected that blood flow will increase whenever neural activity increases to the stimulus, up to a particular frequency until the blood vessel cannot physically track the stimulus fast enough. Functional hyperemia does not depend on an intrinsic oscillator mechanism. It occurs when the brain becomes active above endogenous resting activity due to sensory or motor activity.

      We thank the reviewer for stressing the importance of the distinction between “vasomotion” and functional “hyperemia”.

      We recognized that the terminology used in our paper was not explicitly explained. Traditionally, “vasomotion” is defined as the dilation and constriction of the blood vessels that occurs spontaneously at low frequencies in the 0.1 Hz range without any apparent external stimuli. Sensory-induced changes in the blood flow are usually called “hyperemia”. However, in our paper, we used the term, vasomotion, literally, to indicate both forms of “vascular” “motion”. Therefore, the traditional vasomotion was called “spontaneous vasomotion” and the hyperemia, with both vasoconstriction and vasodilation, induced with slow oscillating visual stimuli was called “visually induced vasomotion”. This distinction in the terminology is now explicitly introduced in the revised manuscript (Introduction, page 3, paragraph 2-3; page 4, paragraph 1-2).

      Using our newly devised methods, we show the presence of “spontaneous vasomotion”. However, this spontaneous vasomotion was often fragmented and did not last long at a specific frequency. With visual stimuli that slowly oscillated at temporal frequencies close to the frequency of spontaneous vasomotion, oscillating hyperemia, or “visually induced vasomotion”, was observed. Importantly, this visually induced vasomotion is not observed in novice animals. Therefore, the visually induced vasomotion is not a simple sensory reaction of the vasculature in response to neuronal activity in the primary visual cortex. We also do not know how the synchronized vasomotion can spread throughout the whole brain. Where the plasticity for vasomotion entrainment occurs is also unknown. How much of the visually induced vasomotion relies on the mechanisms of intrinsic spontaneous vasomotion is also undetermined. Future directions for understanding the mechanisms of visually induced vasomotion and entrainment are discussed in more detail in the revised manuscript (Discussions, page 19, paragraph 1).

      To me, one would need to silence the naturally occurring vasomotion to study it. As soon as one activates the brain with an external stimulus, functional hyperemia is being studied. One idea that would be interesting to look at is whether a single or perhaps a double stimulus, in an untrained vs trained mouse, shows vasodilation that occurs across the cortex and in the cerebellum. In other words, is there something special about repeating the signal over and over again that results in brain-wide synchronization, or does a single or double oscillation of the same frequency (0.25Hz) also transiently synchronize the brain? My guess is that a short stimulus would give you the same thing (especially in a trained mouse) and that there is nothing special about oscillating the signal over and over again (except for the learning component).

      We thank the reviewer for the ideas of new experiments to understand whether the visually induced vasomotion shares the same mechanisms for creating spontaneous vasomotion or not.

      We would like to emphasize again that the visually induced vasomotion is not observed in the Novice animals. Therefore, the visually induced vasomotion is not a simple sensory reaction of the vasculature in response to the visual stimuli. Entrainment with repeated presentation of visual stimuli is required for this global synchronization phenomenon to occur.

      We would also like to emphasize that, even in Expert animals, the visually induced vasomotion that is frequency-locked to the presented stimulus does not always occur immediately. As shown in Figure 3D lower panel (Figure 3E in the revised figure), the vasomotion did not always immediately frequency-lock. The vasomotion was also not always stable throughout the 15 min of visual stimulation presentation. These characteristics are emphasized in the revised manuscript (Results, page 10, paragraph 1).

      Therefore, we would assume that a single or double frequency of the visual stimulation would not always be sufficient to transiently frequency-lock the visually induced vasomotion.

      An alternative idea is to test frequencies lower than vasomotion. Vasomotion typically oscillates around a wide range of very low frequencies averaging around 0.1Hz, yet here the authors entrain blood vessel oscillations towards the top end of vasomotion, at 0.25Hz. What would happen if the authors tried synchronizing brain activity with 0.025Hz? Would the natural vasomotion frequency still be there, or would it be gone, dominated by the 0.025Hz entrainment?

      We would assume that visually induced vasomotion will not be induced with 0.025 Hz visual stimuli. This is too slow to induce smooth pursuit of the visual stimuli with eye movement. We show that, even if smooth eye pursuit occurs, the visually induced vasomotion may or may not occur (Figure 6F). However, visually induced vasomotion does not largely occur without eye movement. Therefore, the proposed experiment by the reviewer is likely not doable.

      Finally, perhaps the authors can see if there is a long-lasting change in natural vasomotion occurring after the animal has been trained to 0.25Hz. For example, is there greater power in the endogenous fluctuation at either 0.25Hz (or perhaps 0.1Hz) with no visual stimulation given but after the animal has been trained? These ideas would be interesting to test and could help clarify whether this is plasticity in functional hyperemia or plasticity in vasomotion.

      It should also be mentioned that the frequency-locked vasomotion quickly dissipates as soon as the visual stimulation is halted (Figure 3D upper panel, middle). However, we agree with the reviewer that it would be interesting to see whether the fragmentation of the spontaneous vasomotion is observed less in the Trained or Expert mice compared to the Novice mice, to understand whether the entrainment effect would propagate to the properties of the spontaneous vasomotion.

      This issue I have raised is not a fundamental flaw in the paper, it pertains more to the wording, phrasing, and pitch of the paper i.e. is this really entrained and plastic vasomotion? I am skeptical. Nevertheless, I think the authors should try some of these suggestions to better characterize this effect.

      We agree that the phrasing used in the original manuscript was rather confusing, as “vasomotion” normally refers to spontaneous vascular movement. However, functional “hyperemia” may not adequately express the phenomenon that we observe either. The phenomenon that we observe is slowly oscillating vasodilation and vasoconstriction that is induced with visual stimuli with a temporal frequency similar to the spontaneously occurring “vasomotion”. This phenomenon is not a direct hyperemia response to the visual stimuli as it requires entrainment and it spreads globally throughout the whole brain. We revised our manuscript to define the terminology that we use.

      An important question is if neural activity is entraining the CBF responses. The authors should do one experiment in a pan-neural GCaMP line to test if neural activity in the visual cortex (and other areas captured in the widefield microscope) shows a progressive and gradual synchronization (or not) to the vasomotion responses with training. It is possible to do this through a thinned skull window. This is important to know if/how synchronized population neural activity scales with training. Perhaps they will not correlate and there is something more subtle going on.

      In our paper, we mainly studied visually induced vasomotion (or visual stimulus-triggered vasomotion). Therefore, visual stimulation must first activate the neurons and, through neurovascular coupling, the initial drive for vasomotion is likely triggered. However, visually induced vasomotion is not observed in novice animals. Therefore, the visually induced vasomotion is not a simple sensory reaction of the vasculature in response to neuronal activity in the primary visual cortex.

      An important point to note is that the neuronal visual response in the primary visual cortex could potentially decrease with repeated presentation of the visual stimulus, as the adaptive movement of the eye should decrease the retinal slip. With repeated training sessions, a more static projection of the presented image will likely be shown to the retina. The neurovascular coupling could be enhanced with increased responsiveness of the vessels, and vascular-to-vascular coupling could also be potentiated. This argument is now incorporated in the revised manuscript (Discussions, page 19, paragraph 1).

      We agree with the reviewer that, to identify the extent of the neuronal contribution to the vasomotion triggering, whole brain synchronization, and vasomotion entrainment, simultaneous neuronal calcium imaging would be ideal. However, because fluorescent Ca2+ indicators expressed in neurons would also be distorted by the “shadow” effect from the vasomotion, exquisite imaging techniques would be required. We recognize this “shadow” effect and we are currently developing methods to remove the “shadow” effect and the intracellular pH fluctuation effect from the fluorescence traces.

      The authors nicely show that plasticity in vasomotion coincides with the mouse learning the HOKR task and that as eye movement tracks the stimulus, CBF gets entrained. However, there could also be a stress effect going on in the early trials, and as the mouse gets used to the procedure and stress comes down, the vasomotion entrainment can be seen. It could be the case that the vasomotion process is there on the first trial, but masked by stress-induced effects on neural and/or vascular activity. I did not see anything in the methods about how the mouse was habituated to head restraint. Was the first visual stim trial the first time the mouse was head restrained? If so, there could be a strong stress effect. The authors should address this either by clarifying that habituation to head restraint was done, or by doing a control experiment where each animal receives at least 1 week of progressive and gradual head restraint before doing the same HOKR experiment using multiple trials.

      We agree with the reviewer that stress could well affect spontaneous vasomotion as well as visually induced vasomotion (or visual stimulus-triggered vasomotion). As the reviewer suggested, we could have compared the initial visually induced vasomotion response between habituated and non-habituated mice. In addition, whether an experimentally induced increase in stress would interfere with the vasomotion could also be studied. With the TexasRed experiments, we observed that tail-vein injection stress appeared to interfere with the HOKR learning process. In the experiments presented in Fig. 3, TexasRed was injected before session 1. Vasomotion entrainment likely progressed with sessions 2 and 3 training. Before session 4, TexasRed was injected again to visualize the vasomotion. The vasomotion was clearly observed in session 4, indicating that the stress induced by tail-vein injection did not interfere with the generation of visually induced vasomotion. This argument is included in the revised manuscript (Discussions, page 20, paragraph 2).

      Minor

      The first sentence of the introduction requires citations. It is also a somewhat irrelevant comparison to make.

      The necessary citation was added in the revised manuscript, as the reviewer suggested. We think that describing how energy is distributed in the brain would provide one of the most important breakthroughs in understanding how efficient information processing in the brain works. Therefore, we would like to keep this introduction.

      The third and fourth sentences of the introduction equate vasodilation/vasoconstriction with vasomotion and it is not this simple. Vasomotion is a specific physiological process involving rhythmic changes to artery diameter. Also, the frequency of these slow oscillations needs to be stated. The authors only say they are slower than 10Hz.

      The definition of spontaneous vasomotion with indication of typical temporal frequency is described in the revised manuscript, as the reviewer suggested.

      More than half of the introduction is describing the paper itself, rather than setting the stage for the findings. The authors need a more thorough account of what is known and what is not known in this area. Some of this information is in the discussion, which should be moved up to the intro.

      We have revised the introduction to include the definition of spontaneous vasomotion and visually induced vasomotion or functional hyperemia, as the reviewer suggested.

      In the first paragraph of the results section, the authors should state in what way the mice are awake. Are they freely mobile? Are they head-restrained? Are they resting or moving or doing both at different times? This is clarified later but it should come up front as someone reads through the paper.

      As the reviewer suggested, we clarified in the first paragraph of the Results section that the experiments were done in awake and head-restrained mice.

      The authors say "As shown later, blood vessels on the surface...". There is no need to say "as shown later".

      This is deleted as the reviewer suggested.

      The use of "full width at 10% maximum" of the Texas red intensity for the diameter measure is a little odd, as it may actually overestimate the diameter, but I see what the authors were trying to do. A full-width half max is standard here and that is likely more appropriate. Also, the line profiles of intensity are not raw data. The authors say the trace is strongly filtered/smoothed. If so, this creates a somewhat artificial platform to make the diameter measurement. The authors should show raw data from a single experiment and make the measurement from that. The raw line profile should look almost square, where a full-width half-max would work well.

      Contrary to the reviewer’s expectation, the raw line profile was not almost square. Even if there were almost no blur in the XY dimension in the optical imaging system, one would not expect to see a square line profile: the thickness of the vessel increases in the Z dimension towards the center, and because this is not a confocal or two-photon microscope image, no ideal optical section was created. Therefore, the full-width half-maximum value would definitely be an underestimate of the actual vessel diameter. It may be possible to derive an ideal cutoff value if we have the 3D point spread function of the imaging system. 10% is an arbitrary number, but we think 10% is the minimum intensity that we can distinguish from the background intensity fluctuations. We did not attempt to derive the “true” diameter of the vessel, and full-width at 10% maximum is just an index of the actual diameter. In most of the manuscript, we only deal with the change of the vessel diameter relative to the basal diameter; therefore, we considered that careful derivation of the absolute diameter estimate is not necessary. This argument is detailed in the Materials and Methods section in the revised manuscript (page 31, paragraph 2).

      The raw line profile before filtering is shown overlaid in Figure 1C, as the reviewer suggested.
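
      To make the comparison between the two width measures concrete, here is a minimal sketch that measures the width of a line profile at a chosen fraction of its peak. It uses a synthetic semicircular profile, which is what wide-field imaging of a uniformly dye-filled cylinder would give with no blur and no optical sectioning. The profile shape, the 10-unit radius, the sampling grid, and the function name are assumptions for illustration; this is not the authors’ code, and real images include blur and noise.

```python
import math

def width_at_fraction(xs, ys, fraction):
    """Width of a single-peaked profile at `fraction` of its maximum,
    using linear interpolation at the two threshold crossings."""
    level = fraction * max(ys)
    peak = ys.index(max(ys))

    def crossing(i_in, i_out):
        # Interpolate between a sample at/above the level and one below it.
        x0, y0, x1, y1 = xs[i_in], ys[i_in], xs[i_out], ys[i_out]
        return x0 + (level - y0) * (x1 - x0) / (y1 - y0)

    left = peak
    while left > 0 and ys[left - 1] >= level:
        left -= 1
    right = peak
    while right < len(ys) - 1 and ys[right + 1] >= level:
        right += 1
    x_left = crossing(left, left - 1) if left > 0 else xs[0]
    x_right = crossing(right, right + 1) if right < len(ys) - 1 else xs[-1]
    return x_right - x_left

radius = 10.0                                   # hypothetical vessel radius
xs = [-15.0 + 0.05 * k for k in range(601)]     # lateral position
# Path length through a cylinder at lateral offset x (zero outside the vessel).
ys = [2.0 * math.sqrt(max(radius ** 2 - x ** 2, 0.0)) for x in xs]

fwhm = width_at_fraction(xs, ys, 0.5)
fw10 = width_at_fraction(xs, ys, 0.1)
print(f"true diameter: {2 * radius:.2f}")
print(f"full width at 50% max: {fwhm:.2f}")
print(f"full width at 10% max: {fw10:.2f}")
```

      Under these idealized assumptions, the half-maximum width falls short of the true diameter by roughly 13%, while the width at 10% of the maximum is within about 1% of it, which is consistent with the argument that the half-maximum value underestimates the diameter of a vessel imaged without optical sectioning.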

      In Figures 1 and 2, state/label what brain region this is.

      The blood vessels between the bregma and lambda on the cortex were observed and described in Figures 1 and 2. This is described in the revised manuscript, as the reviewer suggested.

      Can the authors also show what a vein or venule looks like using their quantification method in Figures 1 and 2? This would be a helpful comparison to a static vein.

      The methods shown in Figures 1 and 2 would not allow us to distinguish between vein and venule in our study. Methods that allow quantification of the relative blood vessel diameter fluctuation due to spontaneous or visually induced vasomotion activities are shown in Figures 1 and 2. Later in the manuscript, the whole intensity fluctuation of TexasRed or autofluorescence in the brain parenchyma is studied, and in this case, no distinction between vein and venules could be made.

      Statements such as this are not necessary: "Later in the manuscript, we will be dealing with vasomotion dynamics observed with the optical fiber photometry methods, in which the blood vessel type under the detection of the fiber could not be identified". Simply talk about this data when you get to it.

      We have deleted this statement in this part of the manuscript, as the reviewer suggested.

      Same as this, please consider deleting: "Spontaneous vasomotion dynamic differences between different classes of blood vessels would be of interest to study using a more sophisticated in vivo two-photon microscope which we do not own." Just describe the data you have from the methods you have. There is no need to lament.

      We deleted this sentence, as the reviewer suggested.

      Figure 3 D the light blue boxes showing the time period of visual stimulation physically overlay with the frequency-time spectrograms. They should not overlay with this graph because it makes them more light blue, distorting the figure which also uses light blue in the heat map.

      Figure 3D was modified, as the reviewer suggested.

      The authors say: "The reason why the vasomotion detected in our system through the intact skull in awake in vivo mice was less periodic was unknown." Yes, but you are imaging an awake mouse. Many spontaneous behaviours such as whisking, grooming, twitching, and struggling will manifest as increased artery diameter. These will be functional hyperemia events occurring on top of rhythmic vasomotion. This can be briefly discussed.

      As the reviewer comments, the vasomotion detected in awake mice was likely to be less periodic because the spontaneous animal behavior induces functional hyperemia and interrupts spontaneous vasomotion. This interpretation was included in the revised manuscript (Results, page 8, paragraph 1).

      The authors say "extremely tuned" on page 8. They should not use words like "extremely". Perhaps say "more strongly tuned" or equivalent.

      We have changed “extremely” to “more strongly”, as the reviewer suggested.

      The authors say "First, the Texas Red fluorescence images were Gaussian filtered in the spatial XY dimension to take out the random noise presumably created within the imaging system." It is inadvisable to alter the raw data in this way unless there is a sound reason to do so. If there is random noise this should not affect the Fast Fourier Transform analysis. If there is regular noise caused by instrumentation artefact, which is picked up by the analysis then perhaps this could be filtered out. A static Texas red sample in a vial can be used to determine if there is artefactual noise.

      We mainly used the Gaussian filter for better presentation of the imaged data. The TexasRed fluorescence was low in intensity and the acquired images were Gaussian filtered in the spatial XY dimension to reduce the pixelated noise at the expense of spatial resolution. This filter should not affect the temporal frequency of the observed vasomotion. This is now more clearly indicated in the revised manuscript (Results, page 10, paragraph 2).
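
      The statement that a purely spatial filter leaves the temporal frequency unchanged can be checked on synthetic data. The sketch below is an illustration under stated assumptions, not the authors’ pipeline: each frame is a single row of pixels (1-D rather than XY, for brevity) carrying a 0.25 Hz oscillation plus independent pixel noise, and every frame is smoothed with a Gaussian kernel. The frame rate, noise level, kernel width, and function names are all hypothetical.

```python
import cmath
import math
import random

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2*radius + 1."""
    w = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def smooth_row(row, kernel):
    """Spatially smooth one frame (a row of pixels), clamping at the edges."""
    r = len(kernel) // 2
    last = len(row) - 1
    return [sum(kw * row[min(max(i + j - r, 0), last)]
                for j, kw in enumerate(kernel))
            for i in range(len(row))]

def peak_frequency(series, fs, freqs):
    """Frequency in `freqs` with the largest DFT amplitude."""
    def amp(f):
        return abs(sum(x * cmath.exp(-2j * math.pi * f * k / fs)
                       for k, x in enumerate(series)))
    return max(freqs, key=amp)

random.seed(0)
fs, n_frames, n_pixels, f0 = 5.0, 600, 15, 0.25   # assumed values
# Each frame: the same 0.25 Hz oscillation in every pixel plus pixel noise.
frames = [[math.sin(2 * math.pi * f0 * k / fs) + random.gauss(0.0, 0.5)
           for _ in range(n_pixels)]
          for k in range(n_frames)]

# Filter each frame in space only; no filtering is applied along time.
kernel = gaussian_kernel(sigma=1.5, radius=4)
smoothed = [smooth_row(frame, kernel) for frame in frames]

freqs = [0.05 * m for m in range(1, 21)]          # 0.05 to 1.00 Hz
centre = n_pixels // 2
raw_peak = peak_frequency([f[centre] for f in frames], fs, freqs)
smooth_peak = peak_frequency([f[centre] for f in smoothed], fs, freqs)
print(f"peak before spatial filtering: {raw_peak:.2f} Hz")
print(f"peak after spatial filtering:  {smooth_peak:.2f} Hz")
```

      Because the filter mixes neighbouring pixels within a frame and never mixes frames, the time series of a filtered pixel is a weighted average of neighbouring time series, so no new temporal frequencies can be introduced.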

      There are endogenous fluorescent molecules in cell metabolism that change dynamically to neural activity: NADH, NADPH, and FAD. These are almost certainly a fraction of the auto-fluorescent signal the authors are measuring and it would be expected to see small fluctuations in these metabolites with neural activity. Perhaps this can be discussed, and the authors can likely argue that metabolic signals are much smaller than the change caused by vasodilation.

      We found that the autofluorescence signal was phase-shifted in time relative to the vasomotion, which was visualized with TexasRed. This suggests that these autofluorescence signals have an anti-phase “shadow imaging” component and another component that is phase-shifted in time. Glucose and oxygen are likely to be abundantly delivered during the vasodilation phase compared to the vasoconstriction phase of vasomotion. These molecules will trigger cell metabolism and endogenous fluorescent molecules such as NADH, NADPH, and FAD may increase or decrease with a certain delay, which is required for the chemical reactions to occur. Therefore, the concentration fluctuation of these metabolites could lag in time to the changes in the blood flow. It is also expected that these metabolites may fluctuate according to the neuronal activity that triggers visually induced vasomotion or functional hyperemia. These discussions are added in the revised manuscript (Discussions, page 19, paragraph 2).

      The authors say "however, we found that, if Texas Red had to be injected before every training session, the mouse did not learn very well." This is interesting. Why do the authors suppose this was the case? Stress from the injection? Or perhaps some deleterious effect on blood vessel function caused by the dye itself? Either way, I think this honest statement should remain. Others need to know about it.

      We think that the stress from the injection interferes with the HOKR learning. However, as shown, TexasRed injection after the mouse had learned did not interfere with the eye movement or with the visually induced vasomotion. We do not know whether the injection stress directly interferes with the blood vessel function and affects the plastic vasomotion entrainment. These arguments are now described in the revised manuscript (Discussions, page 20, paragraph 2). The statement above remains as is, as the reviewer suggested.

      YCnano50 is a calcium sensor and not really appropriate for the use employed by the authors. They are exciting YFP at 505nm but unless the authors are using a laser line, there is some bandwidth of excitation light that is likely exciting the CFP too which still absorbs light up to ~490nm. Here, calcium signalling may affect the YFP signal. This can be discussed.

      A multiband-pass filter (Chroma 69008x, with the relevant band at 503 nm / 19.5 nm FWHM) was used for direct excitation of YFP. Negligible light is passed below 490 nm. CFP excitation above 490 nm is assumed to be negligible and is usually not defined in the literature. We assume that, with our optical system, fluorescence from direct YFP excitation dominates over the minor contribution from CFP excitation. We explicitly describe this in the revised manuscript (Materials and Methods, page 28, paragraph 2).
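
      As a rough check of this argument, the half-maximum edges of the stated excitation band can be computed from its center wavelength and FWHM. This is only a sketch: it reads 503 nm as the band center, treats the band as symmetric, and the half-maximum points are not hard cutoffs, so some transmission outside them is expected. The variable names are illustrative.

```python
# Excitation band quoted above: 503 nm center, 19.5 nm FWHM.
CENTER_NM = 503.0
FWHM_NM = 19.5
CFP_ABSORPTION_LIMIT_NM = 490.0  # approximate limit raised by the reviewer

# Assumption: the band is symmetric about its center wavelength.
lower_edge_nm = CENTER_NM - FWHM_NM / 2
upper_edge_nm = CENTER_NM + FWHM_NM / 2
margin_nm = lower_edge_nm - CFP_ABSORPTION_LIMIT_NM

print(f"half-maximum band: {lower_edge_nm} to {upper_edge_nm} nm")
print(f"margin above {CFP_ABSORPTION_LIMIT_NM} nm: {margin_nm} nm")
```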

      The discussion is interesting but does not actually discuss much of the data or measurements in the paper. Most of the discussion reads more like a topical review, rather than a critical analysis of the effects/measurements and why the authors' interpretations are likely correct. This can be improved.

      As the reviewer suggests, we have improved the discussion by starting with the summary of the results (Discussion, page 19, paragraph 1). We also included the possibility of stress affecting visually induced vasomotion (Discussion, page 20, paragraph 2).

    1. Author Response

      OVERVIEW OF RESPONSE TO REVIEWS

      I thank the three anonymous reviewers for providing well-informed, constructive feedback on the initial version of this manuscript. Based on their comments I will revise the manuscript and hopefully improve it in several ways. I expected a great deal of resistance to the ideas proposed in this model because they break from traditional approaches. One of my goals in developing this model was to argue for a paradigm shift regarding the concept of a “receptive field”. Experimentally, the receptive field is defined as the set of preferred environmental sensory circumstances that cause a neuron to become highly active. Traditional interpretation of receptive fields implicitly assumes that the environmental circumstances that give rise to the receptive field do so in a purely bottom-up fashion (the cell is “receiving” its field), in which case the receptive field specifies the function of the cell. In other words, the receptive field is what the cell does. However, some brain regions (e.g., entorhinal cortex) receive substantial feedback from downstream regions (e.g., hippocampus), and feedback can play an important role in determining the receptive field. As applied to a memory account of MTL, this feedback is memory retrieval and reactivation. Thus, the multifield spatial response of grid cells doesn’t necessarily mean that their function is spatial. Consideration of bottom-up versus top-down signals gives rise to the proposal that the bottom-up preference of many grid cells is some non-spatial attribute even though they exhibit a spatial receptive field owing to retrieval in specific locations.

      One thing I will emphasize in a revision is that this model can address findings in the vast literature on learning, memory, and consolidation. The question asked in this study is whether a memory model can also explain the rodent navigation literature. This is not an attempt to provide definitive evidence that this is a better account of the rodent navigation literature. Instead, the goal is to model the rodent navigation literature even though this is a memory model rather than a spatial/navigation model. Nevertheless, within the domain of rodent spatial/navigation, this model makes different predictions/explanations than spatial/navigation models. For instance, this is the only model predicting that many grid cells with spatial receptive fields are non-spatial (see predictions in Box 1). As reviewed in Box 1, this is the only model that can explain why head direction conjunctive grid cells become head direction cells in the absence of hippocampal feedback and it is the only model that can explain why some grid cells are also sensitive to sound frequency (see several other unique explanations in Box 1).

      This study is an attempt to unify the spatial/navigation and learning/memory literatures with a relatively simple model. Given the simplicity of the model, there are important findings that the model cannot address -- it is not that the model makes the wrong predictions but rather that it makes no predictions. The role of running speed is one such variable for which the model makes no predictions. Similarly, because the model is a rate-coded model rather than a model of oscillating spiking neurons, it makes no predictions regarding theta oscillations. The model is an account of learning and memory for an adult animal, and it makes no predictions regarding the developmental or evolutionary time course of different cell types. This model contains several purely spatial representations such as border cells, head direction cells, and head direction conjunctive grid cells. In evolution and/or in development, it may be that these purely spatial cell types emerged first, followed by the evolution and/or development of non-spatial cell types. However, this does not invalidate the model. Instead, this is a model for an adult animal that has both episodic memory capabilities and spatial navigation capabilities, irrespective of the order in which these capabilities emerged.

      Grid cell models that are purely spatial are agnostic regarding the thousands of findings in the literature on memory, learning, and consolidation whereas this model can potentially unify the learning/memory and spatial/navigation literatures. The reason to prefer this model is parsimony. Rather than needing to develop a theory of memory that is separate from a theory of spatial navigation, it might be possible to address both literatures with a unified account. There are other grid cell models that can explain non-spatial grid-like responses (Mok & Love, 2019; Rodríguez‐Domínguez & Caplan, 2019; Stachenfeld et al., 2017; Wei et al., 2015) and these models may be similarly positioned to explain memory results. However, these models assume that grid cells exhibiting spatial receptive fields serve the function of identifying positions in the environment (i.e., their function is spatial). As such, these models do not explain why most of the input to rodent hippocampus appears to be spatial (these models would need to assume that rodent hippocampus is almost entirely concerned with spatial navigation). This account provides an answer to this conundrum by proposing that grid cells with spatial receptive fields have been misclassified as spatial. Below I give responses to some of the specific comments made by reviewers, grouping these comments by topic:

      COMMENTS RELATED TO THE NEED/MOTIVATION FOR THIS MODEL

      In a revision, I will clarify that the non-spatial MTL cell types that are routinely found in primate and human studies are fully compatible with this model. The reported simulations are focused on the specific question of how it can be that most mEC and hippocampal cell types in the rodent literature appear to be spatial. It is known that perirhinal cortex is not spatial. However, entorhinal cortex is the gateway to hippocampus. If the hippocampus has the capacity to represent non-spatial memories, it must receive non-spatial input from entorhinal cortex. These simulations suggest that characterization of the rodent mEC as primarily spatial might be incorrect if most grid cells (except perhaps head direction conjunctive grid cells) have been mischaracterized as spatial.

      Lateral entorhinal cortex also projects to hippocampus, and one reviewer asks about the distinction between lateral versus medial entorhinal cortex. From this memory perspective, the important question is which part of the entorhinal cortex represents the non-spatial attributes common to the entire recording session, under the assumption that the animal is creating and retrieving memories during recording. If these non-spatial attributes are represented in lateral EC, there would be grid cells in lateral EC (but these are not found). There is evidence that lateral EC cells respond selectively in relation to objects (Deshmukh & Knierim, 2011), but in a typical rodent navigation study there are no objects in the enclosure.

      One reviewer asks whether this model is built to explain the existing data or whether the assumptions of this model are made for theoretical reasons. The BVC model (Barry et al., 2006), which is a precursor to this model, is a theoretically efficient representation of space that could support place coding. If the distances to different borders are known, it’s not clear why the MTL also needs the two-dimensional Fourier-like representation provided by grid cells. This gives rise to the proposal that grid cells with spatial receptive fields are serving some function other than representing space. In the proposed model, the precise hexagonal arrangement of grid cells indicates a property that is found everywhere in the enclosure (i.e., a “tiling” of knowledge for where the property can be found). This arrangement arises from the well-documented learning process termed “differentiation” in the memory literature (McClelland & Chappell, 1998; Norman & O’Reilly, 2003; Shiffrin & Steyvers, 1997), which highlights differences between memories to avoid interference and confusion.

      CONCERNS RELATED TO LIMITATIONS AND CONFLICTING RESULTS

      One reviewer points out that individual grid cells will typically reveal a grid pattern regardless of the environmental circumstances, which, according to this model, indicates that all such circumstances have the same non-spatial attribute. This might seem strange at first, but I suggest that there is a great deal of “sameness” to the environments used in the published rodent navigation experiments. For instance, as far as I’m aware, the animal is never allowed to interact with other animals during spatial navigation recording. Furthermore, the animal is always attached to wires during recording. The internal state of the animal (fear, aloneness, the noise of electronics, etc.) is likely similar across all recording situations and attributes of this internal state are likely represented in the hippocampus as well as in the regions that provide excitatory drive to hippocampus. The claim of this model is that the grid cells are “tagging” different navigation enclosures as places where these things happen (fear, aloneness, electronics, metal floor, no objects, etc.). The interesting question is what happens when the animal is allowed to navigate in a more naturalistic setting that includes varied objects, varied food sources, varied surfaces, other animals, etc. Do grid cells persist in such a naturalistic environment? Or do they lose their regularity, or even become silent, considering that there is no longer a uniformity to the non-spatial attributes? The results of Barry et al. (2012) demonstrate that the grid pattern expands and becomes less regular in a novel environment. Nevertheless, the novel environment in that study was uncluttered rather than naturalistic. It remains to be seen what will happen with a truly naturalistic environment.

      One reviewer asks how this model relates to non-grid multifield cells found in mEC (Diehl et al., 2017; see also the irregularly arranged 3D multifield cells reported by Ginosar et al., 2021). A full explanation of these cells would require a new simulation study. In a revision, I will discuss these cells, which reveal a consistent multifield spatial receptive field and yet the multiple fields are irregular in their arrangement rather than a precise hexagonal lattice. On this memory account, precise hexagonal arrangement of memories is something that occurs when there is a non-spatial attribute found throughout the enclosure. However, in a typical rodent navigation study, there may be some non-spatial attributes that are not found everywhere in the enclosure. For instance, consider the set of locations within the enclosure that afford a particular view of something outside of the enclosure or the set of locations corresponding to remembered episodic events (e.g., memory for the location where the animal first entered the enclosure). For non-spatial characteristics that are found in some locations but not others within the enclosure, the cells representing those non-spatial attributes should reveal multifield firing at irregular locations, reflecting the subset of locations associated with the non-spatial attribute.

      One reviewer suggests that this model cannot explain the finding that grid fields become warped (e.g., grid fields arranged in an ellipse rather than a circle) in the same manner that the enclosure is warped when a wall is moved (Barry et al., 2007). The way in which I would simulate this result would be to assume that the change in the boundary location was too modest to be noticed by the animal. Because the distances are calculated relative to the borders, an unnoticed change in the border would not change the model in terms of the grid field as measured by proportional distances between borders. However, because the real-world Euclidean positions of the border are changed, the grid fields would be changed in terms of real-world coordinates. This is what I was referring to in the paper when I wrote “For instance, perhaps one egocentric/allocentric pair of mEC grid modules is based on head direction (viewpoint) in remembered positions relative to the enclosure borders whereas a different egocentric/allocentric pair is based on head direction in remembered positions relative to landmarks exterior to the enclosure. This might explain why a deformation of the enclosure (moving in one of the walls to form a rectangle rather than a square) caused some of the grid modules but not others to undergo a deformation of the grid pattern in response to the deformation of the enclosure wall (see also Barry et al., 2007). More specifically, if there is one set of non-orthogonal dimensions for enclosure borders and the movement of one wall is too modest to cause global remapping, this would deform the grid modules based on the enclosure border cells.
At the same time, if other grid modules are based on exterior properties (e.g., perhaps border cells in relation to the experimental room rather than the enclosure), then those grid modules would be unperturbed by moving the enclosure wall.” Related to the question of enclosure geometry, the irregularity that can emerge in trapezoid shaped enclosures was discussed in the section of the paper that reads “As seen in Figure 12, because all but one of the place cells was exterior when the simulated animal was constrained to a narrow passage, the hippocampal place cell memories were no longer arranged in a hexagonal grid. This disruption of the grid array for narrow passages might explain the finding that the grid pattern (of grid cells) is disrupted in the thin corner of a trapezoid (Krupic et al., 2015) and disrupted when a previously open enclosure is converted to a hairpin maze by insertion of additional walls within the enclosure (Derdikman et al., 2009).”

      CONCERNS THAT WILL BE ADDRESSED WITH GREATER CLARIFICATION

      One reviewer asks why a cell representing a non-spatial attribute found everywhere in the enclosure would not fire everywhere in the enclosure. In theory, cells could fire constantly. However, in practice, cells habituate and rapidly reduce their firing rate by an order of magnitude when their preferred stimulus is presented without cessation (Abbott et al., 1997; Tsodyks & Markram, 1997). After habituation, the firing rate of the cell fluctuates with minor variation in the strength of the excitatory drive. In other words, habituation allows the cell to become sensitive to changes in the excitatory drive (Huber & O’Reilly, 2003). Thus, if there is stronger top-down memory feedback in some locations as compared to others, the cell will fire at a higher rate in those remembered locations. In brief, when faced with constant excitatory drive, the cell accommodates and becomes sensitive to changes in the magnitude of the excitatory drive.
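
      The accommodation argument can be illustrated with a minimal rate-based sketch of synaptic depression, loosely in the spirit of the cited depression models but not the implementation used in the manuscript. The function name `simulate_depression` and all parameter values are assumptions, chosen so that a constant drive reduces the output roughly tenfold.

```python
def simulate_depression(drive, dt=0.001, tau_recovery=0.5, use_rate=18.0):
    """Simulate output = drive * resources with depleting resources.

    `resources` recovers toward 1 with time constant `tau_recovery` (s)
    and is depleted in proportion to the drive. All parameters are
    illustrative assumptions.
    """
    resources = 1.0
    output = []
    for d in drive:
        output.append(d * resources)
        # Forward-Euler update: recovery minus activity-dependent depletion.
        resources += dt * ((1.0 - resources) / tau_recovery - use_rate * d * resources)
        resources = min(1.0, max(0.0, resources))
    return output


# 2 s of constant drive, then a 20% increase in drive for 1 s.
drive = [1.0] * 2000 + [1.2] * 1000
output = simulate_depression(drive)

initial_response = output[0]        # response before any depletion
habituated_response = output[1999]  # steady state under constant drive
step_response = output[2000]        # first sample after the drive increases
```

      With these values the habituated response settles near one tenth of the initial response, and the 20% step in drive produces a proportional transient increase before the response relaxes toward a new, only slightly higher steady state.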

      One reviewer asks for greater clarification regarding the simulation result of immediate stability for grid cells but not place cells. In a revision, I will provide a video showing a sped-up birds-eye view of the place cell memories for the 3D simulations that include head direction, showing the manner in which memories tend to linger in some locations more than others as they consolidate. This behavior was explained in the text that reads “Because the non-spatial cell’s grid field reflects on-average memory positions during the recording session (i.e., the locations where the non-spatial attribute is more often remembered, even if the locations of the memories are shifting), the grid fields for the non-spatial are immediately apparent, reflecting the tendency of place cells to linger in some locations as compared to other locations during consolidation. More specifically, the place cells tend to linger at the peaks and troughs of the border cell tuning functions (see the explanation above regarding the tendency of the grid to align with border cell dimensions). By analogy, imagine a time-lapsed birds-eye view of cars traversing the city-block structure of a densely populated city; this on-average view would show a higher density of cars at the cross-street junctions owing to their tendency to become temporarily stuck at stoplights. However, with additional learning and consolidation, the place cells stabilize their positions (e.g., the cars stop traveling), producing a consistent grid field for the head direction conjunctive grid cells.” The text describing why some locations are more “sticky” than others reads “Additional analyses revealed that this tendency to align with border cell dimensions is caused by weight normalization (Step 6 in the pseudocode). Specifically, connection weights cannot be updated above their maximum nor below their minimum allowed values. 
This results in a slight tendency for consolidated place cell memories to settle at one of the three peak values or three trough values of the sine wave basis set. This “stickiness” at one of 6 peak or trough values for each basis set is very slight and only occurred after many consolidation steps. In terms of biological systems, there is an obvious lower-bound for excitatory connections (i.e., it is not possible to have an excitatory weight connection that is less than zero), but it is not clear if there is an upper-bound. Nevertheless, it is common practice for deep learning models to include an upper-bound for connection weights because this reduces overfitting (Srivastava et al., 2014) and there may be similar pressures for biological systems to avoid excessively strong connections.”
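
      The bounding of connection weights described in the quoted text can be sketched as a clipped update. This is an illustrative assumption about the form of such a step, not the manuscript's pseudocode; the function name `bounded_update`, the learning rate, and the bounds of 0 and 1 are hypothetical.

```python
def bounded_update(weights, deltas, learning_rate=0.1, w_min=0.0, w_max=1.0):
    """Apply an additive update, then clip each weight to [w_min, w_max]."""
    updated = []
    for w, delta in zip(weights, deltas):
        w_new = w + learning_rate * delta
        # Weights cannot move above the maximum or below the minimum.
        updated.append(min(w_max, max(w_min, w_new)))
    return updated


weights = [0.05, 0.50, 0.95]
deltas = [-1.0, 0.5, 1.0]
new_weights = bounded_update(weights, deltas)
```

      Weights that would overshoot a bound are pinned at it, which is the kind of saturation the quoted passage credits for the slight “stickiness” of consolidated memories.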

      One reviewer points out that border cells are not typically active in the center of the enclosure. However, the model can be built without assuming between-border cells (early simulations with the model did not make this assumption). Regarding this issue, the text reads “Unlike the BVC model, the boundary cell representation is sparsely populated using a basis set of three cells for each of the three dimensions (i.e., 9 cells in total), such that for each of the three non-orthogonal orientations, one cell captures one border, another the opposite border, and the third cell captures positions between the opposing borders (Solstad et al., 2008). However, this is not a core assumption, and it is possible to configure the model with border cell configurations that contain two opponent border cells per dimension, without needing to assume that any cells prefer positions between the borders (with the current parameters, the model predicts there will be two border cells for each between-border cell). Similarly, it is possible to configure the model with more than 3 cells for each dimension (i.e., multiple cells representing positions between the borders).” The Solstad paper found a few cells that responded in positions between borders, but perhaps not as many as 1 out of 3 cells, as this particular model simulation predicts. If the paucity of between-border cells is a crucial data point, the model can be reconfigured with opponent-border cells without any between-border cells. The reason that 3 border cells were used rather than 2 opponent border cells was simplicity. Because 3 head direction cells were used to capture the face-centered cubic packing of memories, the simulation also used 3 border cells per dimension to allow a common linear sum metric when conjoining dimensions to form memories.
If the border dimensions used 2 cells while head direction used 3 cells, a dimensional weighting scheme would be needed to allow this mixing of “apples and oranges” in terms of distances in the 3D space that includes head direction.

      REFERENCES

      Abbott, L. F., Varela, J. A., Sen, K., & Nelson, S. B. (1997). Synaptic depression and cortical gain control. Science, 275(5297), 220–224.

      Barry, C., Ginzberg, L. L., O’Keefe, J., & Burgess, N. (2012). Grid cell firing patterns signal environmental novelty by expansion. Proceedings of the National Academy of Sciences of the United States of America, 109(43), 17687–17692. https://doi.org/10.1073/pnas.1209918109

      Barry, C., Hayman, R., Burgess, N., & Jeffery, K. J. (2007). Experience-dependent rescaling of entorhinal grids. Nature Neuroscience, 10(6), 682–684.

      Barry, C., Lever, C., Hayman, R., Hartley, T., Burton, S., O’Keefe, J., Jeffery, K., & Burgess, N. (2006). The boundary vector cell model of place cell firing and spatial memory. Reviews in the Neurosciences, 17(1–2), 71–98.

      Derdikman, D., Whitlock, J. R., Tsao, A., Fyhn, M., Hafting, T., Moser, M. B., & Moser, E. I. (2009). Fragmentation of grid cell maps in a multicompartment environment. Nature Neuroscience, 12(10), 1325–1332. https://doi.org/10.1038/nn.2396

      Deshmukh, S. S., & Knierim, J. J. (2011). Representation of non-spatial and spatial information in the lateral entorhinal cortex. Frontiers in Behavioral Neuroscience, 5, 69.

      Diehl, G. W., Hon, O. J., Leutgeb, S., & Leutgeb, J. K. (2017). Grid and nongrid cells in medial entorhinal cortex represent spatial location and environmental features with complementary coding schemes. Neuron, 94(1), 83–92.e6.

      Ginosar, G., Aljadeff, J., Burak, Y., Sompolinsky, H., Las, L., & Ulanovsky, N. (2021). Locally ordered representation of 3D space in the entorhinal cortex. Nature, 596(7872), 404–409.

      Huber, D. E., & O’Reilly, R. C. (2003). Persistence and accommodation in short-term priming and other perceptual paradigms: Temporal segregation through synaptic depression. Cognitive Science, 27(3), 403–430. https://doi.org/10.1207/s15516709cog2703_4

      Krupic, J., Bauza, M., Burton, S., Barry, C., & O’Keefe, J. (2015). Grid cell symmetry is shaped by environmental geometry. Nature, 518(7538), 232–235.

      McClelland, J. L., & Chappell, M. (1998). Familiarity breeds differentiation: A subjective-likelihood approach to the effects of experience in recognition memory. Psychological Review, 105(4), 724–760.

      Mok, R. M., & Love, B. C. (2019). A non-spatial account of place and grid cells based on clustering models of concept learning. Nature Communications, 10(1), 5685.

      Norman, K. A., & O’Reilly, R. C. (2003). Modeling hippocampal and neocortical contributions to recognition memory: A complementary-learning-systems approach. Psychological Review, 110(4), 611–646.

      Rodríguez‐Domínguez, U., & Caplan, J. B. (2019). A hexagonal Fourier model of grid cells. Hippocampus, 29(1), 37–45.

      Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM - retrieving effectively from memory. Psychonomic Bulletin & Review, 4, 145–166.

      Solstad, T., Boccara, C. N., Kropff, E., Moser, M. B., & Moser, E. I. (2008). Representation of geometric borders in the entorhinal cortex. Science, 322(5909), 1865–1868. https://doi.org/10.1126/science.1166466

      Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929–1958.

      Stachenfeld, K. L., Botvinick, M. M., & Gershman, S. J. (2017). The hippocampus as a predictive map. Nature Neuroscience, 20(11), 1643–1653.

      Tsodyks, M. V., & Markram, H. (1997). The neural code between neocortical pyramidal neurons depends on neurotransmitter release probability. Proceedings of the National Academy of Sciences of the United States of America, 94(2), 719–723. https://doi.org/10.1073/pnas.94.2.719

      Wei, X.-X., Prentice, J., & Balasubramanian, V. (2015). A principle of economy predicts the functional architecture of grid cells. eLife, 4, e08362.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable paper presents a thoroughly detailed methodology for mesoscale-imaging of extensive areas of the cortex, either from a top or lateral perspective, in behaving mice. While the examples of scientific results to be derived with this method are in the preliminary stages, they offer promising and stimulating insights. Overall, the method and results presented are convincing and will be of interest to neuroscientists focused on cortical processing in rodents.

      Authors’ Response: We thank the reviewers for the helpful and constructive comments. They have helped us plan for significant improvements to our manuscript. Our preliminary response and plans for revision are indicated below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors introduce two preparations for observing large-scale cortical activity in mice during behavior. Alongside this, they present intriguing preliminary findings utilizing these methods. This paper is poised to be an invaluable resource for researchers engaged in extensive cortical recording in behaving mice.

      Strengths:

      -Comprehensive methodological detailing:

      The paper excels in providing an exceptionally detailed description of the methods used. This meticulous documentation includes a step-by-step workflow, complemented by thorough protocols and a list of materials in the supplementary materials.

      -Minimal movement artifacts:

      A notable strength of this study is the remarkably low movement artifacts. To further underscore this achievement, a more robust quantification across all subjects, coupled with benchmarking against established tools (such as those from suite2p), would be beneficial.

      Authors’ Response: This is a good suggestion. We have records of the fast-z correction applied by ScanImage on the microscope during acquisition, so we have supplied the online fast-z motion correction .csv files for two example sessions on our GitHub page as supplementary files:

      https://github.com/vickerse1/mesoscope_spontaneous/tree/main/online_fast_z_correction

      These files correspond to Figure S3b (2367_200214_E210_1) and to Figures 5 and 6 (3056_200924_E235_1). These are now also referenced in the main text. See lines ~595, pg 18 and lines ~762, pg 24.

      We have also made minor revisions to the main text of the manuscript with clear descriptions of methods that we have found important for the minimization of movement artifacts, such as fully tightening all mounting devices, implanting the cranial window with proper, evenly applied pressure across its entire extent, and mounting the mouse so that it is not too close or far from the surface of the running wheel. See Line ~309, pg 10.

      -Insightful preliminary data and analysis:

      The preliminary data unveiled in the study reveal interesting heterogeneity in the relationships between neural activity and detailed behavioral features, particularly notable in the lateral cortex. This aspect of the findings is intriguing and suggests avenues for further exploration.

      Weaknesses:

      -Clarification about the extent of the method in the title and text:

      The title of the paper, using the term "pan-cortical," along with certain phrases in the text, may inadvertently suggest that both the top and lateral view preparations are utilized in the same set of mice. To avoid confusion, it should be explicitly stated that the authors employ either the dorsal view (which offers limited access to the lateral ventral regions) or the lateral view (which restricts access to the opposite side of the cortex). For instance, in line 545, the phrase "lateral cortex with our dorsal and side mount preparations" should be revised to "lateral cortex with our dorsal or side mount preparations" for greater clarity.

      Authors’ Response: We have opted to not change the title of the paper, because we feel that adding the qualifier, “in two preparations,” would add unnecessary complexity. In addition, while the dorsal mount preparation allows for imaging of bilateral dorsal cortex, the side mount preparation does indeed allow for imaging of both dorsal and lateral cortex across the right hemisphere (a bit of contralateral dorsal cortex is also imageable), and the design can be easily “flipped” across a mirror-plane to allow for imaging of left dorsal and lateral cortex. Taken together, we do show preparations that allow for pan-cortical 2-photon imaging.

      We do agree that imprecise reference to the two preparations can sometimes lead to confusion. Therefore, we made several small revisions to the manuscript, including at ~line 545, to make it clearer that we used two imaging preparations to generate our combined 2-photon mesoscope dataset, and that each of those two preparations had both benefits and limitations.

      -Comparison with existing methods:

      A more detailed contrast between this method and other published techniques would add value to the paper. Specifically, the lateral view appears somewhat narrower than that described in Esmaeili et al., 2021; a discussion of this comparison would be useful.

      Authors’ Response: The preparation by Esmaeili et al. 2021 has some similarities to, but also differences from, our preparation. Our preliminary reading is that their through-the-skull field of view is approximately the same as our through-the-skull field of view that exists between our first (headpost implantation) and second (window implantation) surgeries for our side mount preparation, although our preparation appears to include more anterior areas both near to and on the contralateral side of the midline. We have compared these preparations more thoroughly in the revised manuscript. (See lines ~278.)

      Furthermore, the number of neurons analyzed seems modest compared to recent papers (50k) - elaborating on this aspect could provide important context for the readers.

      Authors’ Response: With respect to the “modest” number of neurons analyzed (between 2000 and 8000 neurons per session for our dorsal and side mount preparations, with medians near 4500; see Fig. S2e), we would like to point out that factors such as use of dual-plane imaging or multiple imaging planes, different mouse lines, use of different duration recording sessions (see our Fig S2c), use of different imaging speeds and resolutions (see our Fig S2d), use of different Suite2p run-time parameters, and inclusion of areas with blood vessels and different neuron cell densities, may all impact the count of total analyzed neurons per session. We now mention these various factors and have made clear that we were not, for the purposes of this paper, trying to maximize neuron count at the expense of other factors such as imaging speed and total spatial FOV extent.

      We refer to these issues now briefly in the main text. (See ~line 93, pg 3).

      -Discussion of methodological limitations:

      The limitations inherent to the method, such as the potential behavioral effects of tilting the mouse's head, are not thoroughly examined. A more comprehensive discussion of these limitations would enhance the paper's balance and depth.

      Authors’ Response: Our mice readily adapted to the 22.5 degree head tilt and learned to perform 2-alternative forced choice (2-AFC) auditory and visual tasks in this configuration (Hulsey et al, 2024; Cell Reports). The advantages and limitations of such a rotation of the mouse, and possible ways to alleviate these limitations, as detailed in the following paragraphs, are now discussed more thoroughly in the revised manuscript at ~line 235, pg. 7.

      One can look at Supplementary Movie 1 for examples of the relatively similar behavior between the dorsal mount (not rotated) and side mount (rotated) preparations. We do not have behavioral data from mice that were placed in both configurations. Our preliminary comparisons across mice indicate that side and dorsal mount mice show similar behavioral variability. We have added brief additional mention of these considerations on ~lines 235-250, pg 7.

      It was in general important to make sure that the distance between the wheel and all four limbs was similar for both preparations. In particular, careful attention must be paid to the positioning of the front limbs in the side mount mice so that they are not too high off the wheel. This can be accomplished by a slight forward angling of the left support arm for side mount mice.

      Although it is possible to image the side mount preparation in the same optical configuration that we do without rotating the mouse, by rotating the objective 20 degrees to the right of vertical, we found that the last 2-3 degrees of missing rotation (our preparation is rotated 22.5 degrees left, which is more than the full available 20 degrees rotation of the Thorlabs mesoscope objective), along with several other factors, made this undesirable. First, it was very difficult to image auditory areas without the additional flexibility to rotate the objective more laterally. Second, it was difficult or impossible to attach the horizontal light shield and to establish a water meniscus with the objective fully rotated. One could use ultrasound gel instead (which we found to be, to some degree, optically inferior to water), but without the horizontal light shield, light from the UV and IR LEDs can reach the PMTs via the objective and contaminate the image or cause tripping of the PMT. Third, imaging the right pupil and face of the mouse is difficult under these conditions because the camera would need the same optical access angle as the 2-photon objective, or would need to be moved downward toward the air table and rotated up at an angle of 20 degrees, in which case its view would be blocked by the running wheel and other objects mounted on the air table.

      -Preliminary nature of results:

      The results are at a preliminary stage; for example, the B-soid analysis is based on a single mouse, and the validation data are derived from the training data set.

      Authors’ Response: In this methods paper, we have chosen to supply proof of principle examples, without a complete analysis of animal-to-animal variance.

      The B-SOiD analysis that we show in Figure 6 is based on a model trained on 80% of the data from four sessions taken from the same mouse, and then tested on all of a single session from that mouse. Initial attempts to train across sessions from different mice were unsuccessful, probably due to differences in behavioral repertoires across mice. However, we have performed extensive tests with B-SOiD and are confident that these sorts of results are reproducible across mice, although we are not prepared to publish these results at this time.

      We now clarify these points in the main text at ~line 865, pg 27.

      An additional comparison of the results of B-SOiD trained on different numbers of sessions to that of keypoint-MOSEQ (Weinreb et al, 2023, bioRxiv) trained on ~20 sessions can now be found as supplementary material on our GitHub site:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/Figure_SZZ_BSOID_MOSEQ_align.pdf

      The discrepancy between the maps in Figures 5e and 6e might indicate that a significant portion of the map represents noise. An analysis of variability across mice and a method to assign significance to these maps would be beneficial.

      Authors’ Response: After re-examination of the original analysis output files, we have indeed discovered that some of the Rastermap neuron density maps in Figure 6e were incorrectly aligned with their respective qualitative behaviors due to a discrepancy in file numbering between the images in 6e and the ensembles identified in 6c (each time that Rastermap is run on the same data, at least with the older version available at the time of creation of these figures, the order of the ensembles on the y-axis changes and thus the numbering of the ensembles would change even though the neuron identities within each group stayed the same for a given set of parameters).

      This unfortunate panel alignment / graphical display error present in the original reviewed preprint has been fixed in the current, updated figure (i.e. twitch corresponds to Rastermap groups 2 and 3, whisk to group 6, walk to groups 5 and 4, and oscillate to groups 0 and 1), and in the main text at ~line 925, pg 29. We have also changed the legend for Figure 6e, which contained accurate but misaligned information, to reflect this correction.
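
      One way to guard against this kind of renumbering is to match groups across runs by their neuron membership rather than by their index. The following is a minimal Python sketch, not the procedure used for the figures. The helper `match_groups` and the toy neuron IDs are hypothetical, and the sketch assumes each run is available as a mapping from group label to the IDs of its member neurons.

```python
def match_groups(run_a, run_b):
    """Map each group label in run_a to the run_b label with the largest
    Jaccard overlap in neuron membership (hypothetical helper)."""
    mapping = {}
    for label_a, members_a in run_a.items():
        set_a = set(members_a)
        best_label, best_score = None, -1.0
        for label_b, members_b in run_b.items():
            set_b = set(members_b)
            union = set_a | set_b
            # Jaccard index: shared neurons / all neurons in either group
            score = len(set_a & set_b) / len(union) if union else 0.0
            if score > best_score:
                best_label, best_score = label_b, score
        mapping[label_a] = best_label
    return mapping


# Toy example: same neuron groupings, different numbering between two runs.
run_1 = {0: [1, 2, 3], 1: [4, 5], 2: [6, 7, 8, 9]}
run_2 = {0: [6, 7, 8, 9], 1: [1, 2, 3], 2: [4, 5]}
label_map = match_groups(run_1, run_2)
```

      Keying figure panels to membership-matched labels of this kind, rather than to the raw group index, keeps panels aligned even when the ordering on the y-axis changes between runs.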

      One can now see that, because the data from both figures is from the same session in the same mouse, as you correctly point out, Fig 5d left (walk and whisk) corresponds roughly to Fig 6e group R7, “walk”, and that Fig 5d right (whisk) corresponds roughly to Fig 6e group R4, “twitch”.

      We have double-checked the identity of other CCF map displays of Rastermap neuron density and of mean correlations between neural activity and behavioral primitives in all other figures, and we found no other such alignment or mis-labeling errors.

      We have also added a caveat in the main text at ~lines 925-940, pg. 30, pointing out the preliminary nature of these findings, which are shown here as an example of the viability of the methods. Analysis of the variability of Rastermap alignments across sessions is beyond the scope of the current paper, although it is an issue that we hope to address in upcoming analysis papers.

      -Analysis details:

      More comprehensive details on the analysis would be beneficial for replicability and deeper understanding. For instance, the statement "Rigid and non-rigid motion correction were performed in Suite2p" could be expanded with a brief explanation of the underlying principles, such as phase correlation, to provide readers with a better grasp of the methodologies employed.

      Authors’ Response: We added a brief explanation of Suite2p motion correction at ~line 136, pg 4. We have also added additional details concerning CCF / MMM alignment and other analysis issues. In general we cite other papers where possible to avoid repeating details of analysis methods that are already published.
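
      For readers unfamiliar with the principle, rigid registration by phase correlation can be illustrated as follows. This is a minimal NumPy sketch of the general technique, not Suite2p's implementation, which includes additional steps such as non-rigid correction. The function name `phase_correlation_shift` and the synthetic image are assumptions made for illustration.

```python
import numpy as np


def phase_correlation_shift(reference, frame):
    """Return the integer (dy, dx) roll that aligns `frame` to `reference`."""
    f_ref = np.fft.fft2(reference)
    f_frm = np.fft.fft2(frame)
    # Normalised cross-power spectrum: keep only the phase difference.
    cross_power = f_ref * np.conj(f_frm)
    cross_power /= np.abs(cross_power) + 1e-12
    # Its inverse transform peaks at the displacement between the images.
    corr = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative (wrapped) shifts.
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, corr.shape))


# Synthetic check: displace a random image and recover the correction.
rng = np.random.default_rng(1)
reference = rng.standard_normal((64, 64))
moved = np.roll(reference, (3, -2), axis=(0, 1))
shift = phase_correlation_shift(reference, moved)
registered = np.roll(moved, shift, axis=(0, 1))
```

      The returned shift is the correction to apply to the frame, so it has the opposite sign to the displacement that produced the moved image.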

      Reviewer #2 (Public Review):

      Summary:

      The authors present a comprehensive technical overview of the challenging acquisition of large-scale cortical activity, including surgical procedures and custom 3D-printed headbar designs to obtain neural activity from large parts of the dorsal or lateral neocortex. They then describe technical adjustments for stable head fixation, light shielding, and noise insulation in a 2-photon mesoscope and provide a workflow for multisensory mapping and alignment of the obtained large-scale neural data sets in the Allen CCF framework. Lastly, they show different analytical approaches to relate single-cell activity from various cortical areas to spontaneous activity by using visualization and clustering tools, such as Rastermap, PCA-based cell sorting, and B-SOID behavioral motif detection.

      Authors’ Response: Thank you for this excellent summary of the scope of our paper.

      The study contains a lot of useful technical information that should be of interest to the field. It tackles a timely problem that an increasing number of labs will be facing as recent technical advances allow the activity measurement of an increasing number of neurons across multiple areas in awake mice. Since the acquisition of cortical data with a large field of view in awake animals poses unique experimental challenges, the provided information could be very helpful to promote standard workflows for data acquisition and analysis and push the field forward.

      Authors’ Response: We very much support the idea that our work here will contribute to the development of standard workflows across the field including those for multiple approaches to large-scale neural recordings.

      Strengths:

      The proposed methodology is technically sound and the authors provide convincing data to suggest that they successfully solved various problems, such as motion artifacts or high-frequency noise emissions, during 2-photon imaging. Overall, the authors achieved their goal of demonstrating a comprehensive approach for the imaging of neural data across many cortical areas and providing several examples that demonstrate the validity of their methods and recapitulate and further extend some recent findings in the field.

      Weaknesses:

      Most of the descriptions are quite focused on a specific acquisition system, the Thorlabs Mesoscope, and the manuscript is in part highly technical making it harder to understand the motivation and reasoning behind some of the proposed implementations. A revised version would benefit from a more general description of common problems and the thought process behind the proposed solutions to broaden the impact of the work and make it more accessible for labs that do not have access to a Thorlabs mesoscope. A better introduction of some of the specific issues would also promote the development of other solutions in labs that are just starting to use similar tools.

      Authors’ Response: We have edited the motivations behind the study to clarify the general problems that are being addressed. However, as the 2-photon imaging component of these experiments were performed on a Thorlabs mesoscope, the imaging details necessarily deal specifically with this system.

      We briefly compare the methods and results from our Thorlabs system to that of Diesel-2p, another comparable system, based on what we have been able to glean from the literature on its strengths and weaknesses. See ~lines 206-213, pg 6.

      Reviewer #3 (Public Review):

      Summary

      In their manuscript, Vickers and McCormick have demonstrated the potential of leveraging mesoscale two-photon calcium imaging data to unravel complex behavioural motifs in mice. Particularly commendable is their dedication to providing detailed surgical preparations and corresponding design files, a contribution that will greatly benefit the broader neuroscience community as a whole. The quality of the data is high, but it is not clear whether this is available to the community, some datasets should be deposited. More importantly, the authors have acquired activity-clustered neural ensembles at an unprecedented spatial scale to further correlate with high-level behaviour motifs identified by B-SOiD. Such an advancement marks a significant contribution to the field. While the manuscript is comprehensive and the analytical strategy proposed is promising, some technical aspects warrant further clarification. Overall, the authors have presented an invaluable and innovative approach, effectively laying a solid foundation for future research in correlating large-scale neural ensembles with behaviour. The implementation of a custom sound insulator for the scanner is a great idea and should be something implemented by others.

      Authors’ Response: Thank you for the kind words.

      We have made ~500 GB of raw data and preliminary analysis files publicly available on FigShare+ for the example sessions shown in Figures 2, 3, 4, 5, 6, S3, and S6. We ask to be cited and given due credit for any fair use of this data.

      The data is located here: https://doi.org/10.25452/figshare.plus.c.7052513

      We intend to release a complete data set to the public as a Dandiset on the DANDI archive in conjunction with in-depth analysis papers that are currently in preparation.

      This is a methods paper, but there is no large diagram that shows how all the parts are connected, communicating, and triggering each other. This is described in the methods, but a visual representation would greatly benefit the readers looking to implement something similar.

      Authors’ Response: This is an excellent suggestion. We have included a workflow diagram in the revised manuscript, in the form of a 3-part figure, for the methods (a), data collection (b and c), and analysis (d). This supplementary figure is now located on the GitHub page at the following link:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/pancortical_workflow_diagrams.pdf

      We now reference this figure on ~lines 190-192, pg 6 of the main text, near the beginning of the Results section.

      The authors should cite sources for the claims stated in lines 449-453 and cite the claim of the mouse's hearing threshold mentioned in lines 463.

      Authors’ Response: For the claim stated in lines 449-453:

      “The unattenuated or native high-frequency background noise generated by the resonant scanner causes stress to both mice and experimenters, and can prevent mice from achieving maximum performance in auditory mapping, spontaneous activity sessions, auditory stimulus detection, and auditory discrimination sessions/tasks”

      , we can provide the following references: (i) for mice: Sadananda et al, 2008 (“Playback of 22-kHz and 50-kHz ultrasonic vocalizations induces differential c-fos expression in rat brain”, Neuroscience Letters, Vol 435, Issue 1, p 17-23), and (ii) for humans: Fletcher et al, 2018 (“Effects of very high-frequency sound and ultrasound on humans. Part I: Adverse symptoms after exposure to audible very-high frequency sound”, J Acoust Soc Am, 144, 2511-2520). We will include these references in the revised paper.

      For the claim stated on line 463:

      “i.e. below the mouse hearing threshold at 12.5 kHz of roughly 15 dB”

      , we can provide the following reference: Zheng et al, 1999 (“Assessment of hearing in 80 inbred strains of mice by ABR threshold analyses”, Hearing Research, Vol 130, Issues 1-2, p 94-107).

      We have included these new references in the revised version of our paper. Thank you for identifying these citation omissions.

      No stats for the results shown in Figure 6e, it would be useful to know which of these neural densities for all areas show a clear statistical significance across all the behaviors.

      Authors’ Response: It would be useful if we could provide a statistic similar to what we provide for Fig. S6c and f, in which for each CCF area we compare the observed mean correlation values to a null of 0, or, in this case, the population densities of each Rastermap group within each CCF area to a null value equal to the total number of recorded neurons for that group divided by the total number of CCF areas (i.e. a Rastermap group with 500 neurons evenly distributed across ~30 CCF areas would contain ~17 neurons, or ~3.3% density, per CCF area). Our current figure legend states the maximums of the scale bar look-up values (reds) for each group, which range from ~8% to 32%.

      However, because the data in panel 6e are from a single session and are being provided as an example of our methods and not for the purpose of claiming a specific result at this point, we choose not to report statistics. It is worth pointing out, perhaps, that Rastermap group densities for a given CCF area close to 3.3% are likely not different from chance, and those closer to ~40%, which is our highest density (for area M2 in Rastermap group 7, which corresponds to the qualitative behavior “walk”), are most likely not due to chance. Without analysis of multiple sessions from the same mouse we believe that making a clear statement of significance for this likelihood would be premature.
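
      The chance-level figures quoted in this response follow from simple arithmetic, sketched below in Python. This is only an illustration of the uniform-distribution null described above, not a significance test. The function name `chance_level` is hypothetical, and the inputs (500 neurons, 30 CCF areas, 40% peak density) are the approximate values given in the text.

```python
def chance_level(n_neurons_in_group, n_ccf_areas):
    """Expected neuron count and density per CCF area if the neurons of one
    Rastermap group were spread evenly across all areas."""
    expected_count = n_neurons_in_group / n_ccf_areas
    expected_density = 1.0 / n_ccf_areas
    return expected_count, expected_density


count, density = chance_level(500, 30)   # about 17 neurons, about 3.3% per area
fold_over_chance = 0.40 / density        # how far a 40% density sits above the null
```

      Under these assumptions the highest observed density is roughly twelve times the uniform expectation, which is why it is described as unlikely to be due to chance even without a formal test.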

      We now clarify this decision and related considerations in the main text at ~line 920, pg 29.

      While I understand that this is a methods paper, it seems like the authors are aware of the literature surrounding large neuronal recordings during mouse behavior. Indeed, in lines 178-179, the authors mention how a significant portion of the variance in neural activity can be attributed to changes in "arousal or self-directed movement even during spontaneous behavior." Why then did the authors not make an attempt at a simple linear model that tries to predict the activity of their many thousands of neurons by employing the multitude of regressors at their disposal (pupil, saccades, stimuli, movements, facial changes, etc). These models are straightforward to implement, and indeed it would benefit this work if the model extracts information on par with what is known from the literature.

      Authors’ Response: This is an excellent suggestion, but beyond the scope of the current methods paper. We are following up with an in-depth analysis of neural activity and corresponding behavior across the cortex during spontaneous and trained behaviors, which will be reported separately.

      Here, we prefer to present examples of the types of results that can be expected to be obtained using our methods, and how these results compare with those obtained by others in the field.
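
      For readers who wish to attempt the kind of model the reviewer describes, a minimal sketch is given below. It is not an analysis performed in the paper. It fits a closed-form ridge regression with NumPy on synthetic data, and every name and number in it (`fit_ridge`, `explained_variance`, four regressors, fifty neurons) is an assumption made for illustration; real behavioral regressors would first need to be resampled to the imaging frame rate.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def explained_variance(Y, Y_hat):
    """Fraction of variance explained, computed separately for each neuron."""
    residual = ((Y - Y_hat) ** 2).sum(axis=0)
    total = ((Y - Y.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - residual / total


# Synthetic stand-ins: X holds behavioral regressors (time x regressors),
# Y holds neural activity (time x neurons).
rng = np.random.default_rng(0)
n_time, n_regressors, n_neurons = 2000, 4, 50
X = rng.standard_normal((n_time, n_regressors))
true_W = rng.standard_normal((n_regressors, n_neurons))
Y = X @ true_W + 0.5 * rng.standard_normal((n_time, n_neurons))

# Fit on the first half of the session, evaluate on the held-out second half.
split = n_time // 2
W = fit_ridge(X[:split], Y[:split], alpha=1.0)
ev = explained_variance(Y[split:], X[split:] @ W)
```

      Evaluating on held-out time points, as here, avoids reporting variance explained on the same data used for fitting.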

      Specific strengths and weaknesses with areas to improve:

      The paper should include an overall cartoon diagram that indicates how the various modules are linked together for the sampling of both behaviour and mesoscale GCAMP. This is a methods paper, but there is no large diagram that shows how all the parts are connected, communicating, and triggering each other.

      Authors’ Response: This is an excellent suggestion. We have included a workflow diagram in the revised manuscript, in the form of a 3-part figure, for the methods (a), data collection (b and c), and analysis (d). This supplementary figure is now located on the GitHub page at the following link:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/pancortical_workflow_diagrams.pdf

      The paper contains many important results regarding correlations between behaviour and activity motifs on both the cellular and regional scales. There is a lot of data and it is difficult to draw out new concepts. It might be useful for readers to have an overall figure discussing various results and how they are linked to pupil movement and brain activity. A simple linear model that tries to predict the activity of their many thousands of neurons by employing the multitude of regressors at their disposal (pupil, saccades, stimuli, movements, facial changes, etc) may help in this regard.

      Authors’ Response: This is an excellent suggestion, but beyond the scope of the present methods paper. Such an analysis is a significant undertaking with such large and heterogeneous datasets, and we provide proof-of-principle data here so that the reader can understand the type of data that one can expect to obtain using our methods. We will provide a more complete analysis of data obtained using our methodology in the near future in another manuscript.

      Previously, widefield imaging methods have been employed to describe regional activity motifs that correlate with known intracortical projections. Within the authors' data it would be interesting to perhaps describe how these two different methods are interrelated -they do collect both datasets. Surprisingly, such macroscale patterns are not immediately obvious from the authors' data. Some of this may be related to the scaling of correlation patterns or other factors. Perhaps there still isn't enough data to readily see these and it is too sparse.

      Authors’ Response: Unfortunately, we are unable to directly compare 1-photon widefield GCaMP6s activity with mesoscope 2-photon GCaMP6s activity. During widefield data acquisition, animals were stimulated with visual, auditory, or somatosensory stimuli (i.e. “passive sensory stimulation”), while 2-photon mesoscope data collection occurred during spontaneous changes in behavioral state, without sensory stimulation. The suggested comparison is, indeed, an interesting project for the future.

      In lines 71-71, the authors described some disadvantages of one-photon widefield imaging including the inability to achieve single-cell resolution. However, this is not true. In recent years, the combination of better surgical preparations, camera sensors, and genetically encoded calcium indicators has enabled the acquisition of single-cell data even using one-photon widefield imaging methods. These methods include miniscopes (Cai et al., 2016), multi-camera arrays (Hope et al., 2023), and spinning disks (Xie et al., 2023).

      Cai, Denise J., et al. "A shared neural ensemble links distinct contextual memories encoded close in time." Nature 534.7605 (2016): 115-118.

      Hope, James, et al. "Brain-wide neural recordings in mice navigating physical spaces enabled by a cranial exoskeleton." bioRxiv (2023).

      Xie, Hao, et al. "Multifocal fluorescence video-rate imaging of centimetre-wide arbitrarily shaped brain surfaces at micrometric resolution." Nature Biomedical Engineering (2023): 1-14.

      Authors’ Response: We have corrected these statements and incorporated these and other relevant references. There are advantages and disadvantages to each chosen technique, such as ease of use, field of view, accuracy, and speed. We will reference the papers you mention without an extensive literature review, but we would like to emphasize the following points:

      Even the best one-photon imaging techniques typically have ~10-20 micrometer resolution in xy (we image at 5 micrometer resolution for our large FOV configuration, but the xy point-spread function for the Thorlabs mesoscope is 0.61 x 0.61 micrometers with 970 nm excitation) and undefined z-resolution (4.25 micrometers for the Thorlabs mesoscope). A coarser resolution increases the likelihood that activity-related fluorescence from neighboring cells may contaminate the fluorescence observed from imaged neurons. Reducing the FOV and using sparse expression of the indicator lessens this overlap problem.

      We do appreciate these recent advances, however, particularly for use in cases where more rapid imaging is desired over a large field of view (CCD acquisition can be much faster than standard 2-photon galvo-galvo scanning, or even the galvo-resonant scanning used by the Thorlabs mesoscope). This being said, there are few currently available genetically encoded Ca2+ sensors that are able to measure fluctuations faster than ~10 Hz, which is a speed achievable on the Thorlabs 2-photon mesoscope with our techniques using the “small, multiple FOV” method (Fig. S2d, e).

      We have further clarified our discussion of these issues in the main text at ~lines 76-80, pg 2.

      The authors' claim of achieving optical clarity for up to 150 days post-surgery with their modified crystal skull approach is significantly longer than the 8 weeks (approximately 56 days) reported in the original study by Kim et al. (2016). Since surgical preparations are an integral part of the manuscript, it may be helpful to provide more details to address the feasibility and reliability of the preparation in chronic studies. A series of images documenting the progression of the optical quality of the window would offer valuable insight.

      Authors’ Response: As you suggest, we now include brief supplementary material demonstrating the changes in the window preparation that we observed over the prolonged time periods of our study, for both the dorsal and side mount preparations. The following link to this material is now referenced at ~line 287, pg 9, and at the end of Fig S1:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/window_preparation_stability.pdf

      We have also included brief additional details in the main text that we found were useful for facilitating long term use of these preparations. These are located at ~line 287-290, pg 9.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Sharing raw data and code:

      I strongly encourage sharing some of the raw data from your experiments and all the code used for data analysis (e.g. in a github repository). This would help the reader evaluate data quality, and reproduce your results.

      Authors’ Response: We have made ~500 GB of raw data and preliminary analysis files publicly available on FigShare+ for the example sessions shown in Figures 2, 3, 4, 5, 6, S3, and S6. We ask to be cited and given due credit for any fair use of this data.

      We intend to release a complete data set to the public as a Dandiset on the DANDI archive in conjunction with second and third in-depth analysis papers that are currently in preparation.

      The data is located here: https://doi.org/10.25452/figshare.plus.c.7052513

      Our existing GitHub repository, already referenced in the paper, is located here:

      https://github.com/vickerse1/mesoscope_spontaneous

      We have added an additional reference in the main text to the existence of these publicly available resources, including the appropriate links, located at ~lines 190-200, pg 6.

      (2) Use of proprietary software:

      The reliance on proprietary tools like LabView and Matlab could be a limitation for some researchers, given the associated costs and accessibility issues. If possible, consider incorporating or suggesting alternatives that are open-source, to make your methodology more accessible to a broader range of researchers, including those with limited resources.

      Authors’ Response: We are reluctant to recommend open source software that we have not thoroughly tested ourselves. However, we will mention, when appropriate, possible options for the reader to consider.

      Although LabView is proprietary and can be difficult to code, it is particularly useful when used in combination with National Instruments hardware. ScanImage in use with the Thorlabs mesoscope uses National Instruments hardware, and it is convenient to maintain hardware standards across the integrated rig/experimental system. LabView is also useful because it comes with a huge library of device drivers that makes addition of new hardware from basically any source very convenient.

      That being said, there are open source alternatives that could conceivably be used to replace parts of our system. One example is AutoPilot (author: Jonny Saunders), for control of behavioral data acquisition: https://open-neuroscience.com/post/autopilot/.

      We are not aware of an alternative to Matlab for control of ScanImage, which is the supported control software for the ThorLabs 2-photon mesoscope.

      Most of our processing and analysis code (see GitHub page: https://github.com/vickerse1/mesoscope_spontaneous) is in Python, but some of the code that we currently use remains in Matlab form. Certainly, this could be re-written as Python code. However, we feel that this is outside the scope of the current paper. We have commented all code in an attempt to aid users in translating it to other languages, if they so desire.

      (3) Quantifying the effect of tilted head:

      To address the potential impact of tilting the mouse's head on your findings, a quantitative analysis of any systematic differences in the behavior (e.g. Bsoid motifs) could be illuminating.

      Authors’ Response: We have performed DeepLabCut analysis of all sessions from both preparations, across several iterations with different parameters, to extract pose estimates, and we have also performed B-SOiD analysis of these sessions. We did not find any obvious qualitative differences between the two preparations in the number of behavioral motifs identified, the dwell times of these motifs, or similar measures that could be attributed to the tilting of the mouse’s head in the side mount preparation. We also did not find any obvious differences in the relative frequencies of high level qualitative behaviors, such as the ones referred to in Fig. 6, between the two preparations.

      Our mice readily adapted to the 22.5 degree head tilt and learned to perform 2-alternative forced choice (2-AFC) auditory and visual tasks in this configuration (Hulsey et al, 2024; Cell Reports). The advantages and limitations of such a rotation of the mouse, and possible ways to alleviate these limitations, as detailed in the following paragraphs, are now discussed more thoroughly in the revised manuscript. (See ~line 235, pg. 7)

      One can look at Supplementary Movie 1 for examples of the relatively similar behavior between the dorsal mount (not rotated) and side mount (rotated) preparations. We do not have behavioral data from mice that were placed in both configurations. Our preliminary comparisons across mice indicate that side and dorsal mount mice show similar behavioral variability. We have added brief additional mention of these considerations on ~lines 235-250, pg 7.

      It was in general important to make sure that the distance between the wheel and all four limbs was similar for both preparations. In particular, careful attention must be paid to the positioning of the front limbs in the side mount mice so that they are not too high off the wheel. This can be accomplished by a slight forward angling of the left support arm for side mount mice.

      Although it would in principle be nearly possible to image the side mount preparation in the same optical configuration that we do without rotating the mouse, by rotating the objective 20 degrees to the right of vertical, we found that the last 2-3 degrees of missing rotation (our preparation is rotated 22.5 degrees left, which is more than the full available 20 degrees rotation of the Thorlabs mesoscope objective), along with several other factors, made this undesirable. First, it was very difficult to image auditory areas without the additional flexibility to rotate the objective more laterally. Second, it was difficult or impossible to attach the horizontal light shield and to establish a water meniscus with the objective fully rotated. One could use gel instead (which we found to be optically inferior to water), but without the horizontal light shield, the UV and IR LEDs can reach the PMTs via the objective and contaminate the image or cause tripping of the PMT. Third, imaging the right pupil and face of the mouse is difficult to impossible under these conditions because the camera would need the same optical access angle as the objective, or would need to be moved down toward the air table and rotated up 20 degrees, in which case its view would be blocked by the running wheel and other objects mounted on the air table.

      (4) Clarification in the discussion section:

      The paragraph titled "Advantages and disadvantages of our approach" seems to diverge into discussing future directions, rather than focusing on the intended topic. I suggest revisiting this section to ensure that it accurately reflects the strengths and limitations of your approach.

      Authors’ Response: We agree with the reviewer that this section included several potential next steps or solutions for each advantage and disadvantage, which the reviewer refers to as “future directions” and are thus arguably beyond the scope of this section. Therefore we have retitled this section as, “Advantages and disadvantages of our approach (with potential solutions):”.

      Although we believe this to be a logical organization, and we already include a section focused purely on future directions in the Discussion section, we have refocused each paragraph of the advantages/disadvantages subsection to concentrate on the advantages and disadvantages per se. In addition, we have made minor changes to the “future directions” section to make it more succinct and practical. These changes can be found at lines ~1016-1077, pg 33-34.

      Reviewer #2 (Recommendations For The Authors):

      Below are some more detailed points that will hopefully help to further improve the quality and scope of the manuscript.

      • While it is certainly favorable for many questions to measure large-scale activity from many brain regions, the introduction appears to suggest that this is a prerequisite to understanding multimodal decision-making. This is based on the argument that combining multiple recordings with movement indicators will 'necessarily obscure the true spatial correlation structures'. However, I don't understand why this is the case or what is meant by 'true spatial correlation structures'. Aren't there many earlier studies that provided important insights from individual cortical areas? It would be helpful to improve the writing to make this argument clearer.

      Authors’ Response: The reviewer makes an excellent point and we have re-worded the manuscript appropriately, to reflect the following clarifications. These changes can be found at ~lines 58-71, pg. 2.

      We believe you are referring to the following passage from the introduction:

      “Furthermore, the arousal dependence of membrane potential across cortical areas has been shown to be diverse and predictable by a temporally filtered readout of pupil diameter and walking speed (Shimoaka et al, 2018). This makes simultaneous recording of multiple cortical areas essential for comparison of the dependence of their neural activity on arousal/movement, because combining multiple recording sessions with pupil dilations and walking bouts of different durations will necessarily obscure the true spatial correlation structures.”

      Here, we do not mean to imply that earlier studies of individual cortical areas are of no value. This argument is one example, among others, of a more general idea: for sequences or distributed encoding schemes that span many cortical areas that are too far apart to be imaged simultaneously with conventional 2-photon imaging, or that are too sparse to be discovered with 1-photon widefield imaging, our new methods offer advantages over conventional imaging methods that will allow for truly novel scientific analyses and insights.

      The general idea of the present example, based on the findings of Shimoaka et al, 2018, is that it is not possible to directly combine and/or compare the correlations between behavior and neural activity across regions that were imaged in separate sessions, because the correlations between behavior and neural activity in each region appear to depend on the exact time since the behavior began (Shimoaka et al, 2018), in a manner that differs across regions. So, for example, if one were to record from visual cortex in one session with mostly brief walk bouts, and then from somatosensory cortex in a second session with mostly long walk bouts, any inferred difference between the encoding of walk speed in neural activity between the two areas would run the risk of being contaminated by the “temporal filtering” effect shown in Shimoaka et al, 2018. However, this would not be the case in our recordings, because the distribution of behavior durations corresponding to our recorded neural activity across areas will be exactly the same, because they were recorded simultaneously.
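
      The confound described above can be illustrated with a toy simulation. This is a hypothetical sketch, not an analysis from the manuscript: it assumes that a region's response to walking decays exponentially with time since bout onset, and the time constants and bout durations are invented for illustration only.

```python
import math

def mean_walk_response(bout_duration_s, tau_s, dt=0.1):
    """Mean response over one walk bout, assuming (hypothetically) that the
    response decays as exp(-t / tau_s) with time t since bout onset."""
    n = int(round(bout_duration_s / dt))
    samples = [math.exp(-(i * dt) / tau_s) for i in range(n)]
    return sum(samples) / n

# Same hypothetical region (tau = 2 s), imaged in two separate sessions
# with different bout durations (all values invented for illustration).
same_region_short = mean_walk_response(bout_duration_s=2.0, tau_s=2.0)
same_region_long = mean_walk_response(bout_duration_s=20.0, tau_s=2.0)

# Two hypothetical regions with different time constants, recorded
# simultaneously so that both experience identical bout durations.
region_a = mean_walk_response(bout_duration_s=20.0, tau_s=2.0)
region_b = mean_walk_response(bout_duration_s=20.0, tau_s=8.0)

print(f"same region, short vs long bouts: {same_region_short:.2f} vs {same_region_long:.2f}")
print(f"simultaneous, region A vs B:      {region_a:.2f} vs {region_b:.2f}")
```

      In the first comparison the apparent difference arises purely from bout duration, even though the underlying response is identical. In the second, the bout durations are matched by construction, so the difference reflects the regions' assumed time constants.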

      • The text describes different timescales of neural activity but is an imaging rate of 3 Hz fast enough to be seen as operating at the temporal dynamics of the behavior? It appears to me that the sampling rate will impose a hard limit on the speed of correlations that can be observed across regions. While this might be appropriate for relatively slow behaviors and spontaneous fluctuations in arousal, sensory processing and decision formation likely operate on faster time scales below 100ms which would even be problematic at 10 Hz which is proposed as the ideal imaging speed in the manuscript.

      Authors’ Response: Imaging rate is always a concern and the limitations of this have been discussed in other manuscripts. We will remind the reader of these limitations, which must always be kept in mind when interpreting fluorescence based neural activity data.

      Previous studies imaging on a comparable yet more limited spatial scale (Stringer et al, 2019) used an imaging speed of ~1 Hz. With this in view, our work represents an advance both in spatial extent of imaged cortex and in imaging speed. Specifically, we believe that ~1 Hz imaging may be sufficient to capture flip/flop type transitions between low and high arousal states that persist in general for seconds to tens of seconds, and that ~3-5 Hz imaging likely provides additional information about encoding of spontaneous movements and behavioral syllables/motifs.

      Indeed, even 10 Hz imaging would not be fast enough to capture the detailed dynamics of sensory processing and decision formation, although these speeds are likely sufficient to capture “stable” encodings of sensory representations and decisions that must be maintained during a task, for example with delayed match-to-sample tasks.
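
      As a rough guide to the limits discussed above, the frame interval and Nyquist frequency can be computed for each imaging rate mentioned in this response. This is a back-of-the-envelope sketch only: it ignores calcium indicator kinetics, which low-pass filter the signal further.

```python
def frame_interval_ms(rate_hz):
    """Time between successive frames, in milliseconds."""
    return 1000.0 / rate_hz

def nyquist_hz(rate_hz):
    """Highest frequency resolvable without aliasing at this sampling rate."""
    return rate_hz / 2.0

for rate in (1.0, 3.0, 5.0, 10.0):
    print(f"{rate:4.1f} Hz imaging: frame every {frame_interval_ms(rate):6.1f} ms, "
          f"Nyquist {nyquist_hz(rate):.1f} Hz")
```

      Even at 10 Hz, a single frame spans 100 ms, which is consistent with the point that dynamics faster than 100 ms cannot be resolved at these rates.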

      In general we are further developing our preparations to allow us to perform simultaneous widefield imaging and Neuropixels recordings, and to perform simultaneous 1.2 x 1.2 mm 2-photon imaging and visually guided patch clamp recordings.

      Both of these techniques will allow us to combine information across both the slow and fast timescales that you refer to in your question.

      We have clarified these points in the Introduction and Discussion sections, at ~lines 93-105, pg 3, and ~lines 979-983, pg 31 and ~lines 1039-1045, pg 33, respectively.

      • The dorsal mount is very close to the crystal skull paper and it was ultimately not clear to me if there are still important differences aside from the headbar design that a reader should be aware of. If they exist, it would be helpful to make these distinctions a bit clearer. Also, the sea shell implants from Ghanbari et al in 2019 would be an important additional reference here.

      Authors’ Response: We have added brief references to these issues in our revised manuscript at ~lines 89-97, pg 3:

      Although our dorsal mount preparation is based on the “crystal skull paper” (Kim et al, 2016), which we reference, the addition of a novel 3-D printable titanium headpost, support arms, light shields, and modifications to the surgical protocols and CCF alignment represent significant advances that made this preparation useable for pan-cortical imaging using the Thorlabs mesoscope. In fact, we were in direct communication with Cris Niell, a UO professor and co-author on the original Kim et al, 2016 paper, during the initial development of our preparation, and he and members of his lab consulted with us in an ongoing manner to learn from our successful headpost and other hardware developments. Furthermore, all of our innovations for data acquisition, imaging, and analysis apply equally to both our dorsal mount and side mount preparations.

      Thank you for mentioning the Ghanbari et al, 2019 paper on the transparent polymer skull method, “See Shells.” We were in fact not aware of this study. However, it should be noted that their preparation seems to, like the crystal skull preparation and our dorsal mount preparation, be limited to bilateral dorsal cortex and not to include, as does our cranial window side mount preparation and the through-the-skull widefield preparation of Esmaeili et al, 2021, a fuller range of lateral cortical areas, including primary auditory cortex.

      • When using the lateral mount, rotating the objective, rather than the animal, appears to be preferable to reduce the stress on the animal. I also worry that the rather severe head tilt could be an issue when training animals in more complex behaviors and would introduce an asymmetry between the hemispheres due to the tilted body position. Is there a strong reason why the authors used water instead of an imaging gel to resolve the issue with the meniscus?

      Authors’ Response: Our mice readily adapted to the 22.5 degree head tilt and learned to perform 2-alternative forced choice (2-AFC) auditory and visual tasks in this situation (Hulsey et al, 2024; Cell Reports). The advantages and limitations of such a rotation of the mouse, and possible ways to alleviate these limitations, as detailed in the following paragraphs, are now discussed more thoroughly in the revised manuscript. (See ~line 235, pg. 7)

      One can look at Supplementary Movie 1 for examples of the relatively similar behavior between the dorsal mount (not rotated) and side mount (rotated) preparations. We do not have behavioral data from mice that were placed in both configurations. Our preliminary comparisons across mice indicate that side and dorsal mount mice show similar behavioral variability. We have added brief additional mention of these considerations on ~lines 235-250, pg 7.

      It was in general important to make sure that the distance between the wheel and all four limbs was similar for both preparations. In particular, careful attention must be paid to the positioning of the front limbs in the side mount mice so that they are not too high off the wheel. This can be accomplished by a slight forward angling of the left support arm for side mount mice.

      Although it would in principle be nearly possible to image the side mount preparation in the same optical configuration that we do without rotating the mouse, by rotating the objective 20 degrees to the right of vertical, we found that the last 2-3 degrees of missing rotation (our preparation is rotated 22.5 degrees left, which is more than the full available 20 degrees rotation of the objective), along with several other factors, made this undesirable. First, it was very difficult to image auditory areas without the additional flexibility to rotate the objective more laterally. Second, it was difficult or impossible to attach the horizontal light shield and to establish a water meniscus with the objective fully rotated. One could use gel instead (which we found to be optically inferior to water), but without the horizontal light shield, the UV and IR LEDs can reach the PMTs via the objective and contaminate the image or cause tripping of the PMT. Third, imaging the right pupil and face of the mouse is difficult to impossible under these conditions because the camera would need the same optical access angle as the objective, or would need to be moved down toward the air table and rotated up 20 degrees, in which case its view would be blocked by the running wheel and other objects mounted on the air table.

      • In parts, the description of the methods is very specific to the Thorlabs mesoscope which makes it harder to understand the general design choices and challenges for readers that are unfamiliar with that system. Since the Mesoscope is very expensive and therefore unavailable to many labs in the field, I think it would increase the reach of the manuscript to adjust the writing to be less specific for that system but instead provide general guidance that could also be helpful for other systems. For example (but not exclusively) lines 231-234 or lines 371 and below are very Thorlabs-specific.

      Authors’ Response: We have revised the manuscript so that it is more generally applicable to mesoscopic methods.

      We will make revisions as you suggest where possible, although we have limited experience with the other imaging systems that we believe you are referring to. However, please note that we already mentioned at least one other comparable system in the original eLife reviewed pre-print (Diesel 2p, line 209; Yu and Smith, 2021).

      Here are a couple of examples of how we have broadened our description:

      (1) On lines ~231-234, pg 7, we write:

      “However, if needed, the objective of the Thorlabs mesoscope may be rotated laterally up to +20 degrees for direct access to more ventral cortical areas, for example if one wants to use a smaller, flat cortical window that requires the objective to be positioned orthogonally to the target region.”

      Here, we have modified this to indicate that one may in general rotate the objective lens if the system allows it. Some systems, such as the Thorlabs Bergamo microscope and the Sutter MOM system, allow more than 20 degrees of rotation.

      (2) On line ~371, pg 11, we write:

      “This technique required several modifications of the auxiliary light-paths of the Thorlabs mesoscope”

      Here, we have changed the writing to be more general such as “may require…of one’s microscope.”

      Thank you for these valuable suggestions.

      • Lines 287-299: Could the authors quantify the variation in imaging depth, for example by quantifying to which extent the imaging depth has to be adjusted to obtain the position of the cortical surface across cortical areas? Given that curvature is a significant challenge in this preparation this would be useful information and could either show that this issue is largely resolved or to what extent it might still be a concern for the interpretation of the obtained results. How large were the required nominal corrections across imaging sites?

      Authors’ Response: This information was provided previously (lines 297-299):

      “In cases where we imaged multiple small ROIs, nominal imaging depth was adjusted in an attempt to maintain a constant relative cortical layer depth (i.e. depth below the pial surface; ~200 micrometer offset due to brain curvature over 2.5 mm of mediolateral distance, symmetric across the center axis of the window).”

      This statement is based on a qualitative assessment of cortical depth based on neuron size and shape, the density of neurons in a given volume of cortex, the size and shape of blood vessels, and known cortical layer depths across regions. A ground-truth measurement of this depth error is beyond the scope of the present study. However, we do specify the type of glass, thickness, and curvature that we use, and the field curvature characterization of the Thorlabs mesoscope is given in Fig. 6 of the Sofroniew et al, 2016 eLife paper.

      In addition, we have provided some documentation of online fast-z correction parameters on our GitHub page at:

      https://github.com/vickerse1/mesoscope_spontaneous/tree/main/online_fast_z_correction

      Some additional relevant documentation can be found in our publicly available data repository on FigShare+ at: https://doi.org/10.25452/figshare.plus.c.7052513

      • Given the size of the implant and the subsequent work attachments, I wonder to which extent the field of view of the animal is obstructed. Did the authors perform receptive field mapping or some other technique that can estimate the size of the animals' remaining field of view?

      Authors’ Response: The left eye is pointed down ~22.5 degrees, but we position the mouse near the left edge of the wheel to minimize the degree to which this limits their field of view. One may view our Fig. 1 and Suppl Movies 1 and 6 to see that the eyes on the left and right sides are unobstructed by the headpost, light shields, and support arms. However, other components of the experimental setup, such as the speaker, cameras, etc. can restrict a few small portions of the visual field, depending on their exact positioning.

      Mice responded to left side visual stimuli in preliminary recordings during our multimodal 2-AFC task, and the unobstructed left and right camera views, along with pupillometry recordings, showed that a significant portion of the mouse’s field of view, from either side, remains intact in our preparation.

      We have clarified these points in the text at ~lines 344-346, pg. 11.

      • Line 361: What does movie S7 show in this context? The movie seems to emphasize that the observed calcium dynamics are not driven by movement dynamics but it is not clear to me how this relates to the stimulation of PV neurons. The neural dynamics in the example cell are also not very clear. It would be helpful if this paragraph would contain some introduction/motivation for the optogenetic stimulation as it comes a bit out of the blue.

      Authors’ Response: This result was presented for two reasons.

      First, we showed it as a control for movement artifacts, since inhibition of neural activity enhances the relative prominence of non-activity dependent fluorescence that is used to examine the amplitude of movement-related changes in non-activity dependent fluorescence (e.g. movement artifacts). We have included a reference to this point at ~lines 587-588, pg 18.

      Second, we showed it as a demonstration of how one may combine optogenetics with imaging in mesoscopic 2-P imaging. References to this point were already present in the original version of the manuscript (the eLife “reviewed preprint”).

      • Lines 362-370: This paragraph and some of the following text are quite technical and would benefit from a better description and motivation of the general workflow. I have trouble following what exactly is done here. Are the authors using an online method to identify the CCF location of the 2p imaging based on the vessel pattern? Why is it important to do this during the experiment? Wouldn't it be sufficient to identify the areas of interest based on the vessel pattern beforehand and then adjust the 2p acquisition accordingly? Why are they using a dial, shutter, and foot pedal and how does this relate to the working distance of the objective? Does the 'standardized cortical map' refer to the Allen common coordinate framework?

      Authors’ Response: We have revised this section to make it more clear.

      Currently, the general introduction to this section appears in lines 349-361. Starting in line 362, we currently present the technical considerations needed to implement the overall goals stated in that first paragraph of this section.

      In general we use a post-hoc analysis step to confirm the location of neurons recorded with 2-photon imaging. During imaging (“online”), we juxtapose the multimodal map image, with its overlaid CCF, and the 2-photon image by opening the two images next to each other on the ScanImage computer and matching the vasculature patterns “by eye”. We have made this more clear in the text so that the interested reader can more readily implement our methods.

      By use of the phrase “standardized cortical map” in this context, we meant to point out that we had not decided a priori to use the Allen CCF v3.0 when we started working on these issues.

      • Does Fig. 2c show an example of the online alignment between widefield and 2p data? I was confused here since the use of suite2p suggests that this was done post-recording. I generally didn't understand why the user needed to switch back and forth between the two modes. Doesn't the 2p image show the vessels already? Also, why was an additional motorized dichroic to switch between widefield and 2p view needed? Isn't this the standard in most microscopes (including the Thorlabs scopes)?

      Authors’ Response: We have explained this methodology more clearly in the revised manuscript, both at ~lines 485-500, pg 15-16, and ~lines 534-540, pg 17.

      The motorized dichroic we used replaced the motorized mirror that comes with the Thorlabs mesoscope. We switched to a dichroic to allow for near-simultaneous optogenetic stimulation with 470 nm blue light and 2-photon imaging, so that we would not have to move the mirror back and forth during live data acquisition (it takes a few seconds and makes an audible noise that we wanted to avoid).

      Figure 2c shows an overview of our two step “offline” alignment process. The image at the right in the bottom row labeled “2” is a map of recorded neurons from suite2p, determined post-hoc or after imaging. In Fig. 2d we show what the CCF map looks like when it’s overlaid on the neurons from a single suite2p session, using our alignment techniques. Indeed, this image is created post-hoc and not during imaging. In practice, “online” during imaging, we would have the image at left in the bottom row of Fig. 2c (i.e. the multimodal map image overlaid onto an image of the vasculature also acquired on the widefield rig, with the 22.5 degree rotated CCF map aligned to it based on the location of sensory responses) rotated 90 degrees to the left and flipped over a horizontal mirror plane so that its alignment matches that of the “online” 2-photon acquisition image and is zoomed to the same scale factor. Then, we would navigate based on vasculature patterns “by-eye” to the desired CCF areas, and confirm our successful 2-photon targeting of predetermined regions with our post-hoc analysis.
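
      The orientation change described above (rotate the reference image 90 degrees to the left, then flip it over a horizontal mirror plane) can be sketched as follows. This is a minimal illustration, not code from our repository; it assumes that "to the left" means counterclockwise and that the flip reverses the row order. With NumPy arrays the same two steps would be `np.rot90(image, k=1)` followed by `np.flipud(...)`. Rescaling to the 2-photon zoom factor is not shown.

```python
def rotate_90_left(image):
    """Rotate a 2D image (list of rows) 90 degrees counterclockwise."""
    n_cols = len(image[0])
    # The last column of the input becomes the first row of the output.
    return [[row[c] for row in image] for c in range(n_cols - 1, -1, -1)]

def flip_vertical(image):
    """Mirror across a horizontal axis (reverse the row order)."""
    return image[::-1]

def widefield_to_2p_orientation(image):
    """Assumed orientation change: rotate left, then flip top-to-bottom."""
    return flip_vertical(rotate_90_left(image))

# Tiny stand-in for the multimodal map / vasculature reference image.
reference = [[1, 2, 3],
             [4, 5, 6]]
oriented = widefield_to_2p_orientation(reference)
print(oriented)
```

      Under these assumptions the combined operation is equivalent to transposing the image, which offers a quick sanity check when matching vasculature by eye.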

      • Why is the widefield imaging done through the skull under anesthesia? Would it not be easier to image through the final window when mice have recovered? Is the mapping needed for accurate window placement?

      Authors’ Response: The headpost and window surgeries are done 3-7 days apart to increase success rate and modularize the workflow. Multimodal mapping by widefield imaging is done through the skull between these two surgeries for two major reasons. First, to make efficient use of the time between surgeries. Second, to allow us to compare the multimodal maps to skull landmarks, such as bregma and lambda, for improved alignment to the CCF.

      Anesthesia was applied to prevent state changes and movements of the mouse, which can produce large, undesired effects on neural responses in primary sensory cortices in the context of these mapping experiments. We sometimes re-imaged multimodal maps on the widefield microscope through the window, roughly every 30-60 days or whenever/if significant changes in vasculature pattern became apparent.

      We have clarified these points in the main text at ~lines 510-522, pg 20-21, and we added a link to our new supplementary material documenting the changes observed in the window preparation over time:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/window_preparation_stability.pdf

      Thank you for these questions.

      • Lines 445 and below: Reducing the noise from resonant scanners is also very relevant for many other 2p experiments so it would be helpful to provide more general guidance on how to resolve this problem. Is the provided solution only applicable to the Thorlabs mesoscope? How hard would it be to adjust the authors' noise shield to other microscopes? I generally did not find many additional details on the Github repo and think readers would benefit from a more general explanation here.

      Authors’ Response: Our revised Github repository has been modified to include more details, including both diagrams and text descriptions of the sound baffle, respectively:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/resonant_scanner_baffle/closed_cell_honeycomb_baffle_for_noise_reduction_on_resonant_scanner_devices.pdf

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/resonant_scanner_baffle/closed_cell_honeycomb_baffle_methodology_summary.pdf

      However, we cannot presently disclose our confidential provisional patent application. Complete design information will likely be available in early 2025 when our full utility patent application is filed.

      With respect to your question, yes, this technique is adaptable to any resonant scanner, or, for that matter, any complicated 3D surface that emits sound. We first 3D scan the surface, and then we reverse engineer a solid that fully encapsulates the surface and can be easily assembled in parts with bolts and interior foam that allow for a tight fit, in order to nearly completely block all emitted sound.

      It is this adaptability that has prompted us to apply for a full patent, as we believe this technique will be quite valuable as it may apply to a potentially large number of applications, starting with 2-photon resonant scanners but possibly moving on to other devices that emit unwanted sound.

      • Does line 458 suggest that the authors had to perform a 3D scan of the components to create the noise reduction shield? If so, how was this done? I don't understand the connection between 3D scanning and printing that is mentioned in lines 464-466.

      Authors’ Response: We do not want to release full details of the methodology until the full utility patent application has been submitted. However, we have now included a simplified text description of the process on our GitHub page and included a corresponding link in the main text:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/resonant_scanner_baffle/closed_cell_honeycomb_baffle_methodology_summary.pdf

      We also clarified in the main text, at the location that you indicate, why the 3D scanning is a critical part of our novel 3D-design, printing, and assembly protocol.

      • Lines 468 and below: Why is it important to align single-cell data to cortical areas 'directly on the 2-photon microscope'? Is this different from the alignment discussed in the paragraph above? Why not focus on data interpretation after data acquisition? I understand the need to align neural data to cortical areas in general, I'm just confused about the 'on the fly' aspect here and why it seems to be broken out into two separate paragraphs. It seems as if the text in line 485 and below could also be placed earlier in the text to improve clarity.

      Authors’ Response: Here, by “such mapping is not routinely possible directly on the 2-photon mesoscope,” we mean that it is not possible to do multimodal mapping directly on the mesoscope; it needs to be done on the widefield imaging rig (a separate microscope). The CCF is then mapped onto the widefield multimodal map, which is overlaid on an image of the vasculature (and sometimes also the skull) that was also acquired on the widefield imaging rig. The vasculature serves as a sort of Rosetta Stone to co-align the 2-photon image to the multimodal map and then, by a sort of transitive property of alignment, to the CCF, so that each individual neuron in the 2-photon image can be assigned a unique CCF area name and numerical identifier for subsequent analysis.

      We have clarified this in the text, thank you.
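
      The chained alignment described in this response amounts to composing two coordinate transforms. The sketch below is illustrative only and is not the code from our notebooks: it assumes both alignment steps can be approximated by 2D affine transforms, and all matrix values are invented.

```python
def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(transform, x, y):
    """Apply a 3x3 homogeneous 2D transform to the point (x, y)."""
    u = transform[0][0] * x + transform[0][1] * y + transform[0][2]
    v = transform[1][0] * x + transform[1][1] * y + transform[1][2]
    return (u, v)

# Hypothetical transforms (values invented for illustration):
# 2-photon pixels -> widefield pixels: scale by 0.5, shift by (100, 40).
two_p_to_widefield = [[0.5, 0.0, 100.0],
                      [0.0, 0.5, 40.0],
                      [0.0, 0.0, 1.0]]
# Widefield pixels -> CCF map pixels: scale by 2, shift by (-10, 5).
widefield_to_ccf = [[2.0, 0.0, -10.0],
                    [0.0, 2.0, 5.0],
                    [0.0, 0.0, 1.0]]

# Composing the two gives a direct 2-photon -> CCF mapping.
two_p_to_ccf = matmul3(widefield_to_ccf, two_p_to_widefield)

neuron_xy = (200.0, 120.0)
via_widefield = apply(widefield_to_ccf, *apply(two_p_to_widefield, *neuron_xy))
direct = apply(two_p_to_ccf, *neuron_xy)
print(via_widefield, direct)
```

      Mapping a neuron through the widefield image and mapping it with the composed transform give the same CCF coordinate, which is why the vasculature image can serve as the intermediate reference.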

      The Python code for aligning the widefield and 2-photon vessel images would also be of great value for regular 2p users. It would strongly improve the impact of the paper if the repository were better documented and the code would be equally applicable for alignment of imaging data with smaller cranial windows.

      Authors’ Response: All of the code for multimodal map, CCF, and 2-photon image alignment is, in fact, already present on the GitHub page. We have made some minor improvements to the documentation, and readers are more than welcome to contact us for additional help.

      Specifically, the alignment you refer to starts in cell #32 of the meso_pre_proc_1.ipynb notebook. In general the notebooks are meant to be run sequentially, starting with cell #1 of meso_pre_proc_1, then going to the next cell etc…, then moving to meso_pre_proc_2, etc… The purpose of each cell is labeled at the top of the cell in a comment.

      We now include a cleaned, abridged version of the meso_pre_proc_1.ipynb notebook that contains only the steps needed for alignment, and have included a direct link to this notebook in the main text:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/python_code/mesoscope_preprocess_MMM_creation.ipynb

      Rotated CCF maps are in the CCF map rotation folder, in subfolders corresponding to the angle of rotation.

      Multimodal map creation involves use of the SensoryMapping_Vickers_Jun2520.m script in the Matlab folder.

      We updated the main text to clarify these points and included direct links to scripts relevant to each processing step.

      • Figure 4a: I found it hard to see much of the structure in the Rastermap projection with the viridis colormap - perhaps also because of a red-green color vision impairment. Correspondingly, I had trouble seeing some of the structure that is described in the text or clearer differences between the neuron sortings to PC1 and PC2. Is the point of these panels to show that both PCs identify movement-aligned dynamics or is the argument that they isolate different movement-related response patterns? Using a grayscale colormap as used by Stringer et al might help to see more of the many fine details in the data.

      Authors’ Response: In Fig. 4a the viridis color range is from blue to green to yellow, as indicated in the horizontal scale bar at bottom right. There is no red color in these Rastermap projections, or in any others in this paper. Furthermore, the expanded Rastermap insets in Figs. S4 and S5 provide additional detailed information that may not be clear in Fig 4a and Fig 5a.

      We prefer, therefore, not to change these colormaps, which we use throughout the paper.

      We have provided grayscale png versions of all figures on our GitHub page:

      https://github.com/vickerse1/mesoscope_spontaneous/tree/main/grayscale_figures

      In Fig 4a the point of showing both the PC1 and PC2 panels is to demonstrate that they appear to correspond to different aspects of movement (PC1 more to transient walking, both ON and OFF, and PC2 to whisking and sustained ON walk/whisk), and to exhibit differential ability to identify neurons with positive and negative correlations to arousal (PC1 finds both, but PC2 seems to find only the ON neurons).

      We now clarify this in the text at ~lines 696-710, pg 22.

      • I find panel 6a a bit too hard to read because the identification and interpretation of the different motifs in the different qualitative episodes is challenging. For example, the text mentions flickering into motif 13 during walk but the majority of that sequence appears to be shaped by what I believe to be motif 11. Motif 11 also occurs prominently in the oscillate state and the unnamed sequence on the left. Is this meaningful or is the emphasis here on times of change between behavioral motifs? The concept of motif flickering should be better explained here.

      Authors’ Response: Here motif 13 corresponds to a syllable that might best be termed “symmetric and ready stance”. This tends to occur just before and after walking, but also during rhythmic wheel balancing movements that appear during the “oscillate” behavior.

      The intent of Fig. 6a is to show that each qualitatively identified behavior (twitch, whisk, walk, and oscillate) corresponds to a period during which a subset of BSOiD motifs flicker back and forth, and that the identity of motifs in this subset differs across the identified qualitative behaviors. This is not to say that a particular motif occurs only during a single identified qualitative behavior. Admittedly, the identification of these qualitative behaviors is a bit arbitrary - future versions of BSOiD (e.g. ASOiD) in fact combine supervised (i.e. arbitrary, top down) and unsupervised (i.e. algorithmic, objective, bottom-up) methods of behavior segmentation in an attempt to more reliably identify and label behaviors.

      Flickering appears to be a property of motif transitions in raw BSOiD outputs that have not been temporally smoothed. If one watches the raw video, it seems that this may in fact be an accurate reflection of the manner in which behaviors unfold through time. Each behavior could be thought of, to use terminology from MOSEQ (B Datta), as a series of syllables strung together to make a phrase or sentence. Syllables can repeat over either fast or slow timescales, and may be shared across distinct words and sentences although the order and frequency of their recurrence will likely differ.
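
      To make "flickering" concrete, the sketch below compares a raw per-frame motif sequence with a temporally smoothed version. It is a hypothetical illustration, not BSOiD's own post-processing: the label sequence is invented, and the sliding-window majority vote is just one simple way to smooth labels.

```python
from collections import Counter

def count_transitions(labels):
    """Number of frame-to-frame changes in motif label."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)

def majority_smooth(labels, half_window=2):
    """Replace each label with the most common label in a centered window."""
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - half_window): i + half_window + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed

# Invented example: motifs 11 and 13 alternating rapidly within one behavior.
raw = [11, 11, 13, 11, 11, 13, 11, 11, 11, 13, 13, 11, 13, 13, 13]
smoothed = majority_smooth(raw)

print("raw transitions:     ", count_transitions(raw))
print("smoothed transitions:", count_transitions(smoothed))
```

      Smoothing collapses the rapid alternations into a single transition. If flickering reflects how behavior actually unfolds, as suggested above, such smoothing would discard real structure, which is one reason to inspect the raw outputs.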

      We have clarified these points in the main text at ~lines 917-923, pg 29, and we added motif 13 to the list of motifs for the qualitative behavior labeled “oscillate” in Fig. 6a.

      • Lines 997-998: I don't understand this argument. Why does the existence of different temporal dynamics make imaging multiple areas 'one of the keys to potentially understanding the nature of their neuronal activity'?

      Authors’ Response: We believe this may be an important point: comparisons of neurobehavioral alignment across cortical areas cannot be performed by pooling sessions that contain different distributions of dwell times for different behaviors if, in fact, the dependence of neural activity on behavior depends on the exact elapsed time since the beginning of the current behavioral “bout”. Other reasons that imaging many areas simultaneously would provide a unique advantage over imaging smaller areas one at a time and attempting to pool data across sessions include the identification of sequences or neural ensembles that span many areas across large distances, and the understanding of distributed coding of behavior (an issue we explore in an upcoming paper).

      We have clarified these points at the location in the Discussion that you have identified. Thank you for your questions and suggestions.
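
      The quantity discussed in this response, the elapsed time since the start of the current behavioral bout, can be illustrated with a minimal sketch. It assumes a per-frame sequence of behavior labels at a known frame rate; the function names and the example labels are hypothetical and are not taken from the authors' code.

```python
def time_since_bout_onset(labels, frame_rate_hz):
    """Return, for each frame, the seconds elapsed since the current bout began."""
    elapsed = []
    frames_in_bout = 0
    previous = object()  # sentinel that never equals a real label
    for label in labels:
        # A bout continues while the label repeats; otherwise a new bout starts.
        frames_in_bout = frames_in_bout + 1 if label == previous else 0
        elapsed.append(frames_in_bout / frame_rate_hz)
        previous = label
    return elapsed


def dwell_times(labels, frame_rate_hz):
    """Return (label, duration in seconds) for each contiguous bout."""
    bouts = []
    for label in labels:
        if bouts and bouts[-1][0] == label:
            bouts[-1][1] += 1
        else:
            bouts.append([label, 1])
    return [(label, n_frames / frame_rate_hz) for label, n_frames in bouts]


# Hypothetical label sequence sampled at 10 Hz.
example_labels = ["walk", "walk", "walk", "whisk", "whisk", "walk"]
print(time_since_bout_onset(example_labels, frame_rate_hz=10.0))
print(dwell_times(example_labels, frame_rate_hz=10.0))
```

      Two sessions with different dwell-time distributions sample different ranges of this elapsed-time variable, which is why pooling them could confound a comparison across areas.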

      Minor

      Line 41: What is the difference between decision, choice, and response periods?

      Authors’ Response: This now reads “...temporal separation of periods during which cortical activity is dominated by activity related to stimulus representation, choice/decision, maintenance of choice, and response or implementation of that choice.”

      Line 202: What does ambulatory mean in this context?

      Authors’ Response: Here we mean that the mice are able to walk freely on the wheel. In fact they do not actually move through space, so we have changed this to read “able to walk freely on a wheel, as shown in Figs. 1a and 1b”.

      Is there a reason why 4 mounting posts were used for the dorsal mount but only 1 post was sufficient for the lateral mount?

      Authors’ Response: Here, we assume you mean 2 posts for the side mount and 4 posts for the dorsal mount.

      In general our idea was to use as many posts as possible to provide maximum stability of the preparations and minimize movement artifacts during 2-photon imaging. However, the design of the side mount headpost precluded the straight-forward or easy addition of a right oriented, second arm to its lateral/ventral rim - this would have blocked access of both the 2-photon objective and the right face camera. In the dorsal mount, the symmetrical headpost arms are positioned further back (i.e. posterior), so that the left and right face cameras are not obscured.

      When we created the side mount preparation, we discovered that the 2 vertical 1” support posts were sufficient to provide adequate stability of the preparation and minimize 2-photon imaging movement artifacts. The side mount used two attachment screws on the left side of the headpost, instead of the one screw per side used in the dorsal mount preparation.

      We have included these points/clarifications in the main text at ~lines 217-230, pg 7.

      Figure S1g appears to be mislabeled.

      Authors’ Response: Yes, on the figure itself that panel was mislabeled as “f” in the original eLife reviewed preprint. We have changed this to read “g”.

      Line 349 and below: Why is the method called pseudo-widefield imaging?

      Authors’ Response: On the mesoscope, broad spectrum fluorescent light is passed through a series of excitation and emission filters that, based on a series of tests that we performed, allow both reflected blue light and epifluorescence emitted (i.e. Stokes-shifted) green light to reach the CCD camera for detection. Furthermore, the CCD camera (Thorlabs) has a much smaller detector chip than that of the other widefield cameras that we use (RedShirt Imaging and PCO), and we use it to image at an acquisition speed of around 10 Hz maximum, instead of ~30-50 Hz, which is our normal widefield imaging acquisition speed (it also has a slower readout than what we would consider to be a standard or “real” 1-photon widefield imaging camera).

      For these 3 reasons we refer to this as “pseudo-widefield” imaging. We would not use this for sensory activity mapping on the mesoscope - we primarily use it for mapping cortical vasculature and navigating based on our multimodal map to CCF alignment, although it is actually “contaminated” with some GCaMP6s activity during these uses.

      We have briefly clarified this in the text.

      Figures 4d & e: Do the colors show mean correlations per area? Please add labels and units to the colorbars as done in panel 4a.

      Authors’ Response: For both Figs 4 and 5, we have added the requested labels and units to each scale bar, and have relabeled panels d to say “Rastermap CCF area cell densities”, and panels e to say “mean CCF area corrs w/ neural activity.”

      Thank you for catching these omissions/mislabelings.

      Line 715: what is superneuron averaging?

      Authors’ Response: This refers to the fact that when Rastermap displays more than ~1000 neurons it averages the activity of each group of adjacent 50 neurons in the sorting to create a single display row, to avoid exceeding the pixel limitations of the display. Each single row representing the average activity of 50 neurons is called a “superneuron” (Stringer et al, 2023; bioRxiv).

      We have modified the text to clarify this point.
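
      As an illustration of the averaging described above, the following sketch collapses each run of 50 adjacent neurons in the sort order into one display row. It is written in plain Python for clarity and is not Rastermap's own implementation; the function name is hypothetical, and the handling of a final, smaller group is an assumption.

```python
def superneuron_average(sorted_activity, group_size=50):
    """Average each run of `group_size` adjacent rows into a single row.

    `sorted_activity` holds one row per neuron, already in sort order;
    each row is a list of activity values over time.
    """
    superneurons = []
    for start in range(0, len(sorted_activity), group_size):
        group = sorted_activity[start:start + group_size]
        n_timepoints = len(group[0])
        # Assumption: a trailing group with fewer rows is averaged over its own size.
        superneurons.append(
            [sum(row[t] for row in group) / len(group) for t in range(n_timepoints)]
        )
    return superneurons


# Toy example: 4 neurons, 2 time points, grouped in pairs.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(superneuron_average(toy, group_size=2))
```

      With `group_size=50`, a sort of 5000 neurons would be displayed as 100 superneuron rows.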

      Line 740: it would be good to mention what exactly the CCF density distribution quantifies.

      Authors’ Response: In each CCF area, a certain percentage of neurons belongs to each Rastermap group. The CCF density distribution is the set of these percentages, or densities, across all CCF areas in the dorsal or side mount preparation being imaged in a particular session. We have clarified this in the text.
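
      The definition above can be sketched for a single Rastermap group as follows. The inputs (one area label and one group-membership flag per neuron), the function name, and the area names in the example are illustrative assumptions, not the authors' data structures.

```python
from collections import Counter


def ccf_density_distribution(area_of_neuron, in_group):
    """Percentage of neurons in each CCF area that belong to one Rastermap group."""
    totals = Counter(area_of_neuron)
    members = Counter(
        area for area, flag in zip(area_of_neuron, in_group) if flag
    )
    # One percentage per area; the set of these values is the density distribution.
    return {area: 100.0 * members[area] / totals[area] for area in totals}


# Hypothetical session: four neurons in one area, two in another.
areas = ["area_A", "area_A", "area_A", "area_A", "area_B", "area_B"]
flags = [True, False, False, False, True, True]
print(ccf_density_distribution(areas, flags))
```

      Repeating the calculation for each Rastermap group gives one such distribution per group.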

      Line 745: what does 'within each CCF' mean? Does this refer to different areas?

      Authors’ Response: The corrected version of this sentence now reads: “Next, we compared, across all CCF areas, the proportion of neurons within each CCF area that exhibited large positive correlations with walking speed and whisker motion energy.”

      How were different Rastermap groups identified? Were they selected by hand?

      Authors’ Response: Yes, in Figs. 4, 5, and 6, we selected the identified Rastermap groups “by hand”, based on qualitative similarity of their activity patterns. At the time, there was no available algorithmic or principled means by which to split the Rastermap sort. The current, newer version of Rastermap (Stringer et al, 2023) seems to allow for algorithmic discretization of embedding groups (we have not tested this yet), but it was not available at the time that we performed these preliminary analyses.

      In terms of “correctness” of such discretization or group identification, we intend to address this issue in a more principled manner in upcoming publications. For the purposes of this first paper, we decided that manual identification of groups was sufficient to display the capabilities and outcomes of our methods.

      We clarify this point briefly at several locations in the revised manuscript, throughout the latter part of the Results section.

      Reviewer #3 (Recommendations For The Authors):

      In "supplementary figures, protocols, methods, and materials", Figure S1 g is mislabeled as Figure f.

      Authors’ Response: Yes, on the figure itself this panel was mislabeled as “f” in the original reviewed preprint. We have changed this to read “g”.

      In S1 g, the success rate of the surgical procedure seems quite low. Less than 50% of the mice could be imaged under two-photon. Can the authors elaborate on the criteria and difficulties related to their preparations?

      Authors’ Response: We will elaborate on the difficulties that sometimes hinder success in our preparations in the revised manuscript.

      The success rate indicated up to the point of “Spontaneous 2-P imaging (window)” reads 13/20, which is 65%, not 50%. The drop to 9/20 by the time one gets to the left edge of “Behavioral Training” indicates that some mice do not master the task.

      Protocol I contains details of the different ways in which mice either die or become unsuitable or “unsuccessful” at each step. These surgeries are rather challenging - they require proper instruction and experience. With the current protocol, our survival rate for the window surgery alone is as high as 75-100%. Some mice can be lost at headpost implantation, in particular if they are low weight or if too much muscle is removed over the auditory areas. Finally, some mice survive windowing but the imageable area of the window might be too small to perform the desired experiment.

      We have added a paragraph detailing this issue in the main text at ~lines 287-320, pg 9.

      In both Suppl_Movie_S1_dorsal_mount and Suppl_Movie_S1_side_mount provided (Movie S1), the behaviour video quality seems to be unoptimized which will impact the precision of Deeplabcut. As evident, there were multiple instances of mislabeled key points (paws are switched, large jumps of key points, etc) in the videos.

      Many tracked points are in areas of the image that are over-exposed.

      Despite using a high-speed camera, motion blur is obvious.

      Occlusions of one paw by the other paws moving out of frame.

      As Deeplabcut accuracy is key to the higher-level motifs generated by BSOiD, can the authors provide an example of tracking with exclusion/smoothing of mislabeled points (possibly by the median filtering provided by Deeplabcut)? This may help readers address such errors.

      Authors’ Response: We agree that we would want to carefully rerun and carefully curate the outputs of DeepLabCut before making any strong claims about behavioral identification. As the aim of this paper was to establish our methods, we did not feel that this degree of rigor was required at this point.

      It is inevitable that there will be some motion blur and small areas of over-exposure, respectively, when imaging whiskers, which can contain movement components up to ~150 Hz, and when imaging a large area of the mouse, which has planes facing various aspects. For example, perfect orthogonal illumination of both the center of the eye and the surface of the whisker pad on the snout would require two separate infrared light sources. In this case, use of a single LED results in overexposure of areas orthogonal to the direction of the light and underexposure of other aspects, while use of multiple LEDs would partially fix this problem, but still lead to variability in summated light intensity at different locations on the face. We have done our best to deal with these limitations.

      We now briefly point out these limitations in the methods text at ~lines 155-160, pg 5.

      In addition, we have provided additional raw and processed movies and data related to DeepLabCut and BSOiD behavioral analysis in our FigShare+ repository, which is located at:

      https://doi.org/10.25452/figshare.plus.c.7052513
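
      For readers who want to try the exclusion and smoothing that the reviewer suggests, a minimal sketch is given below. It is not part of the authors' pipeline and does not call DeepLabCut; it assumes one coordinate trace plus a per-frame likelihood for a single keypoint, and the likelihood threshold and window length are arbitrary illustrative values.

```python
from statistics import median


def clean_keypoint_trace(values, likelihoods, min_likelihood=0.9, window=5):
    """Drop low-confidence samples, fill gaps linearly, then median-filter."""
    # 1) Exclusion: discard samples whose likelihood is below the threshold.
    kept = [v if p >= min_likelihood else None for v, p in zip(values, likelihoods)]
    good = [i for i, v in enumerate(kept) if v is not None]
    if not good:
        raise ValueError("no samples pass the likelihood threshold")

    # 2) Gap filling: linear interpolation between the nearest retained samples.
    filled = []
    for i, v in enumerate(kept):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:
            filled.append(float(kept[right]))  # leading gap: hold next good value
        elif right is None:
            filled.append(float(kept[left]))   # trailing gap: hold last good value
        else:
            weight = (i - left) / (right - left)
            filled.append(kept[left] + weight * (kept[right] - kept[left]))

    # 3) Smoothing: sliding median, with the window truncated at the edges.
    half = window // 2
    return [median(filled[max(0, i - half): i + half + 1]) for i in range(len(filled))]


# Toy x-coordinate trace with one low-likelihood jump at index 2.
x = [10, 10, 250, 11, 11, 12, 12]
p = [0.99, 0.99, 0.20, 0.99, 0.99, 0.99, 0.99]
print(clean_keypoint_trace(x, p, window=3))
```

      The jump at index 2 is removed by the likelihood threshold before smoothing, so the median filter only has to handle residual jitter.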

      In lines 153-154, the authors mentioned that the Deeplabcut model was trained for 650k iterations. In our experience (100-400k), this seems excessive and may result in the model overfitting, yielding incorrect results in unseen data. Echoing point 4, can the authors show the accuracy of their Deeplabcut model (training set, validation set, errors, etc.)?

      Authors’ Response: Our behavioral analysis is preliminary and is included here as an example of our methods, and not to make claims about any specific result. Therefore we believe that the level of detail that you request in our DeepLabCut analysis is beyond the scope of the current paper. However, we would like to point out that we performed many iterations of DeepLabCut runs, across many mice in both preparations, before converging on these preliminary results. We believe that these results are stable and robust.

      We believe that 650k iterations is within the reasonable range suggested by DLC, and that 1 million iterations is given as a reasonable upper bound. This seems to be supported by the literature; for example, see Willmore et al, 2022 (“Behavioral and dopaminergic signatures of resilience”, Nature, 124:611, 124-132). Here, in a paper focused squarely on behavioral analysis, DLC training was run with 1.3 million iterations with default parameters.

      We now note, on ~lines 153-154, pg 5, that we used 650K iterations, a number significantly less than the default of 1.03 million, to avoid overfitting.

      In lines 140-141, the authors mentioned the use of slicing to downsample their data. Have any precautions, such as a low pass filter, been taken to avoid aliasing?

      Authors’ Response: Most of the 2-photon data we present was acquired at ~3 Hz and upsampled to 10 Hz. Most of the behavioral data was downsampled from 5000 Hz to 10 Hz by slicing, as stated. We did not apply any low-pass filter to the behavioral data before sampling. The behavioral variables have heterogeneous real sampling/measurement rates - for example, pupil diameter and whisker motion energy are sampled at 30 Hz, and walk speed is sampled at 100 Hz. In addition, the 2-photon acquisition rate varied across sessions.

      These facts made principled, standardized low-pass filtering difficult to implement. We chose rather to use a common resampling rate of 10 Hz in an unbiased manner. This downsampled 10 Hz rate is also used by B-SOiD to find transitions between behavioral motifs (Hsu and Yttri, 2021).

      We do not think that aliasing is a major factor because the real rate of change of our Ca2+ indicator fluorescence and behavioral variables was, with the possible exception of whisker motion energy, likely at or below 10 Hz.

      We now include a brief statement to this effect in the methods text at ~lines 142-146, pg. 4.
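
      For context on the reviewer's question, the sketch below shows what downsampling by slicing does and how a component above half the new sampling rate would fold back. It uses a synthetic sine wave rather than any of the authors' data; the function names and the 12 Hz test tone are illustrative assumptions.

```python
import math


def downsample_by_slicing(samples, source_rate_hz, target_rate_hz):
    """Keep every k-th sample, with no low-pass filter applied beforehand."""
    step = int(round(source_rate_hz / target_rate_hz))
    return samples[::step]


def aliased_frequency(signal_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling at `sample_rate_hz`."""
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


# One second of a synthetic 12 Hz sine acquired at 5000 Hz.
fast = [math.sin(2 * math.pi * 12 * n / 5000) for n in range(5000)]
slow = downsample_by_slicing(fast, source_rate_hz=5000, target_rate_hz=10)

# After slicing to 10 Hz, the samples coincide with those of a 2 Hz sine.
two_hz = [math.sin(2 * math.pi * 2 * k / 10) for k in range(len(slow))]
print(len(slow), aliased_frequency(12.0, 10.0))
print(max(abs(a - b) for a, b in zip(slow, two_hz)))
```

      Signals whose content lies below half the target rate are unaffected by slicing, which is the regime the response describes for most of the variables.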

      In lines 288-299, the authors have made considerable effort to compensate for the curvature of the brain, which is particularly important when imaging the whole dorsal cortex. Can the authors provide performance metrics and related details on how well the combination of online curvature field correction (ScanImage) and fast-z "sawtooth"/"step" (Sofroniew, 2016) performs?

      Authors’ Response: We did not perform additional “ground-truth” experiments that would allow us to make definitive statements concerning field curvature, as was done in the initial eLife Thorlabs mesoscope paper (Sofroniew et al, 2016).

      We estimate that we experience ~200 micrometers of depth offset across 2.5 mm - for example, if the objective is orthogonal to our 10 mm radius bend window and centered at the apex of its convexity, a small ROI located at the lateral edge of the side mount preparation would need to be positioned around 200 micrometers below that of an equivalent ROI placed near the apex in order to image neurons at the same cortical layer/depth, and would be at close to the same depth as an ROI placed at or near the midline, at the medial edge of the window. We determined this by examining the geometry of our cranial windows, and by comparing z-depth information from adjacent sessions in the same mouse, the first of which used a large FOV and the second of which used multiple small FOVs optimized so that they sampled from the same cortical layers across areas.

      We have included this brief explanation in the main text at ~lines 300-311, pg 9.

      In lines 513-515, the authors mentioned that the vasculature pattern can change over the course of the experiment, which then requires re-performing the realignment procedure. How stable is the vasculature pattern? Would laser speckle contrast yield more reliable results?

      Authors’ Response: In general the changes in vasculature we observed were minimal but involved the following: i) sometimes a vessel was displaced or moved during the window surgery, ii) sometimes a vessel, in particular the sagittal sinus, enlarged or increased its apparent diameter over time if it was not properly pressured by the cranial window, and iii) sometimes an area experiencing window pressure that was too low could, over time, show outgrowth of fine vascular endings. The most common of these was (i), and (iii) was perhaps the least common. In general the vasculature was quite stable.

      We have added this brief discussion of potential vasculature changes after cranial window surgery to the main text at ~lines 286-293, pg 9.

      We already mentioned, in the main text of the original eLife reviewed preprint, that we re-imaged the multimodal map (MMM) every 30-60 days or whenever changes in vasculature are observed, in order to maintain a high accuracy of CCF alignment over time. See ~lines 507-511, pg 16.

      We are not very familiar with laser speckle contrast, and it seems like a technique that could conceivably improve the fine-grained accuracy of our MMM-CCF alignment in some instances. We will try this in the future, but for now it seems like our alignments are largely constrained by several large blood vessels present in any given FOV, and so it is unclear how we would incorporate such fine-grained modifications without applying local non-rigid manipulations of our images.

      In lines 588-598, the authors mentioned that the occasional use of online fast-z corrections yielded no difference. However, it seems that the combination of the online fast-z correction yielded "cleaner" raster maps (Figure S3)?

      Authors’ Response: The Rastermaps in Fig S3a and b are qualitatively similar. We do not believe that any systematic difference exists between their clustering or alignments, and we did not observe any such differences in other sessions that either used or didn’t use online fast-z motion correction.

      We now provide raw data and analysis files corresponding to the sessions shown in Fig S3 (and other data-containing figures) on FigShare+ at:

      https://doi.org/10.25452/figshare.plus.c.7052513

      Ideally, the datasets contained in the paper should be available on an open repository for others to examine. I could not find a clear statement about data availability. Please include a linked repo or state why this is not possible.

      Authors’ Response: We have made ~500 GB of raw data and preliminary analysis files publicly available on FigShare+ for the example sessions shown in Figures 2, 3, 4, 5, 6, S3, and S6. We ask to be cited and given due credit for any fair use of this data.

      The data is located here:

      Vickers, Evan; A. McCormick, David (2024). Pan-cortical 2-photon mesoscopic imaging and neurobehavioral alignment in awake, behaving mice. Figshare+. Collection:

      https://doi.org/10.25452/figshare.plus.c.7052513

      We intend to release a complete data set to the public as a Dandiset on the DANDI archive in conjunction with second and third in-depth analysis papers that are currently in preparation.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Public Review

      Summary:

      (1) This work describes a simple mechanical model of worm locomotion, using a series of rigid segments connected by damped torsional springs and immersed in a viscous fluid.

      (2) It uses this model to simulate forward crawling movement, as well as omega turns.

      Strengths:

      (3) The primary strength is in applying a biomechanical model to omega-turn behaviors.

      (4) The biomechanics of nematode turning behaviors are relatively less well described and understood than forward crawling.

      (5) The model itself may be a useful implementation to other researchers, particularly owing to its simplicity.

      Weaknesses:

      (6) The strength of the model presented in this work relative to prior approaches is not well supported, and in general, the paper would be improved with a better description of the broader context of existing modeling literature related to undulatory locomotion.

      (7) This paper claims to improve on previous approaches to taking body shapes as inputs.

      (8) However, the sole nematode model cited aims to do something different, and arguably more significant, which is to use experimentally derived parameters to model both the neural circuits that induce locomotion as well as the biomechanics and to subsequently compare the model to experimental data.

      (9) Other modeling approaches do take experimental body kinematics as inputs and use them to produce force fields, however, they are not cited or discussed.

      (10) Finally, the overall novelty of the approach is questionable.

      (11) A functionally similar approach was developed in 2012 to describe worm locomotion in lattices (Majmudar, 2012, Roy. Soc. Int.), which is not discussed and would provide an interesting comparison and needed context.

      9-11: The paper you recommended and our manuscript have some similarities and differences.

      Similarities

      Firstly, the components constituting the worm are similar in both models. ElegansBot models the worm as a chain of n rods, while the study by Majmudar et al. (2012) models it as a chain of n beads. Each bead in the Majmudar et al. model has a directional vector, making it very similar to ElegansBot's rod. However, there's a notable difference: in the Majmudar et al. model, each bead has an area for detecting contact between the obstacle and the bead, while in ElegansBot, the rod does not feature such an area.

      Secondly, the types of forces and torques acting on the components constituting the worm are similar. Each rod in ElegansBot receives frictional force, muscle force, and joint force. Each bead in the Majmudar et al. model receives a constraint force, viscous force, and a repulsive force from obstacles. Each rod in ElegansBot receives frictional torque, muscle torque, and joint torque. Each bead in the Majmudar et al. model receives elastic torque, constraint torque, drive torque, and viscous torque. The Majmudar et al. model's constraint force and torque are similar to ElegansBot's joint force and torque in that they prevent two connected components of the worm from separating. The Majmudar et al. model's viscous force and torque are similar to ElegansBot's frictional force and torque in that they are forces exchanged between the worm and its surrounding environment (ground surface). The Majmudar et al. model's drive torque is similar to ElegansBot's muscle force and muscle torque as a cause of the worm's motion. However, unlike ElegansBot, the Majmudar et al. model did not consider the force generating the drive torque, and there are differences in how each force and torque is calculated. This will be discussed in more detail below.

      Differences

      Firstly, the medium in which the worm locomotes is different. ElegansBot is a model describing motion in a homogeneous medium like agar or water without obstacles, while the Majmudar et al. model describes motion in water with circular obstacles fixed at each lattice point. This is because the purposes of the models are different. ElegansBot analyzes locomotion patterns based on the friction coefficient, while the Majmudar et al. model analyzes locomotion patterns based on the characteristics of the obstacle lattice, such as the distance between obstacles. Also, for this reason, the Majmudar et al. model's bead, unlike ElegansBot's rod, receives a repulsive force from obstacles.

      Secondly, the specific methods of calculating similar types of forces differ. ElegansBot calculates joint forces by substituting frictional forces, muscle forces, frictional torques, and muscle torques into an equation derived by differentiating twice with respect to time the boundary condition that two neighboring rods always meet at one point. This involves determining the process through which various forces and torques are transmitted across the worm. Specifically, it entails calculating how the frictional forces and torques, as well as the muscle forces and torques acting on each rod, are distributed throughout the entire length of the worm. In contrast, the Majmudar et al. model uses the method of Lagrange multipliers, based on a boundary condition that the curve length determined by each bead's tangential angle does not change, to calculate the constraint force and torque before calculating the drive torque and viscous force. This implies that the Majmudar et al. model did not consider the mechanism by which the drive torque and viscous force received by one bead are distributed throughout the worm. ElegansBot's rod receives an anisotropic Stokes frictional force from the ground surface, while the Majmudar et al. model considered the frictional force according to the Navier-Stokes equation for incompressible fluid, assuming the fluid velocity at the bead's location as the bead's velocity.
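
      To make the phrase "anisotropic Stokes frictional force" concrete, the sketch below computes a linear drag on a single rod with different coefficients along and across the rod axis. This is a generic illustration under stated assumptions, not ElegansBot's actual code: the function name, the parameter names `b_parallel` and `b_perpendicular`, and the numerical values are hypothetical.

```python
import math


def anisotropic_drag(velocity, rod_angle, b_parallel, b_perpendicular):
    """Linear drag on a rod, resolved along and across the rod's long axis."""
    tx, ty = math.cos(rod_angle), math.sin(rod_angle)  # unit tangent of the rod
    nx, ny = -ty, tx                                   # unit normal of the rod
    vx, vy = velocity

    # Decompose the rod velocity into tangential and normal components.
    v_par = vx * tx + vy * ty
    v_perp = vx * nx + vy * ny

    # Each component is opposed by a force proportional to its own coefficient.
    fx = -(b_parallel * v_par * tx + b_perpendicular * v_perp * nx)
    fy = -(b_parallel * v_par * ty + b_perpendicular * v_perp * ny)
    return fx, fy


# A rod lying along x and moving diagonally: sideways motion is resisted more.
print(anisotropic_drag((1.0, 1.0), rod_angle=0.0, b_parallel=1.0, b_perpendicular=10.0))
```

      When the two coefficients are equal, the force reduces to isotropic drag that directly opposes the velocity; their ratio is what allows an undulating body to generate net thrust.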

      Thirdly, unlike the Majmudar et al. model, ElegansBot considers the inertia of the worm components. Therefore, ElegansBot can simulate locomotion regardless of how low or high the ground surface's friction coefficient is. The Majmudar et al. model does not have this property.

      (12) The idea of applying biomechanical models to describe omega turns in C. elegans is a good one, however, the kinematic basis of the model as used in this paper (the authors do note that the control angle could be connected to a neural model, but don't do so in this work) limits the generation of neuromechanical control hypotheses.

      8, 12: We do not agree with the claim that ElegansBot could limit other researchers in generating neuromechanical control hypotheses. The term $\theta_{\mathrm{ctrl},i}^{(t)}$ used in our model is designed to be replaceable with neuromechanical control in the future.

      (13) The model may provide insights into the biomechanics of such behaviors, however, the results described are very minimal and are purely qualitative.

      (14-1) Overall, direct comparisons to the experiments are lacking or unclear.

      14-1: The text explaining Figs. 2 and 5 (Figs. 2 and 4 in the old version) directly compares the velocity, wave number, and period, as numerical indicators representing the behavior of the worm, between the experiment and ElegansBot.

      (14-2) Furthermore, the paper claims the value of the model is to produce the force fields from a given body shape, but the force fields from omega turns are only pictured qualitatively.

      13, 14-2: We gratefully accept the point that our analysis of the omega-turn is qualitative. Therefore, we have conducted additional quantitative analysis on the omega-turn and inserted the results into the new Fig. 4. We have considered the term 'Force field' as referring to the force vector received by each rod. We have created numerical indicators representing various behaviors of the worm and included them in the revised manuscript.

      (15) No comparison is made to other behaviors (the force experienced during crawling relative to turning for example might be interesting to consider) and the dependence of the behavior on the model parameters is not explored (for example, how does the omega turn change as the drag coefficients are changed).

      Thank you for the great idea. To compare behaviors, a clear criterion for distinguishing them is first needed. Therefore, we have created a new mathematical definition for behavior classification in the revised manuscript (“Defining Behavioral Categories” in Method). We then compared the force and power (rate of energy consumption) among forward locomotion, backward locomotion, and omega-turn (Fig. 4). In the revised manuscript, we also newly analyzed how the turning behavior changes with variations in the friction coefficients (Figs. S4-S7).

      (16) If the purpose of this paper is to recapitulate the swim-to-crawl transition with a simple model, and then apply the model to new behaviors, a more detailed analysis of the behavior of the model variables and their dependence on the variables would make for a stronger result.

      In our revised manuscript, we have quantitatively analyzed the changes occurring in turning behavior from water to agar, and the results are presented in Figs. S9 and S10.

      (17) In some sense, because the model takes kinematics as an input and uses previously established techniques to model mechanics, it is unsurprising that it can reproduce experimentally observed kinematics, however, the forces calculated and the variation of parameters could be of interest.

      (18) Relatedly, a justification of why the drag coefficients had to be changed by a factor of 100 should be explored.

      (19) Plate conditions are difficult to replicate and the rheology of plates likely depends on a number of factors, but is for example, changes in hydration level likely to produce a 100-fold change in drag? or something more interesting/subtle within the model producing the discrepancy?

      18, 19: As mentioned in the paper, we do not know if the friction coefficients in the study of Boyle et al. (2012) and the friction coefficients in the experiment of Stephens et al. (2016) are the same. In our revised manuscript, we have explored in more detail the effects of the friction coefficient's scale factor, and explained why we chose a scale factor of 1/100 (“Proper Selection of Friction Coefficients” in Supplementary Information). In summary, we analyzed the changes in trajectory due to scaling of the friction coefficient, and chose the scale factor 1/100 as it allowed ElegansBot to accurately reproduce the worm's trajectory while also being close to the friction coefficients in the Boyle et al. paper.

      (20) Finally, the language used to distinguish different modeling approaches was often unclear.

      (21) For example, it was unclear in what sense the model presented in Boyle, 2012 was a "kinetic model" and in many situations, it appeared that the term kinematic might have been more appropriate.

      Thank you for the feedback. As you pointed out, we have corrected that part to 'kinematic' in the revised manuscript.

      (22) Other phrases like "frictional forces caused by the tension of its muscles" were unclear at first glance, and might benefit from revision and more canonical usage of terms.

      We agree that the expression may not be immediately clear. This is due to the word limit for the abstract (the abstract of eLife VOR should be under 200 words, and our paper's abstract is 198 words), which forced us to convey the causality in a limited number of words. Therefore, although we will not change the abstract, the expression in question means that the muscle tension, which is the cause of the worm's locomotion, ultimately generates the frictional force between the worm and the ground surface.

      Recommendations For The Authors

      (23) As I stated in my public review, I think the paper could be made much stronger if a more detailed exploration of turning mechanics was presented.

      (24) Relatedly, rather than restricting the analysis to individual videos of turning behaviors, I wonder if a parameterized model of the turning kinematics would be fruitful to study, to try to understand how different turning gaits might be more or less energetically favorable.

      We thank the reviewer once again for their suggestion. Thanks to their proposal, we were able to conduct additional quantitative analysis on turning behavior.

      Reviewer #2

      Public Review

      Summary:

      (1) Developing a mechanical model of C. elegans is difficult to do from basic principles because it moves at a low (but not very small) Reynolds number, is itself visco-elastic, and often is measured moving at a solid/liquid interface.

      (2) The ElegansBot is a good first step at a kinetic model that reproduces a wide range of C. elegans motility behavior.

      Strengths:

      (3) The model is general due to its simplicity and likely useful for various undulatory movements.

      (4) The model reproduces experimental movement data using realistic physical parameters (e.g. drags, forces, etc).

      (5) The model is predictive (semi?) as shown in the liquid-to-solid gait transition.

      (6) The model is straightforward in implementation and so likely is adaptable to modification and addition of control circuits.

      Weaknesses:

      (7) Since the inputs to the model are the actual shape changes in time, parameterized as angles (or curvature), the ability of the model to reproduce a realistic facsimile of C. elegans motion is not really a huge surprise.

      (8) The authors do not include some important physical parameters in the model and should explain these assumptions in the text.

      (9. 1) The cuticle stiffness is significant and has been measured [1].

      (10. 2) The body of C. elegans is under high hydrostatic pressure which adds an additional stiffness [2].

      (11. 3) The visco-elasticity of C. elegans body has been measured. [3]

      Thank you for asking. The stiffness of C. elegans is an important consideration. We took this into account when creating ElegansBot, but did not explain it in the paper. The detailed explanation is as follows. C. elegans indeed has stiffness due to its cuticle and internal pressure. This stiffness is treated as a passive elastic force (elastic force term of lateral passive body force) in the paper of Boyle et al. (2012). However, the maximum spring constant of the passive elastic force is 1/20 of the maximum spring constant of the active elastic force. If we consider this fact in our model, the elastic term of the muscle torque is as follows: ( is the active torque elasticity coefficient, is the passive torque elasticity coefficient)

      where

      Therefore, there is no need to describe the active and passive terms separately in

      Furthermore, since , assuming , then and .
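      One way to read this argument, as a sketch under assumed notation: the symbols $\kappa_a$ (active coefficient), $\kappa_p$ (passive coefficient), $\kappa$, $\theta$ (joint angle), and $\theta_{\mathrm{ctrl}}$ (control angle) are illustrative and may differ from those used in the manuscript. If the active elastic torque pulls the joint angle toward a control angle while the passive one pulls it toward zero, the two terms combine into a single elastic term with a rescaled target angle:

```latex
\begin{align}
\tau_{\text{elastic}}
  &= \kappa_a\,(\theta_{\mathrm{ctrl}} - \theta) - \kappa_p\,\theta \\
  &= (\kappa_a + \kappa_p)\left(\frac{\kappa_a}{\kappa_a + \kappa_p}\,\theta_{\mathrm{ctrl}} - \theta\right)
   = \kappa\,(\theta'_{\mathrm{ctrl}} - \theta),
\end{align}
where
\begin{equation}
\kappa = \kappa_a + \kappa_p,
\qquad
\theta'_{\mathrm{ctrl}} = \frac{\kappa_a}{\kappa}\,\theta_{\mathrm{ctrl}} .
\end{equation}
% If the passive coefficient is taken at its stated upper bound,
% \kappa_p = \kappa_a / 20, then
\begin{equation}
\kappa_a = \tfrac{20}{21}\,\kappa,
\qquad
\kappa_p = \tfrac{1}{21}\,\kappa .
\end{equation}
```

      Under these assumptions the passive stiffness only rescales the coefficient and the target angle, which is consistent with the statement that the active and passive terms need not be described separately.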

      (12) There is only a very brief mention of proprioception.

      (13) The lack of inclusion of proprioception in the model should be mentioned and referenced in more detail in my opinion.

      As you emphasized, proprioception is an important aspect of the study of C. elegans locomotion. In our paper, its importance is briefly introduced with one sentence each in the introduction and the discussion. However, our research models how body motion arises from muscle forces; it does not model the sensory system that senses body posture. Therefore, the results section of our paper does not mention proprioception. What the discussion does mention is that ElegansBot can serve as the kinetic body model in a combined model that couples a kinetic body model with a neuronal circuit model receiving proprioception as a sensory signal.

      (14) These are just suggested references.

      (15) There may be more relevant ones available.

      The papers you provided contain specific information about the Young's modulus of the C. elegans body. The first paper (Rahimi et al., 2022) measured the Young's modulus of the cuticle after chemically isolating it from C. elegans, while the second paper (Park et al., 2007) and third paper (Backholm et al., 2013) measured the elasticity and Young's modulus of C. elegans without separating the cuticle. Based on the Young's modulus provided in each paper (although the second and third papers did not measure stiffness in the longitudinal direction), we derived the elastic coefficient (assuming a worm radius of 25 μm, cuticle thickness of 0.5 μm, and 1/25 of longitudinal length of the cuticle of 40 μm). The range was quite broad, from 9.82 × 10¹¹ μg/sec² (from the first paper) to 2.16 × 10⁸ μg/sec² (from the third paper). Although the elastic coefficient value in our paper falls within this range, since the range of the elastic coefficient is wide, we think we can modify the elastic coefficient in our paper and will be able to reapply our model if more accurate values become known in the future.
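      The response does not state which formula converts a Young's modulus into an elastic coefficient. As a hedged illustration only, the sketch below assumes the cuticle acts as a thin-walled tube loaded along its axis, so that k = E·A/L with A ≈ 2πrt. The function names are hypothetical, and the geometry (r = 25 μm, t = 0.5 μm, L = 40 μm) is read from the response. The inverse function shows which Young's modulus each quoted coefficient would imply under this assumption.

```python
import math

# Unit conversion: 1 Pa = 1 kg/(m s^2) = 1e3 ug/(um s^2).
PA_TO_UG_PER_UM_S2 = 1e3


def shell_area_um2(radius_um: float, thickness_um: float) -> float:
    """Cross-sectional area of a thin-walled tube, A ~ 2*pi*r*t (assumed geometry)."""
    return 2.0 * math.pi * radius_um * thickness_um


def elastic_coefficient(youngs_modulus_pa: float, radius_um: float = 25.0,
                        thickness_um: float = 0.5, length_um: float = 40.0) -> float:
    """Axial spring constant k = E*A/L in ug/s^2 (assumed formula, not from the source)."""
    e_internal = youngs_modulus_pa * PA_TO_UG_PER_UM_S2
    return e_internal * shell_area_um2(radius_um, thickness_um) / length_um


def implied_youngs_modulus_pa(k_ug_per_s2: float, radius_um: float = 25.0,
                              thickness_um: float = 0.5, length_um: float = 40.0) -> float:
    """Invert k = E*A/L to get the Young's modulus (in Pa) implied by a coefficient."""
    e_internal = k_ug_per_s2 * length_um / shell_area_um2(radius_um, thickness_um)
    return e_internal / PA_TO_UG_PER_UM_S2


if __name__ == "__main__":
    # The two coefficients quoted in the response, in ug/s^2.
    for k in (9.82e11, 2.16e8):
        print(f"k = {k:.3g} ug/s^2 -> implied E = {implied_youngs_modulus_pa(k):.3g} Pa")
```

      Under this assumed formula the two quoted coefficients correspond to moduli of roughly 500 MPa and 110 kPa, which illustrates why the derived range spans more than three orders of magnitude.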

      Reviewer #3

      Public Review

      Summary:

      (1) A mechanical model is used with input force patterns to generate output curvature patterns, corresponding to a number of different locomotion behaviors in C. elegans.

      Strengths:

      (2) The use of a mechanical model to study a variety of locomotor sequences and the grounding in empirical data are strengths.

      (3) The matching of speeds (though qualitative and shown only on agar) is a strength.

      Weaknesses:

      (4) What is the relation between input and output data?

      ElegansBot takes the worm's body control angle as the input, and produces trajectory and force of each segment of the worm as the output.

      (5) How does the input-output relation depend on the parameters of the model?

      If 'parameter' is understood as vertical and horizontal friction coefficients, then the explanation for this can be found in Fig. 5 (Fig. 4 in the old version).

      (6) What biological questions are addressed and can significant model predictions be made?

      ElegansBot provides equations of motion that account for the locomotion of C. elegans, including turning behaviors, which were relatively less well understood.

      Recommendations For The Authors

      (7) The novelty and significance of the paper should be clarified.

      We have added quantitative analyses of turning behavior in the revised manuscript, and we hope this will be helpful to you.

      (8) Previously much more detailed models have been published, as compared to this one.

      We hope the reviewer can point out any previous model that we may have missed.

      (9) The mechanics here are simplified (e.g. no information about dorsal/ventral innervation but only a bending angle) setting limitations on the capacity for model predictiveness.

      (10) Such limitations should be discussed.

      We view the difference between dorsal/ventral innervation and bending angle not as a matter of simplification, but rather as a reflection of the hierarchy that our model implements. Our model does not consider dorsal/ventral innervation, but it uses the bending angle to reproduce behavior in various input and frictional environments, which signifies the strong predictiveness of ElegansBot (Figures 2, 3, 5 (2, 3, 4 in the old version)). Moreover, if the midline of C. elegans is incompressible, then modeling by dividing into dorsal/ventral, as opposed to modeling solely with the bending angle, does not increase the degrees of freedom of the worm model, and therefore does not increase its predictiveness.

      (11) The aims of the paper and results need to be supported quantitatively and analyzed through parameter sweeps and intervention.

      We have conducted additional quantitative analyses on turning behavior as suggested by Reviewer #1 (Fig. 4, S4-S7, S9, and S10).

      (12) The methods are given only in broad brushstrokes, and need to be much more clear (and ideally sharing all code).

      We have thoroughly detailed every aspect of this research, from deriving the physical constants of C. elegans, agar, and water to developing the formulas and proofs necessary for operating ElegansBot and its applications. This comprehensive information is all presented in the Results, Methods, and Supplementary Information sections, as well as in the source code. Moreover, we have already ensured that our research can be easily reproduced by providing detailed explanations and by making ElegansBot accessible through public software databases (PyPI, GitHub). To further aid in its application and understanding, especially for those less familiar with the subject, we have also included minimal code as examples in the database. This code is designed to simplify the process of reproducing the results of the paper, thereby making our research more accessible and understandable. Therefore, we believe that readers will easily gain significant assistance from the extensive information we have provided. Should readers require further help, they can always contact us, and we will be readily available to offer support.

      (13) The supporting figures and movies need to include a detailed analysis to evidence the claims.

      We have conducted and provided additional quantitative analyses on turning behavior as suggested by Reviewer #1 (Fig. 4, S4-S7, S9, and S10).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      This manuscript provides some valuable findings concerning the hippocampal circuitry and the potential role of adult-born granule cells in an interesting long-term social memory retrieval. The behavior experiments and strategy employed to understand how adult-born granule cells contribute to long-term social discrimination memory are interesting.

      We thank the reviewer for the positive evaluation.

      I have a few concerns, however with the strength of the evidence presented for some of the experiments. The data presented and the method described is incomplete in describing the connection between cell types in CA2 and the projections from abGCs. Likewise, I worry about the interpretation of the data in Figures 1 and 2 given the employed methodology. I think that the interpretation should be broadened. This second concern does not impact the interest and significance of the findings.

      In response to this concern, we have removed the data concerning abGC projections to PCP4+ and PV-GFP+ cell bodies from Figure 1 and have focused this analysis on dendrites. We now provide high magnification images of dendrites and expand on the methodology, results, and interpretations in the manuscript. We also broaden the interpretation throughout the manuscript to address the reviewer’s concern.

      Strengths:

      The behavior experiments are beautifully designed and executed. The experimental strategy is interesting.

      We appreciate these positive comments.

      Weaknesses:

      The interpretation of the results may not be justified given the methods and details provided.

      We have addressed this concern by providing more methodological details and broadening our interpretation of the results.

      Reviewer #2:

      Summary:

      Laham et al. investigate how the projection from adult-born granule cells into CA2 affects the retrieval of social memories at various developmental points. They use chemogenetic manipulations and electrophysiological recordings to test how this projection affects hippocampal network properties during behavior. I find the study to be very interesting, the results are important for our understanding of how social memories of different natures (remote or immediate) are encoded and supported by the hippocampal circuitry. I have some points that I added below that I think could help clarify the conclusions:

      We appreciate the positive assessment and have addressed the more specific points below.

      My major concern with the manuscript was that making the transitions between the different experiments for each result section is not very smooth. Maybe they can discuss a bit in a summary conclusion sentence at the end of each result section why the next set of experiments is the most logical step.

      In response, we have added summary conclusion sentences at the end of each result section.

      In line 113, the authors say that "the DG is known to influence hippocampal theta-gamma coupling and SWRs". Another recent study Fernandez-Ruiz et al. 2021, examined how various gamma frequencies in the dentate gyrus modulate hippocampal dynamics.

      We cite this paper in the revised manuscript.

      Having no single cells in the electrophysiological recordings makes it difficult to interpret the ephys part. Perhaps having a discussion on this would help interpret the results. If more SWRs are produced from the CA2 region (perhaps aided by projections from abGC), more CA2 cells that respond to social stimuli (Oliva et al. 2020) would reactivate the memories, therefore making them consolidate faster/stronger. On the other hand, the projections from abGC that the authors see, also target a great deal of PV+ interneurons, which have been shown to pace the SWRs frequency (Stark et al 2014, Gan et al 2017), which further suggests that this projection could be involved in SWRs modulation.

      We discuss these possibilities and cite Gan et al 2017, Schlingloff et al., 2014, and Stark et al., 2014 in the revised manuscript.

      The authors should cite and discuss Shuo et al., 2022 (A hypothalamic novelty signal modulates hippocampal memory).

      We mention Chen et al (A hypothalamic novelty signal modulates hippocampal memory.) in the revised manuscript. “Shuo” is the first name of the first author on this paper, so we believe that this is the same paper to which the reviewer refers.

      I think the authors forgot to refer to Fig 3a-f, maybe around lines 163-168.

      We thank the reviewer for pointing out this error. In the revised manuscript, we refer to all figure panels. Since Fig 3 is now broken into two figures (Fig 3 and 4), the panel lettering has changed in the revised manuscript.

      Are the SWRs counted only during interaction time or throughout the whole behavior session for each condition?

      The SWRs are counted throughout the whole behavior session for each condition. This is now stated in the revised manuscript.

      Figure 3t shows a shift in the preferred gamma phase within theta cycles as a result of abGC projections to CA2 ablation with CNO, especially during Mother CNO condition. I think this result is worth mentioning in the text.

      We now mention this finding in the revised manuscript.

      Figure 3u in the legend mention "scale bars = 200um", what does this refer to?

      The scale bar refers to that shown in Figure 3b, which is now indicated in the legend.

      What exactly is calculated as SWR average integral? Is it a cumulative rate? Please clarify.

      The integral measure provides information regarding the average total power of SWR events. It sums z-scored amplitude values from beginning to the end of each SWR envelope, and then takes the average across all summed envelopes. SWR integral has been shown to influence SWR propagation (De Filippo and Schmitz, 2023). This is now described in the text.
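      The description above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: it assumes the ripple-band envelope has already been extracted, that z-scoring is done over the whole recording, and that SWR events are supplied as (start, stop) sample indices with an exclusive stop. The function names are hypothetical.

```python
from statistics import fmean, pstdev


def zscore(values):
    """Z-score a sequence using its own mean and population standard deviation."""
    mu = fmean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("envelope is constant; z-score is undefined")
    return [(v - mu) / sd for v in values]


def swr_average_integral(envelope, events):
    """Average, across SWR events, of the summed z-scored envelope within each event.

    envelope: ripple-band amplitude envelope, one value per sample.
    events:   list of (start, stop) sample indices, stop exclusive (assumed format).
    """
    if not events:
        raise ValueError("no SWR events supplied")
    z = zscore(envelope)
    # Sum z-scored amplitude from the beginning to the end of each event.
    integrals = [sum(z[start:stop]) for start, stop in events]
    # Average across all summed envelopes.
    return fmean(integrals)


if __name__ == "__main__":
    toy_envelope = [0, 0, 0, 0, 4, 4, 0, 0]  # made-up values for illustration
    print(swr_average_integral(toy_envelope, [(4, 6)]))
```

      Because each event is summed rather than averaged, longer or larger-amplitude events contribute larger integrals, so the measure reflects total event power rather than event rate.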

      Alexander et al 2017, "CA2 neuronal activity controls hippocampal oscillations and social behavior", examined some of the CA2 effects in the hippocampal network after CNO silencing, and the authors should cite it.

      Alexander et al., 2018, which we believe is the relevant paper, is now cited in the revised manuscript.

      Strengths:

      Behavioral experiments after abGC projections to CA2 are compelling as they show clearly distinct behavioral readout.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      Electrophysiological experiments are difficult to interpret without additional quantifications (single-cell responses during interactions etc.)

      We have addressed this concern by expanding the interpretation of our results.

      Reviewer #3:

      Laham et al. present a manuscript investigating the function of adult-born granule cells (abGCs) projecting to the CA2 region of the hippocampus during social memory. It should be noted that no function for the general DG to CA2 projection has been proposed yet. The authors use targeted ablation, chemogenetic silencing, and in vivo ephys to demonstrate that the abGCs to CA2 projection is necessary for the retrieval of remote social memories such as the memory of one's mother. They also use in vivo ephys to show that abGCs are necessary for differential CA2 network activity, including theta-gamma coupling and sharp wave-ripples, in response to novel versus familiar social stimuli.

      The question investigated is important since the function of DG to CA2 projection remained elusive a decade after its discovery. Overall, the results are interesting but focused on the social memory of the mother, and their description in the manuscript and figures is too cursory. For example, raw interaction times must be shown before their difference. The assumption that mice exhibit social preference between familiar or novel individuals such as mother and non-mother based on social memory formation, consolidation, and retrieval should be better explained throughout the manuscript. Thus, when describing the results, the authors should comment on changes in preference and how this can be interpreted as a change in social memory retrieval. Several critical experimental details such as the total time of presentation to the mother and non-mother stimulus mice are also lacking in the manuscript. The in vivo e-phys results are interesting as well but even more succinct with no proposed mechanism as to how abGCs could regulate SWR and PAC in CA2.

      In response to these comments, we provide raw interaction times in a new Figure (Fig. S1). We also provide more information about the experiments and figures in the revision. We explain the rationale for our behavioral interpretations and discuss proposed mechanisms for how abGCs regulate SWR and PAC.

      The manuscript is well-written with the appropriate references. The choice of the behavioral test is somewhat debatable, however. It is surprising that the authors chose to use a direct presentation test (presentation of the mother and non-mother in alternation) instead of the classical 3-chamber test which is particularly appropriate to investigate social preference. Since the authors focused exclusively on this preference, the 3-chamber test would have been more adequate in my opinion. It would greatly strengthen the results if the authors could repeat a key experiment from their investigation using such a test. In addition, the authors only impaired the mother's memory. An additional experiment showing that disruption of the abGCs to CA2 circuit impairs social memory retrieval would allow us to generalize the findings to social memories in general. As the manuscript stands, the authors can only conclude the importance of this circuit for the memory of the mother. Developmental memory implies the memory of familiar kin as well.

      We selected the direct social interaction test because it allows for more naturalistic social behaviors than measuring investigation times toward social stimuli located inside wire mesh containers. We also decided to focus our studies on the retrieval of mother memories because these are likely the first social memories to be formed. We emphasize that our results cannot be generalized to memories of other social stimuli but given studies on recent social memory formation and retrieval in adults that manipulate abGCs and CA2 separately, we feel that it is likely that this circuit is involved in these functions as well. However, we specify throughout the manuscript that our experiments can only tell us about mother memories. We have also changed the title to reflect this.

      The in vivo ephys section (Figure 3) is interesting but even more minimalistic and it is unclear how abGCs projection to CA2 can contribute to SWR and theta-gamma PAC. In Figure 1, the authors suggest that abGCs project preferentially to PV+ neurons in CA2. At a minimum, the authors should discuss how the abGCs to PV+ neurons to CA2 pyramidal neurons circuit can facilitate SWR and theta-gamma PAC.

      We have divided Figure 3 into two figures (Figures 3 and 4) and revised the electrophysiology section of the results section. In the revised paper, we now discuss how abGC projections to PV+ interneurons may facilitate SWR and PAC.

      Finally, proposing a function for 4-6-week-old abGCs projecting to CA2 begs two questions: What are abGCs doing once they mature further, and more generally, what is the function of the DG to CA2 projection? It would be interesting for the authors to comment on these questions in the discussion.

      In response to these comments, we discuss possible answers to these interesting questions.

      Recommendations for the authors:

      Reviewer #1:

      Specifically, my concern is with Figure 1 and the analysis of the synapses formed between abGCs and CA2 PNs (as identified by PCP4 expression) and CA2 PV+ cells (as identified by cre-dependent AAV-mCherry expression) in the PV-cre line. Panels c and d show the soma of a CA2 PN and the soma of a PV cell. Why was the soma analyzed? What relevance is there for this? It is my understanding that synapses form on dendrites; this would be much more relevant to show, in my opinion. Also, the methods for panels e and f state that the 3R-Tau+ intensity was analyzed only in stratum lucidum. (There was a normalization for the overall 3R-Tau intensity in SL of CA2 that was obtained by dividing by the 3R-Tau intensity of corpus callosum). I don't understand then how a comparison of 3R-Tau intensity could have been done for CA2 PN soma. There are no CA2 PN soma in stratum lucidum. (This is fairly clearly shown in Figure 1aiii, with the PCP4 staining showing the soma in the somatic layer... not in stratum lucidum). What is being analyzed here?

      If the 3R-Tau intensity for dendrites is higher for PV cell dendrites, an example image of dendrites would be very helpful. How was the CA2 PV cell dendrite delimited from the CA2 PN dendrites at 40x magnification for the 3R-Tau intensity? Why were pre-synaptic puncta not examined? Is it possible to determine the post-synaptic target with these methods? This result could be particularly interesting, but I find it very difficult to understand the quantification or the justification behind it. To truly know if a cell is getting a connection, the best method would be to perform whole-cell patch clamp recordings of the post-synaptic target cells and use optogenetics of the abGCs. I understand that perhaps this may be beyond the scope of the paper, but it is a severe limitation for these results.

      We have eliminated the cell body measures from Figure 1 and focus instead on the dendrite measures, which we agree are more relevant. We now provide high magnification example images of pyramidal cell (PCP4+) and PV+ interneuron (GFP+) dendrites in Figure 1. We thank the reviewer for pointing out the error about the stratum lucidum as some of the dendrites analyzed are located in the pyramidal cell layer. In addition, neither PCP4 nor GFP label the full extent of dendrites emanating from CA2 pyramidal cells or PV+ interneurons respectively. We mention this in the revised manuscript because abGC projections to more distal dendrites might show a different pattern than that which was observed for proximal dendrites. We also provide more details about how the dendrites were delimited for the analysis, and mention that these results cannot definitively inform us about whether functional synaptic connections have been formed.

      Cannulation over CA2 is potentially not specific to CA2 terminals. It would be optimal if the authors had some histology demonstrating specific cannula placement, as these surgeries are really tough to get perfectly centered over CA2. Even if it is perfectly centered, how much would the CNO diffuse into CA3? I think that given the methodology, the authors really need to consider that the behavioral results are not only a result of blocking abGC terminals in CA2 alone. Would it really change much if the abGC terminals are also silenced in CA3a/b as well? The McHugh lab has shown that area CA3 is also playing a role in social memory (Chiang, M.-C., Huang, A. J. Y., Wintzer, M. E., Ohshima, T. & McHugh, T. J. A role for CA3 in social recognition memory. Behav Brain Res 354, 2018). It may be that both areas CA2 and CA3 are important for the phenomenon being demonstrated in Figure 2. I think the impact of the study is just as interesting, as this examination of early social memories is very interesting and nicely done. In fact, areas CA2 and CA3 may be acting together (please see Stöber, T. M., Lehr, A. B., Hafting, T., Kumar, A. & Fyhn, M. Selective neuromodulation and mutual inhibition within the CA3-CA2 system can prioritize sequences for replay. Hippocampus 30, 1228-1238, 2020).

      We agree that it is possible that CNO infusions targeted at the CA2 would also influence CA3a/b and have revised the paper to include this possible interpretation. We also cite the suggested paper on CA3 involvement in social memory (Chiang et al., 2018) and the paper on CA2-CA3 interactions (Stöber et al, 2020).

      Figure 3 is packed with information, but not communicated in a reasonable way. Much more information and a description of the experimental protocol need to be presented. Furthermore, why are there no example traces for the SWRs recorded? There should be more analysis than just a difference score and frequency. How are j, k, and l analyzed and interpreted? Why no example traces there? Also, the n's seem way too small for Figure 3m-r. Are there only 32 or three animals used for some of these conditions? This is insufficient in my opinion to conclude much for a 5-minute interaction.

      In response to this concern, we have divided Figure 3 into 2 figures – Figure 3 and Figure 4. In Figure 3, we provide example traces for SWRs, with additional SWR data presented in Figures S3 and S4, including data to complement the difference score data in Figure 3. In Figure 4, we include traces of phase amplitude coupling. We also provide more information in the methods about how the phase amplitude coupling data were analyzed. For Figure 4, we used methods described by Tort et al., 2010 to produce a modulation index, which is a measure of the intensity of coupling between theta phase and gamma amplitude. This method additionally allows for visualization of how gamma amplitude is modified across individual theta phase cycles. Regarding the question about n sizes in the 10-12 week abGC group (Fig. 3), the numbers are lower than in the 4-6 week abGC group because by 6 weeks after the first set of recordings, the electrodes in some of the mice were no longer usable. The n sizes for this specific study are 4-5 per group for Nestin-cre mice; 7-8 for Nestin-cre:Gi. This is now clarified in the figure legend.
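      For readers unfamiliar with the modulation index, the following is a minimal sketch of the general Tort et al. (2010) approach, not the authors' pipeline. It assumes the instantaneous theta phase (in radians) and the gamma amplitude envelope have already been extracted sample by sample, and it uses 18 phase bins of 20 degrees; the function name and the bin count are assumptions for illustration. Mean gamma amplitude per theta-phase bin is normalized into a distribution, and the index is that distribution's Kullback-Leibler divergence from uniform, divided by log(number of bins).

```python
import math


def modulation_index(theta_phase, gamma_amp, n_bins=18):
    """Tort-style modulation index from paired phase/amplitude samples.

    theta_phase: theta phase per sample, in radians (any wrapping is accepted).
    gamma_amp:   non-negative gamma amplitude envelope per sample.
    Returns a value near 0 for no coupling and 1 when all amplitude
    falls in a single phase bin.
    """
    width = 2.0 * math.pi / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for phase, amp in zip(theta_phase, gamma_amp):
        wrapped = (phase + math.pi) % (2.0 * math.pi)  # map onto [0, 2*pi)
        idx = min(int(wrapped / width), n_bins - 1)
        sums[idx] += amp
        counts[idx] += 1
    if 0 in counts:
        raise ValueError("every phase bin needs at least one sample")
    mean_amp = [s / c for s, c in zip(sums, counts)]
    total = sum(mean_amp)
    if total <= 0:
        raise ValueError("gamma amplitude must be positive in at least one bin")
    # Normalized amplitude distribution over phase bins.
    p = [m / total for m in mean_amp]
    entropy = -sum(pj * math.log(pj) for pj in p if pj > 0)
    # KL divergence from the uniform distribution, scaled by log(n_bins).
    return (math.log(n_bins) - entropy) / math.log(n_bins)


if __name__ == "__main__":
    n = 18
    phases = [-math.pi + (i + 0.5) * 2.0 * math.pi / n for i in range(n)]
    flat = [1.0] * n                                # no phase preference
    coupled = [1.0 + math.cos(ph) for ph in phases]  # amplitude peaks at phase 0
    print(modulation_index(phases, flat), modulation_index(phases, coupled))
```

      The per-bin mean amplitudes computed along the way are also what would be plotted to visualize how gamma amplitude varies across the theta cycle.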

      The discussion section of this paper does not put these results into a broader context with the field. There are other studies examining abGCs and their roles in novelty and memory formation (the work from Juna Song's lab, for example). These should be properly mentioned and discussed.

      In response, we have added discussion on the roles of abGCs in nonsocial novelty and memory formation and have cited papers from the Song lab.

      In the figure legend for Figure 2, there is no specific explanation for panel h. Perhaps the label is missing in the legend.

      We thank the reviewer for noting this error and now include a description in the revised manuscript.

      Reviewer #2:

      Adding more quantifications (single cells, isolating data during interactions versus noninteraction times) would help understand the results better. In the absence of these, adding a clearer rationale (even if only through the presentation of hypotheses) in the transitions between the different results sections would make the study easier to read.

      In response to this comment, we have added transition sentences between results sections to clarify the rationale and make the manuscript easier to understand.

      Reviewer #3:

      Line 110: "Hippocampal phase-amplitude coupling (PAC) and generation of sharp wave-ripples (SWRs) have been linked to novel experience, memory consolidation, and retrieval (Colgin, 2015; Fernandez Ruiz et al., 2019; Meier et al., 2020; Joo and Frank, 2018; Vivekananda et al., 2021). The DG is known to influence hippocampal theta-gamma coupling and SWRs (Bott et al, 2016; Meier et al., 2020), yet no studies have examined the influence of abGCs on these oscillatory patterns." This information comes too early in the result section and is somewhat confusing.

      In response to this comment, we have moved this information and provided a better description.

      Line 118: "we found that mice with normal levels of abGCs can discriminate between their own mother and a novel mother." Be more descriptive of the results (present the raw interaction times with the statistical test to compare them), this is the conclusion.

      In response to this comment, we provide the raw interaction times in a new Figure (Fig. S1) and describe the results in more detail.

      Line 121: "These effects were not due to changes in physical activity". Be more specific. Did you subject the mice to a specific test? If not, how did you calculate locomotion? The data presented in the supplementary figure 1a only states the % locomotion.

      Locomotion was manually scored whenever an animal moved in the testing apparatus. Speed was not recorded. Total locomotion was divided by trial duration to create a % locomotion measure. We have added these details to the methods.

      Line 124: "Coinciding with the recovery of adult neurogenesis, GFAP-TK animals regained the ability to discriminate between their mother and a novel mother". Explain how the difference in interaction time can be interpreted as the ability to discriminate. You could also compute the discrimination index used by several other laboratories (difference of interaction normalized by the total interaction time).

      In response to this comment, we describe how the difference in interaction time can be interpreted as the ability to discriminate between novel and familiar mice.
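      The two measures discussed in this exchange can be written out explicitly. This is a small sketch with hypothetical function names; the sign convention (novel minus mother) is an assumption, and the interaction times in the example are made up for illustration.

```python
def difference_score(novel_s: float, mother_s: float) -> float:
    """Raw difference in interaction time, in seconds (novel minus mother; assumed sign)."""
    return novel_s - mother_s


def discrimination_index(novel_s: float, mother_s: float) -> float:
    """Difference in interaction time normalized by total interaction time.

    Ranges from -1 (only the mother investigated) to +1 (only the novel
    mother investigated); 0 means no preference.
    """
    total = novel_s + mother_s
    if total <= 0:
        raise ValueError("total interaction time must be positive")
    return (novel_s - mother_s) / total


if __name__ == "__main__":
    # Hypothetical animal: 60 s with the novel mother, 40 s with its own mother.
    print(difference_score(60.0, 40.0), discrimination_index(60.0, 40.0))
```

      Unlike the raw difference, the normalized index does not grow simply because an animal is more sociable overall, which is why the reviewer suggests it as a complement.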

      Line 133: "Targeted CNO infusion in Nestin-Cre:Gi mice enabled the inhibition of GiDREADD+ abGC axon terminals present in CA2." Provide data or references to support this claim. Injection of a dye of comparable size to CNO would help. Otherwise, mention that nearby CA3a could be affected as well.

      We cannot rule out that nearby CA3a was affected by our cannula infusions of CNO into CA2. Furthermore, since dyes likely diffuse at different rates than CNO, we believe that a dye injection would not eliminate this concern completely. Therefore, we have revised the paper to acknowledge the likelihood that the CNO infusion affected parts of CA3 in addition to CA2. We also changed the title to focus more on the CA2 electrophysiological recordings, which we know were obtained only from the CA2.

      Line 150: "When reintroduced to the now familiar adult mouse 6 hours later, after the effects of CNO had largely worn off". Provide data or references supporting this claim.

      In response, we cite articles that show behavioral effects of CNO DREADD activation are returned to baseline 6 hrs later.

      Line 165: "We found that SWR production is increased during social interaction, with more SWRs produced during novel mouse investigation, presumably during encoding social memories, than during familiar mouse investigation, presumably during retrieval of developmental social memories". How does this compare to the results in Oliva et al, Nature 2021?

      The Oliva et al 2021 paper recorded CA2 SWRs during home cage and during post-social stimulus exposure periods of sleep. The timing of the study does not coincide with the measures we made, but we cite the paper.

      Line 168: "Inhibition of abGCs in the presence of a social stimulus". How does silencing abGC impact CA2 pyramidal neurons' firing rate?

      The direct answer to this question is unknown because we did not measure single units, but based on studies done in the CA3, it is likely that firing rate in CA2 would increase.

      Line 203: "abGCs possess a time-sensitive ability to support retrieval of developmental social memories." Can you speculate on the function of the cells later on?

      In the revised paper, we speculate about the function of abGCs after they mature and no longer support retrieval of developmental social memories.

      Line 229: "GFAP-TK mice were group housed by genotype". Why not house them with CD1 littermates?

      We housed these mice according to genotype to avoid having mice with different levels of abGCs (GFAP-TK + VGCV and CD1 + VGCV) living together in social groups. We did this to avoid potential differences that might emerge in social behavior.

      Line 237: "Adult TK, Nestin-cre, and Nestin-cre:Gi offspring underwent a social interaction test in which they directly interacted with the mother". Specify how long was the social interaction time.

      In the revised manuscript, we specify that mice interacted with each social stimulus for 5 minutes.

      Line 240: "After a 1-hour delay spent in the home cage". Were the mice single-housed or with their littermates during this delay?

      In the revised manuscript, we indicate that mice were put back into the home cage with their cagemates during the 1 hr delay period.

      Line 241: "The order of stimulus exposure was counterbalanced in all tests." Can you show some data to confirm that the order of presentation did not impair the interaction? Have you considered using your own version of the classical 3-chamber test in order to assess directly the preference for one or the other female mouse?

      Our data suggest that the order of testing is not responsible for the observed results. Across all experimental groups without an abGC manipulation (i.e., all direct social interaction assays excluding VGCV+ GFAP-TK trials and CNO+ Nestin-cre:Gi trials), ~84.4% of animals demonstrate a social preference for the novel mother over the mother (CD1 + GFAP-TK VGCV- cohort: 28/33; CD1 VGCV+ cohort: 17/17; CD1 and TK recovery cohort: 24/31; Nestin-cre and Nestin-cre:Gi 4-6-week-old abGC cohort: 77/95; 10-12-week-old abGC cohort: 49/55; Total = 195/231 mice with an investigation preference for the novel mother). If stimulus presentation order were to bias social investigation preference toward the first stimulus presented, we would expect the percentage of animals demonstrating a social preference for each stimulus to be around 50%, as roughly half the animals were first exposed to the mother and the other half were first exposed to the novel mother. The social novelty preference percentage reported above is comparable to the percentages we observe in our lab's novel-to-familiar social interaction experiments, in which all animals are first exposed to a novel conspecific. We have yet to conduct experiments testing adults using the modified 3-chamber assay described in Laham et al., 2021.
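
      As an illustration only, and not an analysis reported in the response, the pooled counts quoted above can be re-derived and compared with the roughly 50% that a pure presentation-order effect would predict. The exact one-sided binomial test, the function names, and the shortened cohort labels below are assumptions introduced for this sketch; only the per-cohort counts come from the text.

```python
from math import comb

# Counts quoted in the response:
# (animals preferring the novel mother, animals tested) per cohort.
# Cohort labels are shortened here for readability.
cohorts = {
    "CD1 + GFAP-TK VGCV-": (28, 33),
    "CD1 VGCV+": (17, 17),
    "CD1 and TK recovery": (24, 31),
    "Nestin-cre and Nestin-cre:Gi, 4-6-week-old abGCs": (77, 95),
    "10-12-week-old abGCs": (49, 55),
}


def pooled_counts(groups):
    """Sum preferring and tested animals across all cohorts."""
    preferring = sum(k for k, _ in groups.values())
    tested = sum(n for _, n in groups.values())
    return preferring, tested


def binomial_upper_tail(k, n, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


preferring, tested = pooled_counts(cohorts)
fraction = preferring / tested

# Hypothetical check (an assumption of this sketch): how likely is a
# preference at least this strong if each animal chose at chance (50%)?
p_value = binomial_upper_tail(preferring, tested)

print(f"{preferring}/{tested} = {fraction:.1%}")
print(f"one-sided exact binomial p-value vs 50%: {p_value:.3g}")
```

      Pooling treats every animal as an independent draw and ignores cohort-level differences, so this is a rough sanity check on the arithmetic rather than a substitute for the per-experiment statistics.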

      Statistics: The statistical tests used throughout the paper are appropriate but their description is too cursory. Please provide F values and specify the name of the tests used in the figure legends before giving the exact p values.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this valuable study the authors propose a new regulatory role for one of the most abundant circRNAs, circHIPK3, mediated by the RNA binding protein IGF2BP2. While the study presents interesting and largely solid evidence, part of the work is incomplete, requiring additional controls to more robustly support the major claims. The work would also benefit from further discussion addressing the apparently contradictory effects of circHIPK3 and STAT3 depletion in cancer progression.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work the authors propose a new regulatory role for one of the most abundant circRNAs, circHIPK3, by showing that it interacts with an RNA binding protein (IGF2BP2) and, by sequestering it, regulates the expression of hundreds of genes containing a sequence (11-mer motif) in their untranslated regions (3'-UTR). This sequence is also present in circHIPK3, precisely where IGF2BP2 binds. The study further focuses on one specific case, the STAT3 gene, whose mRNA product is downregulated upon circHIPK3 depletion apparently through sequestering IGF2BP2, which otherwise binds to and stabilizes STAT3 mRNA. The study presents mechanistic insight into the interactions, sequence motifs, and stoichiometries of the molecules involved in this new mode of regulation. Altogether, this new mechanism seems to underlie the effects of circHIPK3 in cancer progression.

      Strengths:

      The authors show mechanistic insight into a proposed novel "sponging" function of circHIPK3 which is not mediated by sequestering miRNAs but rather by a specific RNA binding protein (IGF2BP2). They address the stoichiometry of the molecules involved in the interaction, which is a critical aspect that is frequently overlooked in this type of study. They provide both genome-wide analysis and a specific case (STAT3) that is relevant for cancer progression.

      Weaknesses:

      One of the central conclusions of the manuscript, namely that circHIPK3 sequesters IGF2BP2 and thereby regulates target mRNAs, lacks more direct experimental evidence such as rescue experiments where both species are simultaneously knocked down. CircRNA overexpression lacks a demonstration of circularization efficiencies. There seem to be contradictory effects of circHIPK3 and STAT3 depletion in cancer progression, namely that while circHIPK3 is frequently downregulated in cancer, circHIPK3 downregulation in this study leads to downregulation of STAT3. This does not seem to fit the fact that STAT3 is normally activated in a wide diversity of cancers and is positively associated with cell proliferation. Nor is the result consistent with the fact that circHIPK3 expression positively correlates with good clinical outcomes. Overall, the authors have achieved some of their aims but additional controls would be advisable to fully support their conclusions.

      We thank the reviewer for the important and constructive criticism. All the raised points have now been addressed as described below.

      Rescue experiment:

      We have now performed the suggested rescue experiment, exploring the potential normalization of target expression upon double knockdown (both circHIPK3 and IGF2BP2). Expression of the targets STAT3, NEU and TRAPPC9 was assessed, and all target mRNAs became normalized upon double knockdown, supporting our suggested IGF2BP2 sponging mechanism for circHIPK3. These results have been included in Supplementary Figure 5F.

      Circularization efficiency of ectopically expressed circRNAs:

      For efficient expression of circRNAs in human cells, we have used a state-of-the-art plasmid construct (Laccase2-circRNA; Kramer et al., 2015, Genes Dev. 2015 Oct 15;29(20):2168-82. doi: 10.1101/gad.270421.115), which has proved superior to many alternatives presented in the literature. To ensure proper circularization efficiency of circHIPK3, we have now subjected purified RNA from transfected HEK293 cells (and from HEK293 Flp-In T-Rex cells with stable integration of cassette) to northern blotting (Supplementary Figure S5H). This demonstrates the production of a single RNase R resistant band of correct size, for both circHIPK3 expression constructs. Due to relatively weak signal to noise ratio (rRNA background), we are unable to calculate an accurate linear-to-circ ratio. Nevertheless, the results suggest efficient production of WT and mutant circHIPK3 using the Laccase2 vector system.

      circHIPK3 and STAT3 expression in cancer:

      It is correct that STAT3 expression is often positively correlated with disease progression in many patients suffering from different cancers, and that the observed expression pattern with downregulation of circHIPK3 and STAT3 in BC cells can be perceived as counterintuitive. We note that the STAT3 profile in our time-course knockdown experiments is somewhat dynamic. While downregulation of STAT3 is most pronounced after 24 hrs of circHIPK3 knockdown, the expression tends to be more normalized after 48 and 72 hrs, which could be due to compensatory mechanisms initiated by the cells. Indeed, comparing long-term development of tumors in patients, with numerous primary and accumulating secondary effects, to transient (0-72 hrs) gene-expression analyses has limitations. In addition, although the oncogenic role of STAT3 has been widely demonstrated, evidence suggests that STAT3 functions are multifaceted and not always trivial to classify. Recent evidence has shown that STAT3 can have opposite functions in cancer and act as both a potent tumor promoter and a tumor suppressor (reviewed in Tolomeo and Cascio, 2021, Int J Mol Sci. 2021 Jan; 22(2): 603. doi: 10.3390/ijms22020603). We have now discussed this in more detail (in the discussion section) and stated some of the limitations of our study in terms of the regulation of the STAT3/p53 axis.

      Reviewer #2 (Public Review):

      The manuscript by Okholm and colleagues identified an interesting new instance of ceRNA involving a circular RNA. The data are clearly presented and support the conclusions. Quantification of the copy number of circRNA and quantification of the protein were performed, and this is important to support the ceRNA mechanism.

      We thank the reviewer for the positive feedback.

      Reviewer #3 (Public Review):

      In Okholm et al., the authors evaluate the functional impact of circHIPK3 in bladder cancer cells. By knocking it down and performing an RNA-seq analysis, the authors found thousands of deregulated genes that look unaffected by miRNAs sponging function and that are, instead, enriched for an 11mer motif. Further investigations showed that the 11-mer motif is shared with the circHIPK3 and able to bind the IGF2BP2 protein. The authors validated the binding of IGF2BP2 and demonstrated that IGF2BP2 KD antagonizes the effect of circHIPK3 KD and leads to the upregulation of genes containing the 11-mer. Among the genes affected by circHIPK3 KD and IGF2BP2 KD (resulting in downregulation and upregulation, respectively) the authors found the STAT3 gene. This was accompanied by consistent concomitant upregulation of one of its targets, TP53. The authors propose a mechanism of competition between circHIPK3 and IGF2BP2 triggered by IGF2BP2 nucleation, potentially via phase separation.

      Strengths:

      The number of circRNAs continues to drastically grow; however, the field lacks detailed molecular investigations. The presented work critically addresses some of the major pitfalls in the field of circRNAs and there has been a careful analysis of aspects frequently poorly investigated. The timepoint KD followed by RNA-seq, investigation of the miRNA-sponge function of circHIPK3, identification of the 11-mer motif, identification and validation of IGF2BP2, and the analysis of copy number ratio between circHIPK3 and IGF2BP2 in assessing the potential ceRNA mode of action have been extensively explored and, taken together, are convincing.

      Weaknesses:

      In some parts, the manuscript lacks appropriate internal controls (eg: comparison with normal bladder cells, linear transcript measurements upon the KD, RIP internal controls/ WB analysis, etc), statistical analysis and significance (in some qPCRs), exhaustive description in the methods of microscopy and image analysis, western blot, and a separate section of cell lines used. The use of certain cell lines bladder cancer cells vs non-bladder cells in some experiments for the purpose of the study is also unclear.

      Overall, the presented study adds new knowledge in describing circHIPK3 function, its capability to regulate some downstream genes and its interaction and competition for IGF2BP2. However, whereas the experimental part appears technically logical, the overall goal of this study and the final conclusions remain unclear. The mechanism of condensation proposed, although interesting and encouraging, would need further experimental support and information, especially in the context of cancer.

      In summary, this study is a promising step forward in the comprehension of the functional role of circHIPK3. These data could possibly help to better understand the circHIPK3 role in cancer.

      We thank the reviewer for the important and constructive criticism. All the raised points have now been addressed as described below.

      Internal controls/description of methods:

      We have now included suggested internal controls and provided statistical significance measures where needed. We have also described in more detail the usage of different cell lines for different experiments and a comprehensive description of microscopy, image, and western analyses.

      The condensation mechanism of circHIPK3 and IGF2BP2 that we propose has been toned down slightly in the discussion, as we agree that these observations are not unequivocal and could potentially be explained by alternative and yet undefined events as discussed in further detail.

      Recommendations for the authors:

      Major points

      (1) In Figure 1B the authors show neither error bars nor statistical analysis. Did they sequence each cell line in single replicates? A clarification on this point would be of help.

      All timepoints for J82 and UMUC3 were sequenced in biological triplicates (Figure 1C-G). The data shown in Figure 1B represents prior single RNA-seq runs of all specific cell lines sequenced for selection of appropriate BC cell lines used for further study.

      (2) In Figure 1C the quantification of the cognate linear Hipk3 RNA would be desired in order to rule out changes in this species levels that could account for the observed effects upon circHIPK3 KD.

      We do not observe a non-specific downregulation of the HIPK3 mRNA upon circHIPK3 knockdown, rather we observe a moderate upregulation at later timepoints. However, western blotting shows that this upregulation is not translated into significantly increased protein levels. This data is now available in Supplementary Figure S1A and S1B.

      (3) In Supplementary Figure S1B the authors show the number of differentially expressed genes between time points and baseline upon circHIPK3 KD or scr siRNA transfection. However, in this referee's opinion, the relevant comparison would be the differentially expressed genes between circHIPK3 KD and scr siRNA at different time points. Otherwise, they would be focusing on both circHIPK3-specific and non-specific effects.

      The requested comparison is part of the main figures (Figure 1F). The plotted data in Supplementary Figure 1B (Supplementary Figure S1D in the revised version) was included to allow the reviewer to better assess the variability in the data. We therefore believe it provides relevant information and that it should be kept in the final version.

      (4) Figure 1E. How many hours of KD do these measurements correspond to? Even if they correspond to 72 h, there seems to be a discrepancy between Fig 1E and 1F in terms of the total number of differentially expressed (DE) genes. Why are there more DE genes in 1E?

      The number of differentially expressed genes in Figure 1E represents the total number across all timepoints, while Figure 1F represents single timepoints. We have modified the figure legend to clarify this issue.

      (5) In Figure 3B, in order to verify pulldown efficiency, RT-qPCR should be performed instead of endpoint RT-PCR. Otherwise, no robust claim can be made regarding interaction affinities.

      We agree that these RIP-PCR results in Figure 3B are only semi-quantitative and therefore do not unequivocally assess binding strength. However, since IGF2BP2 is the RNA binding protein in focus throughout the rest of the study, where additional quantitative RIP-RT-qPCR experiments have been performed, we find this issue negligible. In addition, the semi-quantitative nature of the endpoint PCR experiment has now been mentioned in the main text and figure legend.

      (6) The authors claim that IGF2BP2 KD counteracts the effect of circHIPK3 KD on target mRNAs. However, in order to support this claim the authors should perform a rescue experiment where they simultaneously knock down both circHIPK3 and IGF2BP2. Otherwise, the conclusion remains largely supported by a correlation.

      Indeed, such an experiment is important. A rescue experiment with double knockdown has now been performed and demonstrates that levels of the tested targets (STAT3, NEU and TRAPPC9) become normalized under these conditions, supporting our IGF2BP2/circHIPK3 sponging model. The data is available in Supplementary Figure S5F.

      (7) The authors claim that circHIPK3 interacts strongly with IGF2BP2 in bladder cancer cells but not with GRWD1. This is shown in Figure 4A where neither standard errors nor statistical analysis is shown. The authors need to show replicates of this experiment and perform statistics in order to support their claims.

      These experiments have been redone with even higher stringency in biological triplicates and fully support our claims. The data is available in a modified Figure 4A, now including error bars and indications of significance. In addition, we have included western blots demonstrating Input (IN), Flowthrough (FT) and Immunoprecipitation (IP) of correctly sized proteins in Supplementary Figure S4A.

      (8) The authors claim that the STAT3 gene, which contains the 11-mer motif in its 3'UTR, becomes downregulated upon circHIPK3 KD in UMUC3 and J82 cells, while it is upregulated upon IGF2BP2 depletion in both cell lines. It is unclear why they show the effect of circHIPK3 KD on STAT3 within a time course while the effect of IGF2BP2 KD in a fixed time point (Figures 5A/S5A and 5B/S5B respectively), and it would be convenient to clarify this point.

      The initial time course knockdown experiment for circHIPK3 was conducted to provide a comprehensive dataset for circHIPK3-mediated events and clarify any temporal effects. After identification of IGF2BP2 as an interaction partner of circHIPK3, we chose to harvest cells after 48 hrs of knockdown, as knockdown efficiency was prominent at this point. The temporal knockdown efficiency of RNAs (circHIPK3) and proteins (IGF2BP2) differs considerably due to the increased stability of proteins compared to target RNA. This is the main reason why only a single timepoint has been assessed.

      (9) In Figure 5F the authors show that upon overexpression of wildtype or 11-mer motif-mutant circHIPK3, the binding of IGF2BP2 was reduced while the binding of STAT3 mRNA to IGF2BP2 was increased. In order to rule out differences in circularization efficiencies, it would be convenient to show a northern blot comparing the efficiency of circHIPK3 overexpression relative to its linear cognate RNA for both constructs.

      Indeed, circRNA expression constructs may differ considerably in circularization efficiencies. We are using the Laccase2 system developed by the Jeremy Wilusz lab (Kramer et al., 2015), which, at least in our hands, efficiently produces circRNAs from almost any inserted sequence. To address whether the WT and mutant circHIPK3 constructs express similar amounts of circRNA with high efficiency, we performed the suggested northern blot, which displays very similar RNase R resistant circHIPK3 levels. The data is now available in Supplementary Figure S5H. Due to background signal from 18S rRNA in non-RNase R treated samples, we cannot accurately calculate a linear/circular RNA ratio, since no distinct linear RNA species above background is visible on the blot. However, the important point, that mutant and WT (RNase R resistant) circRNAs are expressed at similar levels, makes us confident in our conclusion that WT circHIPK3 expression interferes with IGF2BP2 binding to STAT3 mRNA.

      (10) Figure 1G, several genes were selected as up and downregulated for J82 and UMUC3 cell lines. Were these consistently involved in specific biological processes?

      Genes were classified as down or upregulated based on significant (FDR<0.1) fold changes. The most significant genes in both directions were named, regardless of involvement in any specific biological processes. Initially, we performed a GO-term analysis on these genes and received many hits, but we did not observe a very specific pattern or cluster of genes, suggesting that we are looking at both primary and secondary effects of knocking down circHIPK3. We believe our GSEA of the 50 hallmarks of cancer gene sets, presented in Figure 4D, 4E and Supplementary Figure S4E and S4F, addresses this point in a satisfactory manner.

      (11) For differential expression analysis, which data sets were used to group outcomes at different time points? Also, there is an increased number of genes affected after KD - please describe in more detail how you reached that gene number.

      As also discussed above (point 3), at each timepoint (Figure 1F) “Scr” was compared to “circHIPK3” knockdown. It makes sense that more and more genes are DE over the course of time as both primary and secondary effects of knockdown will build up over time. We have now clarified which datasets have been used in the figure legend and rewritten the Methods’ section on differential expression analysis.

      (12) What happens with the expression of circHIPK3 if STAT3 is KD? What biological processes are modulated by silencing circHIPK3?

      (13) What happens in bladder cancer cells if STAT3 and circHIPK3 are KD?

      The main goal of our work is to clarify how circRNAs (here circHIPK3) affect gene-expression and cancer pathways. While it would be interesting to explore the consequences of STAT3 knockdown and in combination with circHIPK3, such experiments would require comprehensive additional analyses (RNA-seq), which we believe is beyond the scope of this study at this point.

      (14) The rationale of the study and conclusions are unclear. Quote "we extensively evaluate the functional impact of circHIPK3 in bladder cancer cells". As previously published by the authors, as well as mentioned in the manuscript, circHIPK3 is downregulated in cancers and possesses tumor suppressor functions in bladder cancers. Could the authors clarify how the results of the presented study based on the depletion of circHIPK3 fit with the previous discoveries? If the circHIPK3 is generally downregulated compared to normal cells (although higher compared to the linear transcript) why do the authors use a KD approach? Are the bladder cancer cells simply a cell model to study circRNA vs linear? How the condensation model reconciles with circHIPK3 tumor suppressor function based on these results?

      We believe that it remains unclear whether circHIPK3 is a direct tumor suppressor, although this is possible judging from the clinical patient data, since STAT3, which has been shown to become activated in many cancers, is also downregulated upon circHIPK3 knockdown. However, the immediate effects of circHIPK3 knockdown on gene expression (0-72 hrs) and the long-term development of tumors within patients may be difficult to compare directly. If STAT3 downregulation contributes to cancer phenotypes in bladder cancer, as suggested for several other cancer types (glioblastoma, prostate cancer, lung cancer etc.), circHIPK3 may indeed still be classified as a tumor suppressor in bladder cancer. It is worth noting that circHIPK3 has been shown to be upregulated and have oncogenic phenotypes in many other cancers, which makes direct correlations between cancers complex and difficult to reconcile. We have revised the discussion to reflect these issues in a more comprehensive fashion. We believe that fully delving into STAT3 regulation in terms of bladder cancer development, progression, cell invasiveness, and survival is more suitable for future experiments.

      At this point, we have identified a novel mechanism by which a circRNA deregulated in cancer is able to sponge/regulate the function of an oncogenic RNA binding protein, even though it is severely outnumbered in cells. Importantly, circHIPK3 likely does not function as a miRNA sponge, as proposed in several previous studies based on circRNA overexpression, reporter constructs and miRNA mimics. We therefore believe that these findings provide important new insights into circHIPK3 function and that the current understanding of circRNAs as functioning primarily as miRNA sponges should likely be revised.

      (15) Related to the previous point, if the purpose is to study the role of circHIPK3 in bladder cancer, there is a bit of a lack of consistency and it is sometimes confusing to understand the use of certain cell lines for specific experiments. The initial circHIPK3 KD experiments have been conducted in 2 (out of 11 not malignant/ metastatic) bladder cancer cell lines (J82 and UMUC3). Why this specific selection of exclusively metastatic bladder cell lines? For comparison are the normal bladder cell lines characterized by the same circRNA vs linear ratio?

      The selection of bladder cancer cell lines (J82, UMUC3 and FL3) is based on several criteria including expression levels of circHIPK3, cell maintenance characteristics and knockdown/transfection efficiencies. Initially, we included HT1197 cells as well, but batch effects precluded the use of these data.

      Furthermore, the subsequent miRNA analysis was conducted exclusively in one bladder cell line (J82 but not in UMUC3), the initial identification of motif again in bladder cells but the initial RBP identification and experimental interaction is conducted in non-bladder cells HepG2 and k562 (reported as main figure 3B) and only subsequently in bladder cell (4A), again in a different cell line (only FL3, but not in J82 and UMUC3). The validation of the interaction of STAT3 by RIP is performed exclusively in FL3. All this also makes someone wonder how specific this mechanism/binding is in bladder cancer cells. There is an attempt to explain this by comparing cell cycle progression analysis upon circHIPK3 KD and IGF2BP2 KD later on but the final conclusions of this analysis remain unclear. The authors should provide more explanation and information in this part of the manuscript.

      It is correct that the different bladder cancer cell lines (FL3, J82 and UMUC3) have been used more or less interchangeably between experiments. This is due to the observed common phenotypes, e.g. sharing up to 92% DE genes, and highly significant enrichment of the IGF2BP2 11-mer motif in downregulated mRNAs upon circHIPK3 knockdown in all three cell lines. The ENCODE cell lines HepG2 and K562 were used since the accessible RBP-CLIP data originates from the ENCODE project, where these cells have been used exclusively. Hence, we validated the binding of candidate RBPs (semi-quantitatively) in HepG2 and K562 prior to assessing their RNA binding in the BC cell line FL3. We have used FL3 for RIP and validation of IGF2BP2 binding mainly due to better transfection efficiency and higher expression levels, allowing detection of all interrogated components. The fact that we have included three BC cell lines in many experiments instead of only one, and obtained consistent results, strengthens the conclusion that our phenotypes and regulatory mechanisms are likely common to most, if not all, bladder cancer cell lines. We have included a paragraph in the materials and methods section to further clarify the usage of cell lines in the different experiments.

      (16) STAT3 gene is used as an example. Where is this gene coming from? How has this gene been selected? Is there any complete list of RNA-seq data of up/down-regulated genes upon circHIPK3 KD? The raw data and gene list should be publicly available to the reviewers.

      STAT3 is a major regulator of cancer pathways and therefore an interesting candidate for further analysis as it is differentially expressed between control and circHIPK3 knockdown in all cell lines. We have now included the complete list of DE genes from the time-resolved RNA-seq analyses (DESeq2 output files) in the supplementary material. This data is now available in Supplementary Tables S6 and S7.

      (17) In performing the KD of circHIPK3 the authors use a unique siRNA on a splice junction. The authors claim that this is a way to not affect the linear transcript, however, have the authors also ensured experimentally that this doesn't affect in any way the linear RNA? This should be included as an initial internal control.

      We do not observe a downregulation of the HIPK3 mRNA upon circHIPK3 downregulation; rather, we observe a moderate upregulation at later timepoints. When assessing the HIPK3 protein levels, we observe no significant change after 48 hrs of knockdown. This data is now available in Supplementary Figure S1A and S1B.

      (18) Additional controls should be provided for RIP, especially for Fig3B and 4A, Sfig4, 5C such as an internal positive control (es: AGAP2-AS1) of the correct pulldown of IGF2BP2 and/or WB should be shown (in the methods it is told that WB has been used for the analysis of RIP but I couldn't find any)

      Indeed, IGF2BP2 likely binds to many mRNAs in the cell. We have now included b-actin mRNA as a low affinity control in the Figure 4A RIP data, showing that circHIPK3 represents a tight binding substrate for IGF2BP2. We have also included a western blot showing the IP of IGF2BP2, IGF2BP2, GRWD1 and GFP. This data is now available in Supplementary Figure S4A.

      (19) Additional internal experimental controls should be included to assess the successful transfection and overexpression of circHIPK3 with the laccase-2 driven plasmid and mutated versions before the RIP in 4B and in the 5F. Supportive controls to show equal transfection would be required for Figure 6C-D. Further controls to show that the ASO specifically targets the 11-mer in circHIPK3 but not IGF2BP2 target genes should also be included. Please include this information in the supplementary materials.

      We have now included a northern blot showing successful transfection and expression of RNase R resistant circHIPK3 from the Laccase2 vector (WT and mutant) in relation to RIP experiments. This data is now available in Supplementary Figure S5H (see also comments about this above). Equal transfections in cells shown in Figure 6C-D is assessed by comparable levels of GFP expression, which is included as an expression cassette in the modified Laccase2 construct. Pictures were acquired with same exposure time and scaling to ensure that they can be compared directly. The ASO targets circHIPK3 with full complementarity, while STAT3 mRNA has 2 mismatches, leaving the “lesser interaction” with STAT3 theoretical. This has now been clarified in the main text.

      (20) Specifically, in 1C and 4A, Sfig4 there is no statistical analysis made and/or significance? This is only reported for the RIP experiment in Fig 5C.

      Statistical analyses have now been performed and are shown in Figure 4A, and we have included binding of ACTB as a low affinity control. In Figure 1C, which displays knockdown efficiency (highly efficient) at the various timepoints, no statistical significance has been displayed, since this is normally not done for such knockdown experiments. In addition, it is not clear which comparisons would be beneficial. Except for the J82 cell line at 12 hrs compared to 0 hrs, knockdown efficiency is high and statistically significant at all timepoints.

      (21) In the assessment of copy number ensuring the same primer efficiency is fundamental, it can't be simply "assumed". Please clarify this point and possibly include this information in the supplementary materials.

      It is correct that identical, or at least very similar, primer efficiencies are necessary to conclude that the relationship between GAPDH mRNA and circHIPK3 levels in the cell reflects the quantitatively measured number of molecules. However, since this single comment only supports the quantitatively measured circHIPK3 molecules with a ballpark estimate, and since we already assume that there are an estimated 10,000-20,000 copies of GAPDH mRNA in most cells (which we also do not know precisely), we have chosen to remove this statement.
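
      The reviewer's point about primer efficiency can be illustrated with a minimal sketch. It assumes the standard exponential qPCR model, in which the starting amount is proportional to E^(-Ct) for a per-cycle amplification factor E. The Ct values and the 15,000-copy reference figure are hypothetical illustration values (the latter is simply the midpoint of the assumed GAPDH range), and the function names are not taken from the manuscript.

```python
def fold_difference(ct_target, ct_ref, eff_target=2.0, eff_ref=2.0):
    """Starting amount of target relative to reference.

    Assumes both amplicons cross the same threshold, so the starting
    amount N0 is proportional to E ** (-Ct), where E is the per-cycle
    amplification factor (2.0 means perfect doubling).
    """
    return (eff_ref ** ct_ref) / (eff_target ** ct_target)


def copies_from_reference(ct_target, ct_ref, ref_copies,
                          eff_target=2.0, eff_ref=2.0):
    """Ballpark copy number of the target given an assumed reference copy number."""
    return ref_copies * fold_difference(ct_target, ct_ref, eff_target, eff_ref)


# Hypothetical example values, not data from the study.
equal_eff = copies_from_reference(ct_target=24, ct_ref=20, ref_copies=15000)
lower_eff = copies_from_reference(ct_target=24, ct_ref=20, ref_copies=15000,
                                  eff_target=1.9)
print(equal_eff, lower_eff)
```

      Under these assumptions, lowering the target primer efficiency from 2.0 to 1.9 changes the estimate by roughly threefold, which is why equal efficiencies cannot simply be assumed for such a comparison.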

      (22) The methodology section is not well organized and looks incomplete. For example, there are two separate sections for circHIPK3 expression conducted in different cell lines, this would be better explained in a single paragraph.

      We have now rewritten this section to make it clearer.

      The section reporting cell lines and growth conditions is incorporated in "circHIPK3 KD and overexpression" while it should be a separate paragraph and valid for all experiments where these cells have been used. There is no information regarding Western blots, including Antibodies used, and densitometry performed.

      This information has now been included.

      In "immunofluorescence microscopy" it is not clear what microscope has been used, how many acquisitions have been made, and how acquisition has been performed. Related to this, how the image analysis has been performed? Figures 5I-J "Finally, immunofluorescence staining showed that nuclear and overall STAT3 protein levels are significantly lower upon circHIPK3 KD, while nuclear p53 protein levels are higher" and 6C and D "we observed a significantly higher prevalence of large cytoplasmic condensates in cells expressing high levels of circHIPK3 compared to controls" how this quantification has been made? The conclusive part about the condensation role remains a bit too loose and mostly speculative, largely due to the lack of robust information provided on microscopy and image analysis

      We have now included a better description of the acquisition and quantification methods.

      Minor

      (1) The Van Nostrand et al 2018 citation should refer to the updated publication in Nature and not to the original preprint in Biorxiv.

      This reference has now been updated.

      (2) In Supplementary Figure S3B, the authors offer no explanation as to why genes that become upregulated upon circHIPK3 knockdown generally contain more circHIPK3-RBP binding sites other than for IGF2BP2. A clarification would be of help.

      We do not have any evidence to explain this observation. One possibility is that other RBPs elicit mRNA stabilizing effects on average, whereas abundant IGF2BP2 (~120,000-200,000 copies per cell) is now able to bind more target mRNAs and elicit destabilization. This remains highly speculative though.

      (3) In Supplementary Figure S3D, the authors' claim that the 11-mer motif is found more bound to IGF2BP2 than for other circHIPK3-RBPs should be referred to the corresponding dataset/reference.

      This information is stated in the figure legend (K562) and we have now included it in the main text as well: “We evaluated how often binding sites of circHIPK3-RBPs overlap the 11-mer motif and found that this is more often the case for IGF2BP2 binding sites than binding sites of the other circHIPK3-RBPs when scrutinizing K562 datasets (Supplementary Figure S3D)”.

      (4) In Figure 4C the authors show that, according to previously performed experiments of their group, the 11-mer motif is enriched in upregulated genes compared to downregulated genes upon IGF2BP2 KD in UMUC3. This seems like a confirmation of the results presented in the preceding section (Figure 3H) and it would be clearer if it were presented in the same section.

      The data in Figure 3H is based on ENCODE data from IGF2BP2 knockdowns in K562 cells, while in Figure 4C these are from IGF2BP2 knockdown followed by sequencing in UMUC3 cells. We believe the timing of the data is fitting as is, since they relate to non-BC cells and BC cells, respectively.

      (5) More in vitro experiments are needed to investigate the implication of circHIPK3 in bladder cancer cell phenotype, and how different cancer hallmarks are modulated by this ceRNA network.

      We agree that this study does not fully clarify how these complex molecular interactions relate to bladder cancer progression, including fluctuations of key cancer genes/proteins. Since our focus has been on the mechanisms of circRNA function in relation to bladder cancer, these issues will await further future experimentation.

      (6) "apparent" competition (introduction - pag4)? Maybe rephrase more appropriately.

      This has been rephrased and “apparent” excluded.

      (7) Fig1C. Relative quantification. Statistical analysis? Is this significant?

      See also the comment to point 20 above. In Figure 1C we show the knockdown efficiency at the different timepoints. At all timepoints knockdowns are highly significant compared to the control (Scr), which is not significantly changed over time. It seems somewhat redundant to include p-values for such data. Also, which comparisons should be highlighted? Knockdown is highly efficient, which is what we want to show.

      (8) Figure 5H. Western blot. Densitometry quantification performed, how?

      This is now described in the Materials and Methods section.

      (9) Please specify the concentration of circHIPK3-specific siRNA used.

      20 nM. The information is included in the Materials and Methods section.

      (10) The control sample refers to scrambled or untreated cells? Instead of using "control samples without siRNA transfection" or "No siRNA" use untreated cells - otherwise, it is a bit confusing.

      This has now been modified.

      (11) Figure 3 is starting with hepatocellular and leukemia cells; why not with bladder cells?

      These experiments were performed based on CLIP-data and RBP knockdown data from the ENCODE project. The cells used are limited to HepG2 and K562.

      (12) For Figure 4B, which is the time-point?

      This is 24 hrs, which has now been stated.

      (13) Figure 5I and J, the expression of STAT3 and circHIPK3 can be also investigated for cellular distribution.

      The expression of STAT3 is investigated in Figure 5I. Localization of circRNA by standard RNA-FISH protocols using multiple (>20) probes is inherently difficult due to the cross reaction of probes with the linear mRNA. Certain amplification steps can be included if using a single backsplicing junction probe, but this often gives rise to highly ambiguous results as specificity is very limited due to the “one probe” nature of the design.

      (14) Some discussion of the limitations of the study would be of value.

      We have included this in the discussion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors were attempting to determine the extent that CIH altered swallowing motor function; specifically, the timing and probability of the activation of the laryngeal and submental motor pools. The paper describes a variety of different motor patterns elicited by optogenetic activation of individual neuronal phenotypes within PiCo in a group of mice exposed to CIH. They show that there are a variety of motor patterns that emerge in CIH mice; this is apparently different than the more consistent motor patterns elicited by PiCo activation in normoxic mice (previously published).

      Strengths:

      The preparation is technically challenging and gives valuable information related to the role of PiCo in the pattern of motor activation involved in swallowing and its timing with phrenic activity. Genetic manipulations allow for the independent activation of the individual neuronal phenotypes of PiCo (glutamatergic, cholinergic) which is a strength.

      We thank the reviewers for acknowledging and summarizing the strengths of this study.

      Weaknesses:

      (1) The data presented are largely descriptive in terms of the effect of PiCo activation on the probability of swallowing and the pattern of motor activation changes following CIH. Comparisons made between experimental data acquired currently and those obtained in a previous cohort of animals (possibly years before) are extremely problematic, with the potential confounding influence of changing environments, genetics, and litter effects. The statistical analyses (i.e. comparing CIH with normoxic) appear insufficiently robust. Exactly how the data were compared is not described.

      Yes, we agree the data are descriptive in terms of characterizing the effect of CIH on PiCo activation. However, we would like to emphasize that the data are also mechanistic because they characterize the effects of specifically, optogenetically manipulating PiCo neurons after being exposed to CIH.

      Thank you for this comment and for pointing out our misleading description in the paper. This manuscript is meant to independently characterize the effects of CIH on the response to PiCo stimulation. We are not making direct comparisons with the previously published manuscript, in which mice were exposed to room air. No statistical analysis has been made between the previously published control data and the current CIH data, since we are not making a direct comparison, only an observational comparison.

      To make this clearer, and to address the reviewer's concern, we have removed the room air data from figures 1E, 2C and 3A. However, we believe it is important to keep the data from mice exposed to room air in Figure 2B since we did not include this information in the previously published manuscript. It is important to point out that all mice exposed to CIH have some form of submental activity during laryngeal activation in response to PiCo stimulation. This is not the case when mice are exposed to room air only. In this figure, only descriptive analyses are presented. We adjusted our wording throughout the text, particularly in the discussion, to eliminate any confusion that we are making direct comparisons between the two studies. The following sentence has been added to the discussion: “While we do not intend to make direct quantitative comparisons between the previously published PiCo-triggered swallows in control mice exposed to room air (Huff et al 2023) and the data presented here for mice exposed to CIH, we believe it is important to compare the conclusions made in these two studies.” This was the motivation for using the eLife Advance format, since the present study demonstrates that PiCo affects swallow patterning, which was not observed in the control data.

      (2) There is limited mechanistic insight into how PiCo manipulation alters the pattern and probability of motor activation. For example, does CIH alter PiCo directly, or some other component of the circuit (NTS)? Techniques that silence or activation projections to/from PiCo should be interrogated. This is required to further delineate and define the swallowing circuit, which remains enigmatic.

      We agree with the reviewer that our study raises many more questions than we are able to answer at the moment. This however applies to most scientific studies. Even though swallowing has been studied for many decades, the underlying circuitry remains largely enigmatic. We will continue to investigate the role of PiCo and its interaction with the NTS, in healthy and diseased states. These investigations require many different techniques, and approaches, some of which are still in development. For example, we are currently conducting experiments that silence portions of the NTS related to swallow and PiCo: ChAT/Vglut2 neurons using novel unpublished viral approaches. However, these are separate and ongoing studies beyond the scope of the current one.

      To address the reviewer’s comment, we have added the following to the limitation section: “In addition, this preparation does not allow for recording of PiCo neurons to evaluate the direct effects of CIH in PiCo neuronal activity”. The following has also been added to the discussion: “Rather, our data reveal CIH disrupts the swallow motor sequence which is likely due to changes in the interaction between PiCo and the SPG, presumably located in the cNTS. While it has previously been demonstrated that PiCo is an important region in swallow-breathing coordination (Huff et al., 2023), previous studies did not demonstrate that PiCo is involved in swallow motor patterning itself. Here we show for the first time that CIH leads to disturbances in the generation of the swallow motor pattern that is activated by stimulating PiCo. This suggests that PiCo is not only important for coordinating swallow and breathing, but also modulating swallow motor patterning. Further studies are necessary to directly evaluate the presumed interactions between PiCo and the cNTS.”

      (3) The functional significance of the altered (non-classic) patterns is unclear.

      As in our original study, the preparation used to stimulate PiCo does not allow us to simultaneously characterize the functional significance of swallowing. Therefore, we have included this as a limitation in the limitation section: “In this preparation we are unable to directly determine the functionality of the variable swallow motor pattern seen after CIH. Different experimental techniques, such as videofluoroscopy would need to be used to directly evaluate functional significance. This technique is beyond the scope of this study and not possible to perform in this preparation. We acknowledge this limits our ability to make direct comparisons between dysphagic swallows in OSA patients.”

      Reviewer #1 (Recommendations For The Authors):

      (1) A more rigorous experimental approach is required. Littermates should be separated and exposed to either room air or CIH at the same (or close to the same) time.

      As stated above, we did not directly compare mice exposed to room air with mice exposed to CIH. Hence, we believe this is not necessary, and it would have meant repeating all the experiments already published in the original eLife paper.

      (2) Robust statistical analyses are required to determine whether the effects of CIH on the pattern/probability of motor activation are required.

      Since control and CIH group were not compared in this study, statistical hypothesis testing is not appropriate or applicable.

      (3) Use a combination of retrograde, Cre- AAVs and Cre-dependent approaches to interrogate the circuitry to/from PiCO that forms the swallowing network. This is what is needed to push this area forward, in my view.

      Thank you for this suggestion; we will consider it as we plan future experiments. Indeed, we are in the process of developing novel approaches. However, in this context we would like to emphasize that further network investigations are exponentially more complicated given that we need to use a Flpo/Cre approach to specifically characterize the glutamatergic-cholinergic PiCo neurons. Most other laboratories that have studied PiCo have avoided this experimental complication and used only a “cre-dependent” approach. This approach is much simpler, but the data are much less specific and the conclusions sometimes misleading. Stimulating, for example, cholinergic neurons in the PiCo area will also activate Nucleus ambiguus neurons, and stimulating glutamatergic neurons will also activate glutamatergic neurons that are not necessarily the glutamatergic/cholinergic neurons that we use to define PiCo specifically. Readers who are unfamiliar with these different approaches often miss this important difference. Hence, stimulating the cholinergic-glutamatergic neurons in PiCo is much more specific than, e.g., stimulating preBötzinger complex neurons. There are no markers that allow specific stimulation of only preBötzinger complex neurons or neurons in the parafacial nucleus. Unfortunately, this difference is often overlooked.

      (4) It should be made more clear how each of the "non-classic" swallowing patterns could cause dysfunction - especially to the reader who is not completely familiar with the neural control of swallowing.

      We agree that it would be helpful to understand the functional implications of these alterations in swallow-related motor activation, however since our approach does not allow us to use any tools to measure or evaluate functional activity it would be inappropriate to make suggestions of this type without any data to back up our conclusion. This is why we have not speculated on the functional implications. We have added the following to the discussion section of this manuscript. “While fine wire EMG studies are an excellent evaluation tool to observe temporal motor pattern of sequential swallow related muscles; it must be combined with tools such as videofluoroscopic swallow study (VFSS) and/or high resolution manometry (HRM) in order to characterize the functional significance of these alterations to the swallow motor pattern shown in this study (Park et al., 2017). Since the preparation in this study utilizes only fine wire EMGs we are not able to evaluate or comment on the functional significance of the variable swallow motor patterns. ”

      Minor:

      The Results should be written in a way that better conveys the neurophysiological effects of the manipulations. As it stands, it reads like a statistical report on how activation of each neuronal phenotype is statistically different from each other. As such it is difficult to read and understand the salient findings.

      Thank you for this insight. We have adjusted the language in the results section.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors investigated the role of a medullary region, named Postinspiratory Complex (PiCo), in the mediation of swallow/laryngeal behaviours, their coordination with breathing, and the possible impact on the reflex exerted by chronic intermittent hypoxia (CIH). This region is characterized by the presence of glutamatergic/cholinergic interneurons. Thus, experiments have been performed in single allelic and intersectional allelic recombinase transgenic mice to specifically excite cholinergic/glutamatergic neurons using optogenetic techniques, while recording from relevant muscles involved in swallowing and laryngeal activation. The data indicate that in anaesthetized transgenic mice exposed to CIH, the optogenetic activation of PiCo neurons triggers swallow activity characterized by variable motor patterns. In addition, these animals show an increased probability of triggering a swallow when stimulation is applied during the first part of the respiratory cycle. They conclude that the PiCo region may be involved in the occurrence of swallow and other laryngeal behaviours. These data interestingly improve the ongoing discussion on neural pathways involved in swallow-breathing coordination, with specific attention to factors leading to disruption that may contribute to dysphagia under some pathological conditions.

      The Authors' conclusions are partially justified by their data. However, it should be acknowledged that the impact of the study is to a certain extent limited by the lack of knowledge on the source of excitatory inputs to PiCo during swallowing under physiological conditions, i.e. during water-evoked swallowing. Also the connectivity between this region and the swallowing CPG, a structure not well defined, or other brain regions involved in the reflex is not known.

      We thank the reviewer for the comments and the strength of the paper. However, with regards to the “lack of knowledge”, we would like to emphasize that PiCo was first described in 2016, while e.g. the preBötzinger complex was described in 1991. Thus, it is not fair to assume the same level of anatomical and physiological understanding for PiCo as we became accustomed to for the preBötzinger complex. We are fairly confident that in 25 years from now, our knowledge of the in- and outputs of PiCo will be much less limited than it currently is.

      Strengths:

      Major strengths of the manuscript:

      • The methodological approach is refined and well-suited for the experimental question. The in vivo mouse preparation developed for this study takes advantage of selective optogenetic stimulation of specific cell types with the simultaneous EMG recordings from upper airway muscles involved in respiration and swallowing to assess their motor patterns. The animal model and the chronic intermittent hypoxia protocol have already been published in previous papers (Huff et al. 2022, 2023).

      • The choice of the topic. Swallow disruption may contribute to the dysphagia under some pathological conditions, such as obstructive sleep apnea. Investigations aimed at exploring and clarifying neural structures involved in this behaviour as well as the connectivity underpinning muscle coordination are needed.

      • This study fits in with previous works. This work is a logical extension of previous studies from this group on swallowing-breathing coordination with further advances using a mouse model for obstructive sleep apnea.

      We thank the reviewers for acknowledging and summarizing the strengths of this study.

      Weaknesses:

      Major weaknesses of the manuscript:

      • The Authors should be more cautious in concluding that the PiCo is critical for the generation of swallowing itself. It remains to demonstrate that PiCo is necessary for swallowing and laryngeal function in a more physiological situation, i.e. swallow of a bolus of water or food. It should be interesting to investigate the effects of silencing PiCo cholinergic/glutamatergic neurons on normal swallowing. In this perspective, the title should be slightly modified to avoid "swallow pattern generation" (e.g. Chronic Intermittent Hypoxia reveals the role of the Postinspiratory Complex in the mediation of normal swallow production).

      Thank you for pointing out that this manuscript suggests PiCo is necessary for swallow generation. We agree that further interventions to specifically silence PiCo ChAT/Vglut2 neurons will be necessary to investigate this claim, which we have begun to evaluate for a future study by developing a novel, as yet unpublished, approach. We have altered the language throughout the text to limit the perception that PiCo is the swallow pattern generator. We have also changed the title to say: Chronic Intermittent Hypoxia reveals the role of the Postinspiratory Complex in the mediation of normal swallow production

      • The duration of swallows evoked by optogenetic stimulation of PiCo is considerably shorter in comparison with the duration of swallows evoked by a physiological stimulus (water). This makes it hard to compare the timing and the pattern of motor response in CIH-exposed mice. In Figure 1, the trace time scale should be the same for water-triggered and PiCo-triggered swallows. In addition, it is not clear if exposure to CIH alters the ongoing respiratory activity. Is the respiratory rhythm altered by hypoxia? If a disturbed or irregular pattern of breathing is already present in CIH-exposed mice, could this alteration interfere with the swallowing behaviour?

      Thank you. We have changed the time scale so that all representative traces are on the same time scale.

      We explained in the original paper (Huff et al 2023) that the significant decrease in PiCo-evoked swallow duration compared to water-evoked swallow duration is likely due to the absence of oral/upper airway feedback. We are not making comparisons of the effects of CIH on swallow motor pattern between water-evoked and PiCo-evoked swallows. Rather, we are only characterizing the effects of CIH on the swallow motor pattern in PiCo-evoked swallows. The purpose of Figure 1A is to show that the rostrocaudal submental-laryngeal sequence in water-evoked swallows is preserved in “canonical” PiCo-evoked swallows, as shown in the original study. While we did not measure the effects of CIH on breathing and the respiratory pattern in this study, it has been established, by others, that CIH causes respiratory muscle weakness, impaired motor control of the upper airway and variable respiratory rhythm and rhythm generation. However, when characterizing the timing of swallow in relation to inspiration (Figure 1 Figure Supplement 1) and the reset of the respiratory rhythm (Figure 3 figure supplement 1), and by observationally comparing these results with mice exposed to room air (Huff et al 2023), we do not observe any obvious differences in swallow-breathing coordination. Nevertheless, a separate study in wild-type mice focusing on a characterization of swallowing via water after CIH would be better suited to achieve a better understanding of the physiological changes of swallowing after CIH. We would like to point out that it has been shown in Huff et al 2022 that altering respiratory rate/pattern via activation of various preBötzinger Complex neurons does not change swallow behavior, except in the case of Dbx1 PreBötC neuron activation, which was independent of CIH. Increasing or decreasing respiratory rate via activation of PreBötC Vgat and SST neurons did not change the swallow pattern; rather, it changed the timing of when swallows occurred. It has been reported before by others that swallow has a hierarchical control over breathing and has the ability to shut breathing down. We believe that the swallowing behavior is independent of respiratory pattern, and that alterations in breathing pattern do not necessarily affect the swallow motor pattern but rather could affect the swallow timing.

      Reviewer #2 (Recommendations For The Authors):

      Abstract

      Lines 37-41 "Here we show that optogenetic stimulation of ChATcre:Ai32, Vglut2cre:Ai32, and ChATcre:Vglut2FlpO:ChR2 mice exposed to CIH does not alter swallow-breathing coordination, but unexpectedly the generation of swallow motor pattern was significantly disturbed."

      It should be better:

      "Here we show that optogenetic stimulation of ChATcre:Ai32, Vglut2cre:Ai32, and ChATcre:Vglut2FlpO:ChR2 mice exposed to CIH does not alter swallow-breathing coordination, but unexpectedly triggers variable swallow motor patterns".

      Thank you, this has been changed

      Lines 41-43 "This suggests, glutamatergic-cholinergic neurons in PiCo are not only critical for the gating of postinspiratory and swallow activity but also play important roles in the generation of swallow motor pattern." I suggest removing any language claiming PiCo is swallow gating and change "generation" in "modulation"

      "This suggests that glutamatergic-cholinergic neurons in PiCo are not only critical in regulating swallow-breathing coordination but also play important roles in the modulation of swallow motor pattern."

      Thank you, this has been changed

      Introduction:

      Line 88-90: Actually, in Huff et al. 2023 it is said "PiCo acts as an interface between the swallow pattern generator and the preBötzinger complex to coordinate swallow and breathing". Please, change accordingly. Please, remove Toor et al., 2019 since their conclusions are quite different.

      Line 100-101: Please, change the sentence according to the comments reported above.

      Thank you, this has been changed

      Results:

      Lines 104-105: Did you mean: "We confirmed that optogenetic stimulation of PiCo neurons in ChATcre:Vglut2FlpO:ChR2 mice exposed to CIH triggers swallow and laryngeal activation similar to the control mice exposed to room air (Huff et al., 2023)." Otherwise, the sentence is not clear.

      Thank you, this has been changed

      Lines 129-130: This finding is not surprising since similar results have been reported in Huff et al. 2023.

      Thank you, we wanted to confirm that CIH did not alter this characteristic, which it did not. We believe that it is important to include this as it is a criterion for characterizing laryngeal activation.

      Lines 219: The number of water swallows is considerably lower than stimulation-evoked swallows. Why?

      We inject water into the mouth three times. Typically, there is one swallow in response to each water injection. PiCo is stimulated 25 times at each duration. If we were to stimulate swallow with water as many times as with optogenetic stimulation, there would be an adaptive response to the water stimulation and the mouse would not respond. This does not seem to be the case with PiCo stimulation. The simple answer is that there are many more PiCo stimulations than water stimulations.

      Lines 228-232: "PiCo-triggered swallows are characterized by a significant decrease in duration compared to swallows evoked by water in ChATcre:Ai32 mice (265 ± 132ms vs 144 ± 101ms; paired t-test: p= 0.0001, t= 5.21, df= 8), Vglut2cre:Ai32 mice (308 ± 184ms vs 125 ± 44ms; paired t-test: p= 0.0003, t= 6.46, df= 7), and ChATcre:Vglut2FlpO:ChR2 mice (230 ± 67ms vs 130 ± 35ms; paired t-test: p= 0.0005, t= 5.62, df= 8) exposed to CIH (Table S1).".

      Thank you, this has been changed
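
      For readers unfamiliar with the statistics quoted above, the following is a minimal sketch of how a paired t statistic and its degrees of freedom are obtained from per-animal measurements. The duration values are hypothetical placeholders chosen only so that nine animals give df = 8; they are not data from the study, and the function name is an assumption made for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for a paired t-test on two equal-length samples.

    Each index is one animal measured under both conditions, so the test
    is run on the within-animal differences.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical swallow durations in ms for nine animals (placeholders).
water_evoked = [250, 310, 190, 280, 400, 220, 260, 330, 240]
pico_evoked = [140, 160, 120, 150, 210, 110, 135, 170, 125]
t_stat, dof = paired_t(water_evoked, pico_evoked)
print(t_stat, dof)
```

      The p-value then follows from the t distribution with the returned degrees of freedom, which is typically looked up with a statistics package rather than computed by hand.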

      Line 252 and 254: remove SEM.

      Thank you, this has been changed

      Discussion

      Line 267: ...(Figure 1Bi), while 28% of PiCo-triggered swallows...

      Thank you, this has been changed

      Lines 283-290: "Thus, CIH does not alter PiCo's ability to coordinate the timing for swallowing and breathing. Rather, our data reveals that CIH disrupts the swallow motor sequence likely due to changes in the interaction between PiCo and the SPG, presumably the cNTS.

      While it has previously been demonstrated that PiCo is an important region in swallow-breathing coordination (Huff et al., 2023), previous studies did not demonstrate that PiCo is involved in swallow pattern generation itself. Thus, here we show for the first time that CIH resulted in the instability of the swallow motor pattern activated by stimulating PiCo, suggesting PiCo plays a role in its modulation.".

      Thank you, this has been changed

      Could the observed effects be due to a non-specific effect of hypoxia on neuronal excitability? In addition, it should be considered that PiCo-triggered swallows lack the behavioural setting of water-evoked swallows and do not activate the sensory component of the SPG to the same extent as the water-evoked swallows.

      Yes, this is very possible. We stated in our first manuscript that the decrease in PiCo-triggered swallow duration, as compared to water-triggered swallow duration, is likely because oral sensory components are not being activated to the same extent (Huff et al. 2023). Since we do not directly measure neuronal excitability, it is not known (in this study) whether CIH causes changes in the excitability of swallow-related areas. However, others have shown increased excitability and activity of Vglut2 neurons after CIH exposure (Kline et al 2007, 2010), and we have shown, e.g., changes in the excitability of preBötC neurons (Garcia et al. 2016, 2017).

      Lines 293-300: The sentence is not clear. Is there any evidence indicating that glutamatergic neurons are differently affected by hypoxia than cholinergic neurons?

      Thank you, these sentences have been changed to increase clarity. The section now reads: “There was no statistical difference in the probability of triggering a swallow during optogenetic stimulation of ChATcre:Ai32, Vglut2cre:Ai32 and ChATcre:Vglut2FlpO:ChR2 neurons in mice exposed to room air (Huff et al 2023). However, when exposed to CIH, ChATcre:Ai32 and Vglut2cre:Ai32 mice have a lower probability of triggering a swallow -- in some mice swallow was never triggered via PiCo activation, while water-triggered swallows remained -- compared to the ChATcre:Vglut2FlpO:ChR2 mice. While it is possible that portions of the presumed SPG remain less affected by CIH, which could offset these instabilities to produce functional swallows, our data suggest that PiCo targets microcircuits within the SPG that are highly affected by CIH. The NTS is a primary first site for upper airway and swallow-related sensory termination in the brainstem (Jean, 1984). CIH induces changes to the cardio-respiratory Vglut2 neurons, resulting in an increase in cNTS neuronal activity (Kline, 2010; Kline et al., 2007), as well as changes to preBötzinger neurons (Garcia et al., 2017; Garcia et al., 2016) and ChAT neurons in the basal forebrain (Tang et al., 2020). It is reasonable to suggest that CIH has differential effects on neurons that only express ChATcre and Vglut2cre versus the PiCo-specific interneurons that co-express ChATcre and Vglut2FlpO, emphasizing the importance of targeting and manipulating these PiCo-specific interneurons.”

      Lines 372-374: "Here we show that PiCo, a neuronal network which is critical for the generation of postinspiratory activity (Anderson et al. 2016) and implicated in the coordination of swallowing and breathing (Huff et al., 2023), is severely affected by CIH."

      Thank you, this has been changed.

      Methods

      Line 398: Did you mean Slc17a6-IRES2-FlpO-D?

      Thank you, this has been changed.

      Line 399: were.

      Thank you, this has been changed.

      Line 403: ... expressing both ChAT and Vglut2 and will be reported as ChATcre:Vglut2FlpO.

      Thank you, this has been changed.

      Line 437: Mice of the ChATcre:Ai32, Vglut2cre:Ai32 and ChATcre:Vglut2FlpO:ChR2 lines were kept in collective cages with food and water ad libitum placed inside custom-built chambers.

      Thank you, this has been changed.

      Line 479: (Figure 6a in Huff et al., 2022).

      Line 497: What does Fig 7 refer to?

      This should say Figure 1 - Figure Supplement 2. This has been changed.

      Lines 501-506: "First, swallow was stimulated by injecting 0.1cc of water into the mouth using a 1.0 cc syringe connected to a polyethylene tube. Second, 25 pulses of each 40ms, 80ms, 120ms, 160ms and 200ms continuous TTL laser stimulation at PiCo was repeated, at random, throughout the respiratory cycle. The lasers were each set to 0.75mW and triggered using Spike2 software (Cambridge Electronic Design, Cambridge, UK). These stimulation protocols were performed in all ChATcre:Ai32, Vglut2cre:Ai32, and ChATcre:Vglut2FlpO:ChR2."

      Thank you, this has been changed.

      Line 526 and 540: (Fig.6 in Huff et al., 2022) and (Fig.6d in Huff et al., 2022).

      Thank you, this has been fixed.

      Line 594: Figure 5 doesn't exist. Please, change the sentence.

      Thank you, this has been fixed.

      Line 595 and 609: The reference Kirkcaldie et al. 2012 is referred to the neocortex and doesn't seem appropriate. Please, quote the atlas of Paxinos and Franklin.

      Thank you, this has been changed.

      Reference:

      Please, correct throughout the text editing of references by removing e.g J.M. or A. or David D. and so on. Only surnames should be mentioned.

      Thank you, this has been changed.

      Figures:

      Figure 1. A and B as well as the purple arrow are lacking. In addition, optogenetic stimulation is applied during different periods of inspiratory activity and this could impact the swallow motor pattern. In Bv, Non-LAR seems very similar to LAR. In panel E, please add the number of animals.

      Thank you, this has been fixed.

      We used the same optogenetic protocols as in the original paper (Huff et al. 2023) and did not observe any changes to the swallow motor pattern in relation to the time PiCo was stimulated. The only phase-dependent response seen in both control and CIH is that when PiCo is stimulated during inspiration and a swallow is triggered, inspiration will be inhibited. Therefore, we do not believe variability in swallow motor pattern is dependent on the phase of breathing in which PiCo is stimulated.

      Biv LAR has a pause in EMG activity before the swallow begins (red arrow pointing to the pause), whereas Bv Non-LAR does not have this pause; rather, the two behaviors converge (red arrow). For a response to be considered an LAR, the pause must be present, which is why we separated these two motor patterns.

      Figure 1 - Figure Supplement 1. Why do the Authors call the lines "histograms"?

      Thank you, this has been fixed. This is a line graph of swallow frequency in relation to inspiration.

      Tables:

      In tables, data are provided as means and standard deviation. Please, specify this in the Method section.

      Thank you, the following is listed in the methods section: “All data are expressed as mean ± standard deviation (SD), unless otherwise noted.”

      Reviewer #3 (Public Review):

      In the present study, the authors investigated the effects of CIH on the swallowing and breathing responses to PICO stimulation. Their conclusion is that glutamatergic-cholinergic neurons from PICO are not only critical for the gating of post-inspiratory and swallow activity, but also play important roles in the generation of swallow motor patterns. There are several aspects that deserve the authors' attention and comments, mainly related to the study's conclusions.

      • The authors refer to PICO as the generator of post-inspiratory rhythm. However, evidence points to this region as a modulator of post-inspiratory activity rather than a rhythmogenic site (Toor et al., 2019 - 10.1523/JNEUROSCI.0502-19.2019; Oliveira et al., 2021 - 10.1016/j.neuroscience.2021.09.015). For example, sustained activation of PICO for 10 s barely affected the vagus or laryngeal post-inspiratory activity (Huff et al., 2023 - 10.7554/eLife.86103).

      Yes, we did refer to PiCo as the postinspiratory rhythm generator as defined by Anderson et al. 2016. We base this statement on the following criteria and experiments: In Anderson et al. 2016, we demonstrate that PiCo can be isolated in vitro, that PiCo neurons are activated in phase with postinspiration, and that they are inhibited during inspiration by preBötC neurons via GABAergic mechanisms and not glycinergic mechanisms. We also demonstrate that optogenetically stimulating cholinergic neurons in the PiCo area resets the inspiratory rhythm both in vivo and in vitro. We also show that PiCo, when isolated in transverse slices, is autorhythmic and that PiCo, like the preBötC in transverse slices, can generate respiratory rhythmic activity in vitro and independent of the preBötC. We also demonstrate that PiCo neurons are an order of magnitude more sensitive to opioids (DAMGO) than the preBötC and that local injections of DAMGO into the PiCo area in vivo abolish postinspiration, and also abolish the phase delay of the respiratory rhythm. None of these specific rhythmogenic properties have been studied by the Toor study or the Oliveira et al study. Hence, we do not understand why the reviewer cites these studies as evidence for modulation as opposed to rhythmogenic properties. The fact that PiCo is rhythmogenic should not be considered as an “exclusive property”. Specifically, this does not mean that PiCo is not also “modulating” the swallow-breathing coordination, as we have demonstrated more specifically in the Huff et al study. In the same sentence we also referred to the preBötzinger complex as the inspiratory rhythm generator as defined by Smith et al 1991, and it seems that the reviewer did not object to this reference. But we would like to point out that the same criteria were used to define the preBötzinger complex as we used for PiCo, except that PiCo neurons are better defined than preBötzinger complex neurons. Dbx1 neurons are often used to characterize the preBötC, but these neurons form a rostrocaudal and ventrodorsal column which also involves glial cells and transcends the preBötC. Glutamatergic neurons are everywhere, and so are Somatostatin or Neurokinin neurons. Moreover, the 1991 study was only performed in vitro, and did not include a histochemical analysis. We would also like to point out that the present manuscript is investigating the role of PiCo in swallow and laryngeal behaviors, and not specifically postinspiration. Thus, we are not entirely sure how this comment relates to this manuscript.

      • The optogenetic activation of glutamatergic and cholinergic neurons from PICO evoked submental and laryngeal responses, and CIH changed these motor responses. Therefore, the authors proposed that PICO is directly involved in swallow pattern generation and that CIH disrupts the connection between PICO and SPG (swallow pattern generator). However, the experiments of the present study did not provide evidence about connections between these two regions nor their possible disruption after CIH, or even whether PICO is part of SPG.

      We have edited the text to suggest PiCo modulates swallow motor sequence in addition to the coordination of swallow and breathing. We have also added that further experiments will be necessary to further investigate the connections between PiCo and SPG. But, unfortunately, compared to PiCo, the SPG is much less defined. As already stated above, it cannot be expected that a single study can address all possible open questions. Clearly, more work needs to be done outside of this study to answer all of these questions, which makes this an exciting area of research.

      • CIH affects several brainstem regions which might contribute to generating abnormal motor responses to PICO stimulation. For example, Bautista et al. (1995 - 10.1152/japplphysiol.01356.2011) documented that intermittent hypoxia induces changes in the activity of laryngeal motoneurons by neural plasticity mechanisms involving serotonin.

      Yes, we thank the reviewer for this comment and we agree that CIH affects multiple brainstem regions. We stated in the manuscript that we are measuring changes in two muscle complexes which spread among three motor neuron pools: hypoglossal nucleus, trigeminal nucleus, and nucleus ambiguus. We have added a discussion on laryngeal activity in the presence of acute bouts of extreme hypoxia, acute intermittent hypoxia, as well as chronic intermittent hypoxia.

      • To support the hypothesis that PICO is directly involved in swallow pattern generation the authors should perform the inhibition of Vglut2-ChAT neurons from PICO and then evoke swallow motor responses. If swallow is abolished when the neurons from this region are inhibited, it would indicate that PICO is crucial to generate this behavior.

      Thank you. We would like to clarify: “involvement” does not mean “necessary for”. Conflating the two has caused much confusion and debate in the field. Just as an example: We can argue at great length whether inhibition is necessary for respiratory rhythmogenesis in vivo, but I think there is no question that inhibition is involved in respiratory rhythmogenesis in vivo. But to avoid any confusion, we have changed the text to suggest PiCo is involved in the modulation of swallow motor sequence. We agree various additional inhibition experiments are necessary to determine whether PiCo is also a necessary component of the SPG, but this is not the question we have set out to address in this study. To specifically target PiCo we must inhibit not Vglut2 neurons in general but neurons that express both ChAT and Vglut2. To our knowledge there are no inhibitory DREADD or opsin techniques for cre/FlpO to specifically target these neurons. As stated above, non-experts in the field do not appreciate this technical nuance. However, we have begun to develop novel techniques necessary to inhibit these specific neurons which will be published in the future.

      • In almost all the data presented, the authors observed different patterns of changes in the motor submental and laryngeal responses to PICO activation, including that animals submitted to CIH (6%) presented a "normal" motor response. However, the authors did not discuss the possible explanations and functional implications of this variability.

      We agree that it would be helpful to understand the functional implications of these alterations in swallow-related motor activation; however, since we are not using any tools to measure or evaluate functional activity, it would be inappropriate to make suggestions of this type without any data to back up our conclusion. This is why we have not included any functional implications. We have added the following to the manuscript. “While fine wire EMG studies are an excellent evaluation tool to observe temporal motor pattern of sequential swallow related muscles; it must be combined with tools such as videofluoroscopic swallow study (VFSS) and/or high resolution manometry (HRM) in order to characterize the functional significance of these alterations to the swallow motor pattern shown in this study (Park et al., 2017). Since the preparation in this study utilizes only fine wire EMGs we are not able to evaluate or comment on the functional significance of the variable swallow motor patterns.”

      • In Figure 4, the authors need to present low magnification sections showing the PICO transfected neurons as well as the absence of transfection in the ventral respiratory column. The authors could also check the scale since the cAmb seems very small.

      Thank you, we have added different histology images to give a more comparable cAmb. We have also added lower-magnification images to show the absence of transfection in the VRC.

      • Finally, the title does not reflect the study. The present study did not demonstrate that PICO is a swallow pattern generator.

      We have also changed the title to say: Chronic Intermittent Hypoxia reveals the role of the Postinspiratory Complex in the mediation of normal swallow production

    1. Author Response

      eLife assessment

      This valuable study examines the activity and function of dorsomedial striatal neurons in estimating time. The authors examine striatal activity as a function of time and the impact of optogenetic striatal manipulation on the animal's ability to estimate a time interval. However, the task's design and methodology present several confounding factors that mean the evidence in support of the authors' claims is incomplete. With these limitations addressed, the work would be of interest to neuroscientists examining how striatum contributes to behavior.

      We appreciate the editorial process and are grateful for the thorough, detailed, and constructive reviews. We will respond in detail to every point raised by reviewers in a full revision.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, the authors examine the activity and function of D1 and D2 MSNs in dorsomedial striatum (DMS) during an interval timing task. In this task, animals must first nose poke into a cued port on the left or right; if not rewarded after 6 seconds, they must switch to the other port. Critically, this task thus requires animals to estimate if at least 6 seconds have passed after the first nose poke - this is the key aspect of the task focused on here. After verifying that animals reliably estimate the passage of 6 seconds by leaving on average after 9 seconds, the authors examine striatal activity during this interval. They report that D1-MSNs tend to decrease activity, while D2-MSNs increase activity, throughout this interval. They suggest that this activity follows a drift-diffusion model, in which activity increases (or decreases) to a threshold after which a decision (to leave) is made. The authors next report that optogenetically inhibiting D1 or D2 MSNs, or pharmacologically blocking D1 and D2 receptors, increased the average wait time of the animals to 10 seconds on average. This suggests that both D1 and D2 neurons contribute to the estimate of time, with a decrease in their activity corresponding to a decrease in the rate of 'drift' in their drift-diffusion model. Lastly, the authors examine MSN activity while pharmacologically inhibiting D1 or D2 receptors. The authors observe most recorded MSNs decrease their activity over the interval, with the rate decreasing with D1/D2 receptor inhibition.

      Major strengths:

      The study employs a wide range of techniques - including animal behavioral training, electrophysiology, optogenetic manipulation, pharmacological manipulations, and computational modeling. The behavioral task used by the authors is quite interesting and a nice way to probe interval timing in rodents. The question posed by the authors - how striatal activity contributes to interval timing - is of importance to the field and has been the focus of many studies and labs; thus, this paper can meaningfully contribute to that conversation. The data within the paper is presented very clearly, and the authors have done a nice job presenting the data in a transparent manner (e.g., showing individual cells and animals). Overall, the manuscript is relatively easy to read and clear, with sufficient detail given in most places regarding the experimental paradigm or analyses used.

      We are glad our main points came through to the reviewer.

      Major weaknesses:

      I perceive two major weaknesses. The first is the impact or contextualization of their results in terms of the results of the field more broadly. More specifically, it was not clear to me how the authors are interpreting the striatal activity in the context of what others have observed during interval timing tasks. In other words - what was the hypothesis going into this experiment? Does observing increasing/decreasing activity in D2 versus D1 support one model of interval timing over another, or does it further support a more specific idea of how DMS contributes to interval timing? Or was the main question that we didn't know if D2 or D1 neurons had differential activity during interval timing?

      Our hypothesis, based on prior behavioral work from our group describing that blocking striatal D1 and D2 dopamine receptors impaired interval timing (De Corte et al., 2019; Stutt et al., 2023), was that D1 and D2 MSNs would have similar patterns of activity during interval timing. We will clarify this in the revision.

      In the second, I felt that some of the conclusions suggested by the authors don't seem entirely supported by the data they present, or the data presented suggests a slightly more complicated story. Below I provide additional detail on some of these instances.

      Regarding the results presented in Figures 2 and 3:

      I am not sure the PC analysis adds much to the interpretation, and potentially unnecessarily complicates things. In particular, running PCA on a matrix of noisy data that is smoothed with a Gaussian will often return PCs similar to what is observed by the authors, with the first PC being a line up/down, the 2nd PC being a parabola that is up/down, etc. Thus, I'm not sure that there is much to be interpreted by the specific shape of the PCs here.

      These are insightful points. We will clarify details of our PCA analysis in the revision. We include PCA for comparisons with our past work (Emmons et al., 2017, 2021; Bruce et al., 2021). Second, it is true that these components can be observed in smoothed data; however, when we generated random data using identical parameters, we found that the variance explained by PC1 was not commonly observed in random data. Third, our goal is to compare between D1 and D2 MSNs, not to interpret the PCs. We will make this explicit in our revision.
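
      The concern about smoothing can be illustrated with a small sketch. It is not the analysis code used in the manuscript: the matrix size (40 neurons by 60 time bins), the kernel width, and the function names `gaussian_smooth`, `pc1_variance_fraction`, and `demo` are all assumptions chosen for illustration. It compares the fraction of variance captured by the first PC in white noise against the same noise after Gaussian smoothing, which is the kind of null comparison described above.

```python
import math
import random


def gaussian_smooth(x, sigma):
    """Smooth a 1-D list with a Gaussian kernel, renormalizing at the edges."""
    radius = int(3 * sigma)
    out = []
    for i in range(len(x)):
        num = den = 0.0
        for k in range(-radius, radius + 1):
            j = i + k
            if 0 <= j < len(x):
                w = math.exp(-0.5 * (k / sigma) ** 2)
                num += w * x[j]
                den += w
        out.append(num / den)
    return out


def pc1_variance_fraction(data, n_iter=100):
    """Fraction of total variance on PC1 (rows = neurons, columns = time bins).

    Uses power iteration on the time-by-time covariance matrix, so the
    result is a lower bound that approaches the true value as n_iter grows.
    """
    n, t = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(t)]
    centered = [[row[j] - means[j] for j in range(t)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(t)]
           for a in range(t)]
    total = sum(cov[a][a] for a in range(t))
    v = [1.0 / math.sqrt(t)] * t
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(t)) for a in range(t)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    top = sum(v[a] * sum(cov[a][b] * v[b] for b in range(t)) for a in range(t))
    return top / total


def demo(seed=0, n_neurons=40, n_bins=60, sigma=5.0):
    """Return (PC1 fraction for white noise, PC1 fraction after smoothing)."""
    rng = random.Random(seed)
    raw = [[rng.gauss(0.0, 1.0) for _ in range(n_bins)] for _ in range(n_neurons)]
    smooth = [gaussian_smooth(row, sigma) for row in raw]
    return pc1_variance_fraction(raw), pc1_variance_fraction(smooth)
```

      Smoothing alone concentrates variance in the leading components, so a real dataset is only informative on this measure if its PC1 fraction exceeds what smoothed noise with matched parameters produces.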

      I think an alternative analysis that might be both easier and more informative is to compute the slope of the activity of each neuron across the 6 seconds. This would allow the authors to quantify how many neurons increase or decrease their activity much like what is shown in Figure 2.

      This is exactly the analysis shown in Figure 3D. We will clarify this in the revision.
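
      For readers unfamiliar with this kind of analysis, a minimal sketch of a per-neuron slope computation follows. It is an illustration under stated assumptions, not the code behind Figure 3D: the function names `firing_rate_slope` and `count_ramping` and the input layout (one list of binned firing rates per neuron) are hypothetical.

```python
def firing_rate_slope(times, rates):
    """Ordinary least-squares slope of firing rate against time (Hz per second)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_r = sum(rates) / n
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(times, rates))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx


def count_ramping(times, neurons):
    """Count neurons whose fitted slope is positive versus negative.

    `neurons` is a list of rate lists, one per neuron, aligned to `times`.
    """
    slopes = [firing_rate_slope(times, rates) for rates in neurons]
    increasing = sum(1 for s in slopes if s > 0)
    decreasing = sum(1 for s in slopes if s < 0)
    return increasing, decreasing
```

      A sign count like this ignores slope magnitude and uncertainty, so it would complement rather than replace a statistical model of the slopes.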

      Relatedly, it seems that the data shown in Figure 2D doesn't support the authors' main claim regarding D2/D1 MSNs increasing/decreasing their activity, as the trial-by-trial slope is near 0 for both cell types.

      This likely refers to Figure 3D. In the revision, we will clarify this analysis, add error bars, and note that our goal was to differentiate D2 and D1 MSNs in this analysis. We will also add to this analysis to better make the point that D2 and D1 MSNs are distinct, contrary to our hypothesis.

      Regarding the results in Figure 4:

      The authors suggest that their data is consistent with a drift-diffusion model. However, it is unclear how well the output from the model fits the activity from neurons the authors recorded. Relatedly, it is unclear how the parameters were chosen for the D1/D2 versions of this model. I think that an alternate approach that would answer these questions is to fit the model to each cell, and then examine the best-fit parameters, as well as the ability of the model to predict activity on trials held out from the fitting process. This would provide a more rigorous method to identify the best parameters and would directly quantify how well the model captures the data.

      This is a great point. Our goal was to fit behavioral activity, not neuronal activity; in our revision, we will do exactly what the reviewer suggests and present data of fits to neuronal activity.
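
      As background for this exchange, a drift-diffusion process can be sketched in a few lines. This is a generic textbook form, not the authors' model or its fitted parameters: the values of `drift`, `noise`, `threshold`, and `dt` and the function names `simulate_ddm` and `mean_crossing_time` are assumptions for illustration only.

```python
import random


def simulate_ddm(drift, noise, threshold, dt=0.1, max_time=60.0, rng=None):
    """Simulate one trial; return the threshold-crossing time in seconds.

    The accumulator starts at 0 and is updated each step by
    drift * dt plus Gaussian noise scaled by sqrt(dt).
    Returns None if the threshold is not reached within max_time.
    """
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    while t < max_time:
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
        if x >= threshold:
            return t
    return None


def mean_crossing_time(drift, noise=0.1, threshold=1.0, n_trials=500, seed=0):
    """Average crossing time over trials that reached the threshold."""
    rng = random.Random(seed)
    times = [simulate_ddm(drift, noise, threshold, rng=rng) for _ in range(n_trials)]
    crossed = [t for t in times if t is not None]
    return sum(crossed) / len(crossed)
```

      In this form a lower drift rate produces later threshold crossings on average, which is the qualitative link between reduced drift and longer wait times discussed in the review.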

      Relatedly, looking at the raw data in Figure 2, it seems that many neurons either fire at the beginning or end of the interval, with more neurons firing at the end, and more firing at the beginning, for D2/D1 neurons respectively. Thus, it's not clear to me whether the drift-diffusion model is a good model of activity. Or, perhaps the model is supposed to be related to the aggregate activity of all D1/D2 neurons? (If so, this should be made more explicit. The comment about fitting the model directly to the data also still stands).

      Our model was inspired by the averages in Figure 2G&H; however, we will fit drift-diffusion models to individual neurons exactly as the reviewer suggests.

      Further, it's unclear to me how, or why, the authors changed the specific parameters they used to model the optogenetic manipulation. Were these parameters chosen because they fit the manipulation data? This I don't think is in itself an issue, but perhaps should be clearly stated, because otherwise it sounds a bit odd given the parameter changes are so specific. It is also not clear to me why the noise in the diffusion process would be expected to change with increased inhibition.

      We will clarify this in our revision, as this is an important point.

      Regarding the results in Figure 6:

      My comments regarding the interpretation of PCs in Figure 2 apply here as well. In addition, I am not sure that examining PC2 adds much here, given that the authors didn't examine such nonlinear changes earlier in the paper.

      We agree – we will remove PC2 in Figure 6 and Figure S9 and add context to the PC analysis noting that we are including for 1) comparisons with past work, 2) our observed variance is much higher than observed in random/smoothed data, and 3) we are primarily interested in comparisons between conditions rather than interpreting the components.

      A larger concern though that seems potentially at odds with the authors' interpretation is that there seems to be very little change in the firing pattern after D1 or D2 blockade. I see that in Figure 6F the authors suggest that many cells slope down (and thus, presumably, they are recording more D1 cells), and that this change in slope is decreased, but this effect is not apparent in Figure 6C, and Figure 6B shows an example of a cell that seems to fire in the opposite direction (increase activity). I think it would help to show some (more) individual examples that demonstrate the summary effect shown by the authors, and perhaps the authors can comment on the robustness (or the variability) of this result.

      We agree, although we note D1/D2 blockade changes PC1, which explains the most variance in MSN activity. In the revision, we will show more examples and comment on the robustness of PC1, exactly as the reviewer recommends. The changes in PC1 are rather consistent.

      Also, it seems that the authors want to claim that this manipulation lowers the drift rate. To make this claim, they could fit the DDM model and examine whether D is significantly lower.

      This is a great idea – we will try to do this.

      Regarding the results in Figure 7:

      I am overall a bit confused about what the authors are trying to claim here. In Figure 7, they present data suggesting that D1 or D2 blockade disrupts their ability to decode time in the interval of interest (0-6 seconds). However, in the final paragraph of the results, the authors seem to say that by using another technique, they didn't see any significant change in decoding accuracy after D1 or D2 blockade. What do the authors make of this?

      We were not clear. The second classifier was predicting response time. This was confusing and we will remove it.

      Impact:

      The task and data presented by the authors are very intriguing, and there are many groups interested in how striatal activity contributes to the neural perception of time. The authors perform a wide variety of experiments and analysis to examine how DMS activity influences time perception during an interval-timing task, allowing for insight into this process. However, the significance of the key finding - that D2/D1 activity increases/ decreases with time - remains somewhat ambiguous to me. This arises from a lack of clarity regarding the initial hypothesis and the implications of this finding for advancing our understanding of striatal functions.

      Again, we are grateful for the constructive and very insightful comments that we look forward to clarifying in a full revision.

      Reviewer #2 (Public Review):

      Summary:

      In the present study, the authors investigated the neural coding mechanisms for D1- and D2-expressing striatal direct and indirect pathway MSNs in interval timing by using multiple strategies. They concluded that D2-MSNs and D1-MSNs have opposing temporal dynamics yet disrupting either type produced similar effects on behavior, indicating the complementary roles of D1- and D2- MSNs in cognitive processing. However, the data was incomplete to fully support this major finding. One major reason is the heterogeneous responses within the D1-or D2-MSN populations. In addition, there are additional concerns about the statistical methods used. For example, the majority of the statistical tests are based on the number of neurons, but not the number of mice. It appears that the statistical difference was due to the large sample size they used (n=32 D2-MSNs and n=41 D1-MSNs), but different neurons recorded in the same mouse cannot be treated as independent samples; they should use independent mouse-based statistical analysis.

      Strengths:

      The authors used multiple approaches including awake mice behavior training, optogenetic-assistant cell-type specific recording, optogenetic or pharmacological manipulation, neural computation, and modeling to study neuronal coding for interval timing.

      We appreciate the reviewer’s careful read recognizing the breadth of our approach.

      Weaknesses:

      (1) More detailed behavior results should be shown, including the rate of the success switches, and how long it takes to wait in the second nose poke to get a reward. For line 512 and the Figure 1 legend, the reviewer is not clear about the reward delivery. The methods appear to state that the mouse had to wait for 18s, then make nose pokes at the second port to get the reward. What happens if the mouse made the second nose poke before 18 seconds, but then exited? Would the mouse still get the reward at 18 seconds? Similarly, what happens if the mice made the third or more nosepokes within 18 seconds? It is important to clarify because, according to the method described, if the mice made a second nose poke before 18 seconds, this already counted as the mouse making the "switch." Lastly, what if the mice exited before 6s in the first nosepoke?

      We agree. These were presented in detail in our prior work (Bruce et al., 2021; Larson et al., 2022; and Weber et al., 2023) and work from others (Balci et al 2008; Tosun et al., 2016). However, we will work on a detailed behavioral schematic in the revision and move supplementary behavioral data in Figure S1 to the main manuscript.

      (2) There are a lot of time parameters in this behavior task, the description of those time parameters is mentioned in several parts, in the figure legend, supplementary figure legend, and methods, but was not defined clearly in the main text. It is inconvenient, sometimes, confusing for the readers. The authors should make a schematic diagram to illustrate the major parameters and describe them clearly in the main text.

      This is a great suggestion – we will do this – and clarify in the above schematic.

      (3) In Line 508, the reviewer suggests the authors pay attention to those trials without "switch". It would be valuable to compare the MSN activity between those trials with or without a "switch".

      We analyzed MSN activity on errors in detail in Figure 6 of Bruce et al., 2021. These errors are infrequent and inconsistent – we will discuss this in the revision.

      (4) The definition of interval is not very clear. It appears that the authors used a 6-second interval in analyzing the data in Figure 2 and Figure 3. But from my understanding, the interval should be the time from time "0" to the "switch", when the mice start to exit from the first nose poke.

      We agree. The switch time can be vastly different on some trials, making it challenging to compare different lengths and slopes. However, we will clarify the interval as noted above, and we have a few ideas on how to do the analysis the reviewer suggests.

      (5) For Figure 2 C-F, the authors only recorded 32 D2-MSNs in 4 mice, and 41 D1-MSNs in 5 mice. The sample size is too small compared to the sample size usually used in the field. In addition to the small sample size, the single-cell activity exhibited heterogeneity, which created potential issues. For both D1 and D2 MSNs, the authors tried to make conclusions on the "trend" of increasing in D2-MSNs and decreasing in D1-MSNs populations, respectively, during the interval. However, such a conclusion is not sufficiently supported by the data presented. It looks like the single-cell activity patterns can be separated into groups: one is a decreasing activity group, one is an increasing activity group and a small group for on and off response. Because of the small sample size, the author should pay attention to the variance across different mice (which needs to be clearly presented in the manuscript), instead of pooling data together and analyzing the mean activity.

      We were not clear – we did this analysis exactly as the reviewer suggested. We are not pooling any data – instead – as we state on line 620 – we are using linear mixed-effects models to account for mouse-specific and neuron-specific variance. This approach was developed with our statistics core for exactly the reasons the reviewer suggested. Furthermore, we will add to this analysis to demonstrate that it is resistant to outliers. Finally, we will include measures of effect size, noting that it is a medium to large effect.

      It’s a helpful idea to plot data individually by mice, and we will do so in the revision.
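
      The nesting issue raised here (neurons within mice) can be made concrete with a short sketch. It does not implement the linear mixed-effects models described above; it only shows the simpler step of summarizing neuron-level values per mouse, which is what a by-mouse plot would display. The function name `per_mouse_means` and the `(mouse_id, value)` record layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def per_mouse_means(records):
    """Collapse neuron-level values to one mean per mouse.

    `records` is an iterable of (mouse_id, value) pairs, one per neuron,
    for example a fitted firing-rate slope for each recorded neuron.
    """
    by_mouse = defaultdict(list)
    for mouse_id, value in records:
        by_mouse[mouse_id].append(value)
    return {mouse_id: mean(values) for mouse_id, values in by_mouse.items()}
```

      Treating each mouse mean as a single observation is conservative with only 4 or 5 animals per group; a mixed-effects model instead keeps every neuron while modeling the shared per-mouse variance.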

      (6) For Figure 2, from the activity in E and F, it seems that the activity already rose before the trial started, the authors should add some longer baseline data before time zero for clarification and comparison, and show the timing of the actual start of the activity with the corresponding behavior. What behavior states are the mice in when initiating the activity?

      We can certainly include a longer baseline. We can clarify in the revision that mice initiate trials at the rear nosepoke, and this is what initiates the task cues and the temporal interval.

      (7) The authors were focused on the "switch" behavior in the task, but they used an arbitrary 6s time window to analyze the activity, and tried to correlate the decreasing or increasing activities of MSNs to the neural coding for time. A better way to analyze is to sort the activity according to the "switch" time, from short to long intervals. This way, the authors could see and analyze whether the activity of D1 or D2 MSNs really codes for the different length of interval, instead of finding a correlation between average activity trends and the arbitrary 6s time window.

      This is a great suggestion, and we have some ideas on how to adapt the GLM analysis to do this.
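
      A minimal sketch of the sorting step the reviewer describes is given below. It is not the authors' GLM analysis: the trial structure, field names, and firing-rate values are all hypothetical and serve only to show trials ordered by switch time and summarized up to each trial's own switch rather than over a fixed 6 s window.

```python
from statistics import mean


def sort_trials_by_switch_time(trials):
    """Return trials ordered from shortest to longest switch time."""
    return sorted(trials, key=lambda trial: trial["switch_time_s"])


def mean_rate_before_switch(trial, bin_width_s=1.0):
    """Average the binned firing rate over bins that end at or before the switch."""
    n_bins = int(trial["switch_time_s"] // bin_width_s)
    rates = trial["rate_hz"][:n_bins]
    return mean(rates) if rates else float("nan")


# Hypothetical trials: one firing-rate value (Hz) per 1-s bin from trial start.
trials = [
    {"switch_time_s": 9.2, "rate_hz": [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]},
    {"switch_time_s": 3.5, "rate_hz": [4, 6, 8, 10]},
    {"switch_time_s": 6.1, "rate_hz": [4, 5, 7, 8, 9, 11, 12]},
]

ordered = sort_trials_by_switch_time(trials)
for trial in ordered:
    print(trial["switch_time_s"], mean_rate_before_switch(trial))
```

      Summarizing each trial up to its own switch time, rather than over a common window, is what would let the slope of activity be compared across short and long intervals.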

      Reviewer #3 (Public Review):

      Summary:

      The cognitive striatum, also known as the dorsomedial striatum, receives input from brain regions involved in high-level cognition and plays a crucial role in processing cognitive information. However, despite its importance, the extent to which different projection pathways of the striatum contribute to this information processing remains unclear. In this paper, Bruce et al. conducted a study using a range of causal and correlational techniques to investigate how these pathways collectively contribute to interval timing in mice. Their results were consistent with previous research, showing that the direct and indirect striatal pathways perform opposing roles in processing elapsed time. Based on their findings, the authors proposed a revised computational model in which two separate accumulators track evidence for elapsed time in opposing directions. These results have significant implications for understanding the neural mechanisms underlying cognitive impairment in neurological and psychiatric disorders, as disruptions in the balance between direct and indirect pathway activity are commonly observed in such conditions.

      Strengths:

      The authors employed a well-established approach to study interval timing and employed optogenetic tagging to observe the behavior of specific cell types in the striatum. Additionally, the authors utilized two complementary techniques to assess the impact of manipulating the activity of these pathways on behavior. Finally, the authors utilized their experimental findings to enhance the theoretical comprehension of interval timing using a computational model.

      We are grateful for the reviewer’s consideration of our work and recognizing the strengths of our approach.

      Weaknesses:

      The behavioral task used in this study is best suited for investigating elapsed time perception, rather than interval timing. Timing bisection tasks are often employed to study interval timing in humans and animals.

      This is certainly valid, and we will include these points in the revision.

      The main results from unit recording (opposing slopes of D1/D2 cell firing rate, as shown in Figure 3D) appear to be very sensitive to a couple of outlier cells, and the predictive power of ensemble recording seems to be only slightly above chance levels.

      We are glad that the reviewer raised this. We will add analyses demonstrating that this result is robust to outliers. We will also include measures of effect size, noting that the effect is medium to large. Thus, it is significantly above chance and rather reliable, and it is supported by our PCA results in Figure 3C.

      In the optogenetic experiment, the laser was kept on for too long (18 seconds) at high power (12 mW). This has been shown to cause adverse effects on population activity (for example, through heating the tissue) that are not necessarily related to their function during the task epochs.

      Again, this is an important point. We are well aware of heating effects with optogenetics. For the exact reasons noted by the reviewer, we included opsin-negative controls, in which the laser was on with the exact same time course and parameters (Figure S5). There were no behavioral effects in these controls, which experienced identical heating and other effects of the laser. Furthermore, the optogenetic effects are similar to the pharmacological effects in this manuscript and in our prior work (De Corte et al., 2019; Stutt et al., 2023). We will better highlight these issues in the revision.

      Given the systemic delivery of pharmacological interventions, it is difficult to conclude that the effects are specific to the dorsomedial striatum. Future studies should use the local infusion of drugs into the dorsomedial striatum.

      This is a great point; we did exactly this experiment, with local drug infusions, in De Corte et al., 2019. This earlier study was the departure point for this experiment, although it is challenging to combine focal pharmacological inactivation with recordings in mice (we have extensive experience with this in rats in Parker et al., 2015 and Parker et al., 2015). Furthermore, we have similar local optogenetic effects in this paper. We will include these points in the revised manuscript.

    1. Author Response

      We would like to thank the three reviewers and the eLife editors for their careful analysis of our work, and for their constructive feedback and positive evaluation. We are especially pleased to see echoed in the reviews and in the editorial assessment that our results underline the importance of taking into account glycosylation in viral evolution, immune surveillance, and in the interpretation of complex epistatic interactions. With this provisional response we would like to communicate to the editors, reviewers and to the eLife readership our intention to integrate in the paper a detailed description of the GM1os and GM2os binding site on the RBD with details on the computational approach we used. We agree that this addition will strengthen the work by making it more self-contained. Also, as suggested by the editorial team, we will provide a comprehensive discussion of published data, as a firmer foundation for our findings.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this study, the authors develop a useful strategy for fluorophore-tagging endogenous proteins in human induced pluripotent stem cells (iPSCs) using a split mNeonGreen approach. Experimentally, the methods are solid, and the data presented support the authors' conclusions. Overall, these methodologies should be useful to a wide audience of cell biologists who want to study protein localization and dynamics at endogenous levels in iPSCs.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors have applied an asymmetric split mNeonGreen2 (mNG2) system to human iPSCs. Integrating a constitutively expressed long fragment of mNG2 at the AAVS1 locus allows other proteins to be tagged through the use of available ssODN donors. This removes the need to generate long AAV donors for tagging, thus greatly facilitating high-throughput tagging efforts. The authors then demonstrate the feasibility of the method by successfully tagging 9 markers expressed in iPSCs at various levels, and one expressed upon endoderm differentiation. Several additional differentiation markers were also successfully tagged but not subsequently tested for expression/visibility. As one might expect for high-throughput tagging, a few proteins, while successfully tagged at the genomic level, failed to be visible. Finally, to demonstrate the utility of the tagged cells, the authors isolated clones with genes relevant to cytokinesis tagged, and together with an AI to enhance signal-to-noise ratios, monitored their localization over cell division.

      Strengths:

      Characterization of the mNG2-tagged parental iPSC line was well and carefully done, including validation of a single integration, the presence of markers for continued pluripotency, selected off-target analysis, and G-banding-based structural rearrangement detection.

      The ability to tag proteins with simple ssODNs in iPSC capable of multi-lineage differentiation will undoubtedly be useful for localization tracking and reporter line generation.

      Validation of clone genotypes was carefully performed and highlights the continued need for caution with regard to editing outcomes.

      Weaknesses:

      IF and flow cytometry figures lack quantification and information on replication. How consistent is the brightness and localization of the markers? How representative are the specific images? Stability is mentioned in the text but data on the stability of expression/brightness is not shown.

      To address this comment, we have quantified the mean fluorescence intensity of the tagged cell populations in Fig. S3B-T. This data correlates well with the expected expression levels of each gene relative to the others (Fig. S3A), apart from CDH1 and RACGAP1, which are described in the discussion.

      The images in Fig. 2 show tagged populations enriched by FACS so they are non-clonal and are representative of the diversity of the population of tagged cells.

      The images shown in Fig. 3 are representative of the clonal tagged populations. The stability of the tag was not quantified directly. However, the fluorescence intensity was very stable across cells in clonal populations. Since these populations were recovered from a single cell and grown for several weeks, this low variability across cells in a population suggests that these tags are stable.

      The localization of markers, while consistent with expectations, is not validated by a second technique such as antibody staining, and in many cases not even with Hoechst to show nuclear vs cytoplasmic.

      We find that the localization of each protein is distinct and consistent with previous studies. To address this comment, we have added an overlay of the green fluorescence images with brightfield images to better show the location of the tagged protein relative to the nuclei and cytoplasm. We have also added references to other studies that showed the same localization patterns for these proteins in iPSCs and other relevant cell lines.

      For the multi-germ layer differentiation validation, NCAM is also expressed by ectoderm, so isn't a good solo marker for mesoderm as it was used. Indeed, the kit used for the differentiation suggests Brachyury combined with either NCAM or CXCR4, not NCAM alone.

      Since Brachyury is the most common mesodermal marker, we first tested differentiation using anti-Brachyury antibodies, but they did not work well for flow cytometry. We then switched to anti-NCAM antibodies. Since we used a kit for directed differentiation of iPSCs into the mesodermal lineage, NCAM staining should still report for successful differentiation. In the context of mixed differentiation experiments (embryoid body formation or teratoma assay), NCAM would not differentiate between ectoderm and mesoderm. The parental cells (201B7) have also been edited at the AAVS1 locus in multiple other studies, with no effect on their differentiation potential.

      Only a single female parental line has been generated and characterized. It would have been useful to have several lines and both male and female to allow sex differences to be explored.

      We agree that it would be interesting (and important) to study differences in protein localization between female and male cell types, and from different individuals with different genetic backgrounds. We see our tool as opening a door for cell biology to move away from randomly collected, transformed, differentiated cell types to more directed comparative studies of distinct normal cell types. Since few studies of cell biological processes have been done in normal cells, a first step is to understand how processes compare in an isogenic background, then future studies can reveal how they compare with other individuals and sexes. We hope that either our group or others will continue to build similar lines so that these studies can be done.

      The AI-based signal-to-noise enhancement needs more details and testing. Such models can introduce strong assumptions and thus artefacts into the resolved data. Was the model trained on all markers or were multiple models trained on a single marker each? For example, if trained to enhance a single marker (or co-localized group of markers), it could introduce artefacts where it forces signal localization to those areas even for others. What happens if you feed in images with scrambled pixel locations, does it still say the structures are where the training data says they should be? What about markers with different localization from the training set? If you feed those in, does it force them to the location expected by the training data or does it retain their differential true localization and simply enhance the signal?

      The image restoration neural network was used as in Weigert et al., 2018. The model was trained independently for each marker. Each trained model was used only on the corresponding marker and with the same imaging conditions as the training images. From visual inspection, the fluorescent signal in the restored images was consistent with the signal in the raw images, both for interphase and mitotic cells. We found very few artefacts of the restoration (small bright or dark areas) that were discarded. We did not try to restore scrambled images or images of mismatched markers.

      Reviewer #2 (Public Review):

      Summary:

      The authors have generated human iPSCs constitutively expressing mNG2(1-10) and tested them by endogenously tagging multiple genes with mNG2(11) (several tagged iPSC clones were isolated). With this tool, they have explored several weakly expressed cytokinesis genes and gained insights into how cytokinesis occurs.

      Strengths:

      Human iPSC cells are used.

      Weaknesses:

      i) The manuscript is extremely incremental, no improvements are present in the split-fluorescent (split-FP) protein variant used nor in the approach for endogenous tagging with split-FPs (both of them are already very well established and used in literature as well as in different cell types).

      Although split fluorescent proteins and the endogenous tagging methodology had been developed previously, their use in human stem cells has not been explored. We argue that human iPSCs are a valuable model for cell biologists to study cellular processes in differentiating cells in an isogenic context for proper comparison. Many normal human cell types have not been studied at the cellular/subcellular level, and this tool will enable those studies. Importantly, other existing cell lines required transformation to persist in culture and represent a single, differentiated cell type that is not normal. Moreover, the protocols that we developed along with this methodology (e.g. workflows for iPSC clonal isolation that include automated colony screening and Nanopore sequencing) will be useful to other groups undertaking gene editing in human cells. Therefore, we argue that our work opens new doors for future cell biology studies.

      ii) The fluorescence intensity of the split mNeonGreen appears rather low, for example in Figure 2C the H2BC11, ANLN, SOX2, and TUBB3 signals are very noisy (differences between the structures observed are almost absent). For low-expression targets, this is an important limitation. This is also stated by the authors but image restoration could not be the best solution since a lot of biologically relevant information will be lost anyway.

      The split mNeonGreen tag is one of the brighter fluorescent proteins that is available. The low expression that the reviewer refers to for H2BC11, ANLN, TUBB3 and SOX2 is expected based on their predicted expression levels. Further, these images were taken with cells in dishes using lower resolution imaging and were not intended to be used for quantification. As shown in the images in Figure 3H, when using a different microscope with different optical settings and higher magnification, the localization is very clear and quantifiable without needing to use restoration (e.g., compare H2BC11 and ANLN). Microscopes with high NA objectives, lasers and high-sensitivity EMCCD or sCMOS cameras can detect very weakly expressed proteins at levels that can be quantified above background and compared across cells. It is worth noting that each tag may be studied in very different contexts. For example, ANLN will be useful for studies of cytokinesis, while the loss of SOX2 expression and gain of TUBB3 expression may be used to screen for differentiation rather than for localization per se. The reason for endogenous tagging is to study proteins at their native levels rather than using over-expression or fixation with antibodies where artefacts can be introduced. Endogenous tags will also enable studies of dynamic changes in localization during differentiation in an isogenic background as described previously.

      Importantly, image restoration is not required to image any of these probes! We use it to demonstrate how a researcher can increase the temporal resolution of imaging weakly-expressed proteins for extended periods of time. This data can be used to compare patterns of localization and reveal how patterns change with time and during differentiation. Imaging with fewer timepoints and altered optical settings will still permit researchers to extract quantifiable information from the raw data without requiring image restoration.

      iii) There is no comparison with other existing split-FP variants, methods, or imaging and it is unclear what the advantages of the system are.

      We are not sure what the reviewer means by this comment. In the future, we plan to incorporate an additional split-FP variant (e.g., split sfCherry) in this iPSC line to enable the imaging of more than one protein in the same cell. However, the split mNeonGreen system is still amenable for use with dyes with different fluorescence spectra that can mark other cellular components, especially for imaging over shorter timespans. In addition to tagging efficiency, the main advantage of split FPs is their scalability, as demonstrated by the OpenCell project, which tagged 1,310 proteins endogenously (Cho et al., 2022). We developed protocols that facilitate the identification of edited cell lines with high throughput. We also used multiple imaging methods throughout the study that relied on the use of different microscopes and flow cytometry, demonstrating the flexibility of this tagging system. Even for more weakly expressed proteins, the probe could be sufficiently visualized by multiple systems. Such endogenous tags can be used for everything from simply knowing when cells have differentiated (e.g., loss of SOX2 expression, gain of differentiation markers), to studying biological processes over a range of timescales.

      Reviewer #3 (Public Review):

      The authors report on the engineering of an induced Pluripotent Stem Cell (iPSC) line that harbours a single copy of a split mNeonGreen, mNG2(1-10). This cell line is subsequently used to tag endogenous proteins with a smaller part of mNeonGreen, mNG2(11), enabling the complementation of mNG into a fluorescent protein that is then used to visualize the protein. The parental cell is validated and used to construct several iPSC lines with endogenously tagged proteins. These are used to visualize and quantify endogenous protein localisation during mitosis.

      I see the advantage of tagging endogenous loci with small fragments, but the complementation strategy has disadvantages that deserve some attention. One potential issue is the level of the mNG2(1-10). Is it clear that the current level is saturating? Based on the data in Figure S3, the expression levels and fluorescence intensity levels show a similar dose-dependency which is reassuring, but not definitive proof that all the mNG2(11)-tagged protein is detected.

      We have not quantified the levels of mNG2(1-10) expression directly. However, the increase in fluorescence observed with highly expressed proteins (e.g., ACTB) supports that mNG2(1-10) levels must be sufficiently high to permit differences among endogenous proteins with vastly different expression levels. To ensure high expression, we used a previously validated expression system comprised of the CAG promoter integrated at the AAVS1 locus, which has previously been used to provide high and stable transgene expression (e.g. Oceguera-Yanez et al., 2016). We acknowledge that it is difficult to confirm that all of the endogenous mNG2(11)-tagged protein is ‘detectable’.

      Do the authors see a difference in fluorescence intensity for homo- and heterozygous cell lines that have the same protein tagged with mNG2(11)? One would expect two-fold differences, or not?

      To answer this question, we measured the fluorescence intensity of homozygous and heterozygous clones carrying smNG2-anillin and smNG2-RhoA. We found homozygous clones that were approximately twice as bright as the corresponding heterozygous clones (Fig. S4H and I). This suggests that the complementation between mNG2(1-10) and mNG2(11) occurs efficiently over a range of mNG2(11) expression, since anillin is expressed weakly and RhoA is expressed more strongly in iPSCs. However, we also observed some homozygous clones that were not brighter than the corresponding heterozygous clones, which could be due to undetected byproducts of CRISPR or clonal variation in protein expression.

      Related to this, would it be favourable to have a homozygous line for expressing mNG2(1-10)?

      Our heterozygous cell line leaves the other AAVS1 allele available for integrations of other transgenes for future experiments. While a homozygous line could express more mNG2(1-10), it does not seem to be rate-limiting even with a highly-expressed protein like beta-actin, and we are not sure that it is necessary. The value gained by having the free allele could outweigh the difference in mNG2(1-10) levels.

      The complementation seems to work well for the proteins that are tested. Would this also work for secreted (or other organelle-resident) proteins, for which the mNG2(11) tag is localised in a membrane-enclosed compartment?

      The interaction between the 1-10 and 11 fragments is strong and should be retained when proteins are secreted. It was recently shown that secreted proteins tagged with GFP11 can be detected when interacting with GFP1-10 in the extracellular space, albeit using over-expression (Minegishi et al., 2023). However, in our work, the mNG2(1-10) fragment is cytosolic and we have only explored proteins localized to the nucleus or the cytoplasm, similar to Cho et al. (2022). By GO annotation, 75% of human proteins are present in the cytoplasm and/or nucleus, which still covers a wide range of proteins of interest. Future versions of our line could incorporate organelle-targeting peptides to drive the large fragment to specific, non-cytosolic locations.

      The authors present a technological advance and it would be great if others could benefit from this as well by having access to the cell lines.

      As discussed below, some of the resources are already available, and we are working to make the mNG2(1-10) cell line available for distribution.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      The manuscript is methodological, the main achievement is the generation of a stable iPSC with the split Neon system available for the scientific community. Although it is technically solid, the judgement of this reviewer is that the manuscript should be considered for a more specialised/methodological/resource-based journal.

      Indeed, we have submitted this article under the “tools and resources” category of eLife, which publishes methodology-centered papers of high technical quality. We felt this was a good venue for the audience that it can reach compared to more specialized journals that may be more limited in scope. For example, our system will be a useful resource for cell biologists and they are more likely to see it in eLife compared to more specialized journals.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors present a technological advance and it would be great if others can benefit from this as well. Therefore access to the materials (and data) would be valuable (the authors do a great job by listing all the repair templates and primers).

      We have added several pieces of data and information to the supplementary materials, as described below.

      For instance:

      What is the (complete/plasmid) sequence of the AAVS1-mNG2(1-10) repair plasmid? Will it be deposited at Addgene?

      The plasmids used in this paper are now available on Addgene, along with their sequences [ID 206042 for pAAVS1-Puro-CAG-mNG2(1-10) and 206043 for pH2B-mNG2(11)].

      The ImageJ code for the detection of colonies is interesting and potentially valuable. Will the code be shared (e.g. at Github, or as supplemental text)?

      The ImageJ macro has been uploaded to the CMCI Github page (https://github.com/CMCI/colony_screening). The parameters are optimized to perform segmentation on images obtained using a Cytation5 microscope with our specific settings, but they can be tweaked for any other sets of images. The following text has been added to the methods section: “The code for this macro is available on Github (https://github.com/CMCI/colony_screening)”.

      The cell line with the mNG2(1-10) as well as other cell lines can be of interest to others. Will the cell lines be made available? If so, can the authors indicate how?

      We are in the process of depositing our cell line in a public repository. This process may take some time for quality control. For now, the cells can be made available by requesting them from the corresponding authors.

      (2) How well does the ImageJ macro for detection of the colonies in the well work? Is there any comparison of analysis by a human vs. the macro?

      In our most recent experiment, the colony screening macro correctly identified 99.5% of wells compared to manual annotation (83/84 positive wells and 108/108 negative wells). For each 96-well plate, imaging takes 25 minutes, and it takes 7 minutes for analysis. Despite a few false negatives, we expect this macro to be useful for large-scale experiments where multiple 96-well plates need to be screened, which would take hours manually.
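
      The reported agreement can be reproduced from the well counts given above. The sketch below is only an arithmetic check; the function name and the sensitivity/specificity breakdown are additions for illustration and are not part of the authors' macro.

```python
def screening_metrics(true_pos, total_pos, true_neg, total_neg):
    """Summarize macro-vs-manual agreement from per-class well counts."""
    total = total_pos + total_neg
    return {
        "accuracy": (true_pos + true_neg) / total,
        "sensitivity": true_pos / total_pos,
        "specificity": true_neg / total_neg,
        "false_negatives": total_pos - true_pos,
        "false_positives": total_neg - true_neg,
    }


# Counts from the response: 83/84 positive wells and 108/108 negative wells.
metrics = screening_metrics(true_pos=83, total_pos=84, true_neg=108, total_neg=108)
print(f"accuracy:    {metrics['accuracy']:.1%}")
print(f"sensitivity: {metrics['sensitivity']:.1%}")
print(f"specificity: {metrics['specificity']:.1%}")
```

      With these counts, 191 of 192 wells agree with manual annotation, and the single disagreement is a missed positive well.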

      (3) The CDH labeling was not readily detected by FACS, but was visible by microscopy. Is the labeling potentially disturbed by the procedure (low extracellular calcium + trypsin?) to prepare the cell for FACS?

      It is not clear why the CDH labelling was not detected by FACS. As the reviewer suggests, there could be several reasons: E-cadherin could be broken down by the dissociation reagent (Accutase), or recycled into the cell following the loss of adhesion and the low extracellular calcium in PBS. However, the C-terminal intracellular tail of E-cadherin was tagged, which should not be affected by Accutase. Moreover, recycling into the cell should still result in a detectable fluorescent signal. Notably, the flow cytometry experiments were done as quickly as possible after dissociation to minimize the time during which E-cadherin could be degraded or recycled. We also resuspended the cells in MTeSR Plus media instead of PBS, and compared cells grown on iMatrix511 to those grown on Matrigel in case differences in the extracellular matrix affected E-cadherin expression. Another possibility is that the microscopy setup used to detect E-cadherin (a sweptfield livescan confocal microscope with a high NA objective, a 100mW 488nm laser and a high-sensitivity EMCCD camera) permitted better detection than the detector on the BD FACSMelody used for FACS.

      (4) The authors write that the "Tubulin was cytosolic during interphase" which is surprising (and see also figure 3H), as I was expecting it to be incorporated in microtubules. May this be an issue of insufficient resolution (if I'm right this was imaged with 20x, NA=0.35 and so the resolution could be improved by imaging at higher NA)?

      Indeed, as the reviewer points out, our terminology (cytosol vs. microtubule) reflects the low resolution of the imaging for the cell populations in dishes and the fact that individual alpha-tubulin monomers are labelled with the mNG2(11) tag, so the signal comes from cytoplasmic monomers as well as polymers in microtubules. However, even in this image (Fig. 2C), the mitotic spindle microtubules are visible as they are so robust compared to the interphase microtubules. Notably, when we imaged cells from the cloned tagged cell line using a microscope designed for live imaging with a higher NA objective (see above), endogenously tagged TUBA1B was even more clearly visible in spindle microtubules, and was weakly observed in some microtubules in interphase cells, although they are slightly out of focus (Fig. 3H). If we had focused on a lower focal plane where the interphase cells are located and altered the optical settings, we would see more microtubules.

      (5) It would be nice to have access to the timelapse data as supplemental movies (e.g., from the experiments shown in Figure 4).

      We have added the movies corresponding to the timelapse images as supplementary movies (Movies S1-6), with the raw and restored movies shown side-by-side.

      (6) In Figure 3B, the order of the colors in the bar is reversed relative to the order of the legend. Would it be possible to use the same order? That makes it easier for me (as a colorblind person) to match the colors in the figure with that of the legend.

      We have modified the legend in Fig 2B and 3B to be in the same order as the bars.

    1. Author Response

      We are deeply grateful for the highly professional analysis of our work by the Journal Editor and Reviewers. Here is our provisional response to some of the reviewer comments. In our response, we would like to address two comments that were common to all Reviewers' responses. We will thoroughly address all of the Reviewers' comments in the final version of the paper.

      Incomplete analysis of maturational changes of striato-nigral connections.

      In the initial study, we showed that chronic inhibition of striosomal neurons with the DREADD approach during early postnatal development leads to decreased functional innervation of dopaminergic cells by striosomes in adulthood. We showed this using two approaches: (1) analysis of miniature inhibitory post-synaptic currents (mIPSCs) and (2) analysis of GFP and gephyrin puncta densities around dopaminergic cells. The results from these experiments strongly suggest a decrease in inhibitory drive to dopaminergic neurons of the substantia nigra pars compacta, yet we agree that only GFP puncta density can be considered direct evidence for weakened striatonigral connections. Reviewers indicated that additional direct measurements of striatonigral synaptic efficacy would be needed to strengthen our conclusions. We completely agree with this statement and will evaluate the possibility of doing the suggested experiments, using optogenetic stimulation of striosomal inputs to dopaminergic neurons.

      Inconsistent description of Ca2+ imaging experiments.

      Unfortunately, there was a general misunderstanding in interpreting the description of the Ca2+ imaging methods. In all our experiments, baseline Ca2+ oscillations and oscillations in the presence of a drug were recorded in the usual ACSF (containing 3 mM KCl) in the chamber of the patch-clamp setup, so conditions were exactly the same as for cell-attached and whole-cell recordings. At the end of each experiment, ACSF containing 8 mM KCl was applied. This high-KCl condition was used to count the total number of viable cells reacting to elevated potassium, and this number was taken as 100%. Therefore, the percentages displayed in the paper represent the actively oscillating cells in standard ACSF (3 mM KCl), expressed as a percentage of the total number of cells that responded to the subsequent high-potassium stimulation (8 mM KCl). The formula was: (number of active cells in 3 mM KCl / number of viable cells active in 8 mM KCl) × 100.
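
      For concreteness, the normalization described above can be written as a short function. The function name and the example counts are hypothetical; only the formula itself comes from the response.

```python
def percent_active(n_active_3mm_kcl, n_viable_8mm_kcl):
    """Percent of viable cells (responders to 8 mM KCl) that oscillate in 3 mM KCl."""
    if n_viable_8mm_kcl <= 0:
        raise ValueError("at least one cell must respond to 8 mM KCl")
    return 100.0 * n_active_3mm_kcl / n_viable_8mm_kcl


# Hypothetical field of view: 12 cells oscillate at baseline, 48 respond to high KCl.
print(percent_active(12, 48))
```

      Normalizing to the high-KCl responders rather than to all visible cells keeps non-viable cells out of the denominator.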

    1. Author Response

      We appreciate your constructive feedback on our manuscript entitled “Deletion of sulfate transporter SUL1 extends yeast replicative lifespan via reduced PKA signaling instead of decreased sulfate uptake” (ID: eLife-RP-RA-2023-94609). Your comments/suggestions are very helpful for improving our manuscript. In particular, we feel additional experiments and analysis suggested by the reviewers will help strengthen our argument that Sul1 deletion mutant extends lifespan via decreased PKA signaling, instead of via decreased sulfate uptake. Below we outline our response to the reviewer's comments/suggestions and the plans for additional experiments and analysis.

      (1) Our current model is that lifespan extension following SUL1 knockout depends on the PKA signaling pathway but not on sulfate transport. To further substantiate this, we plan to conduct further transcriptome sequencing and dynamic sulfate uptake experiments using WT, Sul1D and Sul1E427Q strains. If our model is correct, we expect that the PKA signaling pathway will be more repressed in the Sul1D strain than in the Sul1E427Q strain, but that sulfate transport will be similar in both strains. This will add strong evidence supporting the model in addition to the lifespan data.

      (2) The reviewer mentioned the disparities observed between the lifespan of WT in Figure 1B and other experimental assays. Although it is known that lifespan for WT varies considerably from experiment to experiment (thus the need for WT control for every lifespan measurement), we agree it is important to make a solid conclusion that Sul1E427Q does not extend lifespan. We plan to measure the lifespan of more cells for the mutant strains illustrated in Figure 1B and update the data and charts.

      (3) Other issues, for example, the small images of Msn2/4 in the nucleus, grammar and formatting errors, and the lifespan data of double (Sul1/Msn4) mutants, will be addressed in the revised version of the manuscript after we have performed the additional experiments/analysis.

    1. Author Response

      Reviewer #1 (Public Review):

      (1) It is unclear whether the authors took into consideration the contribution of nuclear blebs for nuclear volume measurements. This would be particularly relevant in situations of very strong confinement. Blebs were previously shown to affect volume (Mistriotis et al., JCB 2019). One could argue that the decreased nuclear volume was due to the increased blebbing observed in very strong confinements.

      As stated in the main text: “[Nuclear Blebs] had a limited contribution to the increase in nuclear projected area, as the increase remained significantly different even if protrusions were dismissed to compute the projected area (Fig S3C)”. In addition, a decrease in the nuclear volume was also observed for slight and intermediate confinement (height = 7 and 9 µm), while in these two conditions, no blebs are observed.

      (2) From their experimental setup, it is unclear whether the reduced nuclear volume observed after confined cell division arises from a geometrical constraint or is due to an intrinsic nuclear feature. One could argue that cells exiting mitosis under confinement have clustered chromosomes and, therefore, will have decreased volume. This would imply that the nucleus is not "reset" but rather that a geometrical constraint is forcing nuclei to be smaller. One way to test this would be to follow individual cells under confinement, let them enter mitosis, and then release the confinement. If, under these conditions, the daughter nuclei are smaller, then it supports their model. If daughter nuclei recover to their initial value, then it's simply due to a geometrical constraint that forces the clustering of chromosomes and the reassembly of the NE in a confined space.

      We agree with the reviewer. As stated in the discussion, “For now, the mechanisms involved remain elusive”, and “Our results call for an in-depth analysis of the molecular pathways at play”. The experiments suggested by the reviewer are definitely important experiments that we plan to carry out. Indeed, it is important to know if cells that were ‘born’ under confinement will retain smaller nuclei in the next generation if confinement is released, or whether the next generation will recover their initial larger nuclei.

      (3) The authors claim that the nucleus adapts to confinement based on evidence that the nucleus no longer shrinks in the second division following the first division. I would argue no further decrease is possible because the DNA is already compacted in the smallest possible volume. If indeed nuclei are in a new homeostatic state as the authors claim, then one would expect nuclei to remain smaller even after confinement is removed. This analysis is missing.

      As mentioned above, we agree that “deconfinement experiments” are indeed important. Nevertheless, we respectfully want to point out that the DNA is not compacted to its maximum level during confinement.

      First, we observed that the nuclei of the second generation of cells born in confinement no longer shrink for all investigated confinement conditions, including for slight confinement (height of 9 µm, corresponding to an initial nuclear deformation of 41%), where DNA is less confined compared to the very strong confinement condition (height of 3 µm, corresponding to an initial nuclear deformation of 70%).

      Second, the total incompressible volumetric fraction of a cell is smaller than 30% (Roffay et al. PMID: 34785592; Cell Biology by the Numbers ISBN: 9780815345374). This allows a nucleus to be compressed by over 70% of its size, as we observed in the extreme scenario.

      (4) Also, if the authors want to claim that this is a mechanism used for cancer cells to adapt to confined situations as the title says, they need to show that normal, near-diploid cells do not behave in the same way. This analysis is missing.

      We agree with the reviewer. For the revised version, we plan to analyze the cell response to confinement using the RPE-1 cell line as a model of a diploid and untransformed cell line. These will be important experiments to determine whether the nuclear mechanism identified in the HT-29 cell line is also at play in normal cells.

      (5) Authors state that "Loss of nuclear blebs is clearly linked to mitosis, suggesting that nuclear volume and nuclear envelope tension are tightly coupled, and supports the hypothesis that mitosis is a key regulator of nuclear envelope tension". I have a few issues with the way this sentence is written. Firstly, one could say that all nuclear structures (and not only blebs) are lost during mitosis because the nucleus disassembles. Hence, the new homeostatic state could be determined by envelope reassembly after mitosis and not mitosis itself. Thirdly, how can mitosis be a key regulator of nuclear envelope tension when the nucleus is disassembled during the process? These require clarification.

      We agree with the reviewer that the formulation used requires clarification, which will be made in the revised version: for now, we only have evidence that nuclear volume regulation takes place at mitosis. The most probable hypothesis is that confinement perturbs NE reassembly after mitosis, and that this perturbed reassembly leads to a change in nuclear volume. Complementary experiments are needed to test such a hypothesis, using cell lines stably expressing LAP2/LAP2b-GFP for instance. These are, however, delicate experiments that will require a dedicated study of their own.

      Secondly, I don't understand why the loss of nuclear blebs suggests that volume and tension are tightly coupled.

      Nuclear blebs appear once nuclei have reached a critical NE tension (Srivastava et al., PMID: 33662810). The fact that cells “born” under confinement have no nuclear blebs means that their nuclei are no longer under tension. This is a direct consequence of the decrease in nuclear volume, implying a coupling between volume and tension.

      (6) The authors claim that, unlike previous studies (Lomakin et al), this work shows a "gradual nuclear adaptation". From their results, this is difficult to conclude simply because they do not analyse cPLA2 levels. This is solely based on indirect evidence obtained from cPLA2 inhibition. A gradual adaptation would mean that based on the level of confinement we would expect to have increasingly higher levels of cPLA2 (and therefore nuclear tension).

      We thank the reviewer for his/her comment. Indeed, we have no direct evidence of gradual cPLA2 recruitment in our study, as we did not analyze cPLA2 levels.

      However, of note, in our study, nuclear volume and tension adaptation occur in the entire range of confinement height (from 3 to 9 µm), with a decrease in nuclear volume inversely correlated with the imposed initial nuclear deformation (fig S2C). On the contrary, in Lomakin et al., for HeLa cells, a threshold of 5 µm confinement is needed to trigger a cell motility response mediated by cPLA2. Such a difference suggests that other parameters are used as a confinement readout by cells during the reassembly of the NE after mitosis.

      (7) The authors should refrain from saying that the mechanism behind DNA repair is coupled to the nuclear adaptation they show. There are several points regarding this statement. Firstly, increased DNA damage could be due to nuclear ruptures imposed by confinement at 2h. In fact, the authors show leakage of NLS from the nucleus after confinement (Figure S3A). Secondly, the decrease in DNA damage at 24h could be because these nuclei did not rupture. How can they ensure that cells with low DNA damage at 24h had increased DNA damage at 2h? Finally, one needs to confirm if the nuclei they are analysing at 24h did undergo a round of cell division previously. From the evidence provided, the authors cannot conclude that DNA damage regulation is occurring in confined cells. Moreover, cell cycle arrest is a known effect of DNA damage. Cells with high damage at 2h most likely are arrested or will present with increased mitotic errors (which the authors exclude from their analyses).

      We need to clarify our analysis workflow: it was only in live experiments that we excluded cells with abnormal cell division, as cell division was visible in the timelapse. For immunostaining analysis on fixed samples, all non-apoptotic cells were taken into account in the analysis. The decrease in DNA damage observed at 24h thus applies to all cells under confinement. There is a clear difference between 2h and 24h in the γH2AX immunostaining (which is used as a proxy for DNA damage): whereas at 2h almost all cells have several foci (10-15 foci per cell on average, Fig. 3H), the number of foci in the entire cell population decreases to 1-2 foci per cell at 24h. The population at 24h mainly includes cells that have undergone a round of cell division, with >80% of normal cells, as quantified in Fig. 3E. In the revised version, we will include, as a supplementary figure, a quantification of the percentage of cells having more than 5 foci at 2h and 24h, as well as large fields of view of the γH2AX immunostaining to illustrate the distribution.
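
      The planned quantification, the percentage of cells with more than 5 foci at each time point, could be computed along the following lines. This is a hedged sketch only: the function name and the per-cell foci counts are hypothetical illustrations, not measurements from the study.

```python
def percent_above_threshold(foci_per_cell, threshold=5):
    """Percentage of cells with strictly more than `threshold` foci."""
    if not foci_per_cell:
        raise ValueError("empty cell population")
    n_above = sum(1 for n in foci_per_cell if n > threshold)
    return 100 * n_above / len(foci_per_cell)


# Hypothetical foci counts per cell, for illustration only.
foci_2h = [12, 15, 10, 3, 14, 11, 13, 9]
foci_24h = [1, 2, 0, 1, 6, 2, 1, 3]

pct_2h = percent_above_threshold(foci_2h)
pct_24h = percent_above_threshold(foci_24h)
print(pct_2h, pct_24h)
```

      Reporting the fraction of cells above a fixed threshold complements the mean number of foci per cell, because it shows whether a small subpopulation of heavily damaged cells persists at 24h.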

      Reviewer #2 (Public Review)

      One major limitation is that all experiments are performed in a single cell line, HT-29 human colorectal cancer cells, which has an unusual nuclear envelope composition as it has no lamin B2, low lamin B1 levels, and contains a p53 mutation. Because lamins B1 and B2 play important functions in protecting the nuclear envelope from blebs and confinement-induced rupture, and p53 is crucial in the cellular DNA damage response, it remains unclear whether other cell lines exhibit similar adaptation behavior.

      We agree that including other cell lines would help generalize our findings. It would be interesting in the future to analyze if a similar regulation exists for other cell types. In particular, as stated in the discussion, it would be very interesting to investigate whether this nuclear adaptation is universal, or if it is a consequence of a dysregulation in a specific cancer pathway. Our current manuscript is relevant as it uncovers the existence of this highly interesting phenomenon.

      Investigating whether other cell types have the same capacity to adapt would provide insights into the molecular mechanisms involved. In the revised version, we specifically plan to analyze the nuclear response under prolonged confinement in two types of cells: (1) normal cells with near-diploid characteristics (RPE-1 cell line, as a model of a diploid and untransformed cell line); (2) other colorectal cancer cell lines presenting higher levels of lamin B2 and B1, and no p53 mutation (HCT-116).

      Furthermore, although the time-lapse experiments suggest that reduction in nuclear volume occurs primarily during mitosis, the authors do not address whether prolonged confinement, even in the absence of apoptosis, could also result in cells adjusting their nuclear volume, or alternatively normalizing nuclear envelope tension by recruiting additional membrane from the endoplasmic reticulum, which is continuous with the nuclear membranes.

      Even if we cannot completely rule out the hypothesis raised by the reviewer, we respectfully want to stress that if additional membrane from the endoplasmic reticulum were recruited, we should observe an increase in nuclear volume at S/G2, which is the case only for the strongest imposed confinement (h = 3 µm, corresponding to an initial nuclear deformation of 70%; Figure S2E). It would, however, be very interesting in the future to directly assess nuclear envelope tension and to follow, with high-resolution live experiments, the possible recruitment of additional membrane.

      Regarding the proposed role of cPLA2, previous studies have shown that cPLA2 recruitment to the nuclear membrane, which is essential to mediate its nuclear mechanotransduction function, requires both an increase in nuclear membrane tension and intracellular calcium. However, the current study does not include any data showing the recruitment of cPLA2 to the nuclear membrane upon confinement, or the disappearance of nuclear membrane-associated cPLA2 during prolonged confinement, leaving unclear the precise function and dynamics of cPLA2 in the process.

      We agree with the reviewer that it would be very informative to analyze the recruitment of cPLA2 in live experiments. We plan to do this in future experiments using cPLA2 immunostaining at different time points or the cPLA2-mKate construct. This will be the subject of a dedicated study, together with possible changes in nuclear pores size and organization, as well as nuclear tension analysis. For this article, we plan to add the analysis of the effect of cPLA2 inhibition in live experiments.

      Lastly, it remains unclear (1) whether the reduction in nuclear volume is caused by a reduction in nuclear water content, by chromatin compaction, e.g. associated with an increase in heterochromatin, or through other mechanisms, (2) whether the change in nuclear volume is reversible, and if so, how quickly,

      We thank the reviewer for his/her comment. This point was also mentioned by Reviewer #1. It is important to know if cells that were ‘born’ under confinement will retain smaller nuclei in the next generation if confinement is released, or whether the next generation will recover their initial larger nuclei. We plan to perform such “deconfinement” experiments and add the results in the revised version. In addition, we also plan to investigate in more detail the DNA compaction state during confinement.

      and (3) what functional consequences the substantial reduction in nuclear volume has on nuclear function, as one would expect that this reduction would be associated with a substantial increase in nuclear crowding, affecting numerous nuclear processes.

      We agree with the reviewer that such a reduction in nuclear volume would most probably affect numerous nuclear processes that would be highly interesting to decipher in the future. Especially, as pointed out in the discussion, “the regulation of nuclear size identified in this study could have important consequences on resistance to classical chemotherapeutic treatments that target proliferation”. This question merits an entire study and is outside the scope of our current manuscript.

      Reviewer #3 (Public Review)

      (1) One essential consideration that goes unaddressed is whether the nuclear volume alone is changing under compression (resulting in a higher nuclear to cytoplasmic ratio) or if the cell volume is changing and the nuclear volume is following suit (no change in the N:C ratio). Depending on which of these is the case, the overall model would likely shift. In particular, interpreting the effect of disrupting myosin II activity given its different distribution at the cortex in response to the higher confinement would be influenced by which of these conditions are at play.

      We agree with the reviewer. As stated in the discussion, “the nuclear to cytoplasmic volume ratio, which is constant within a given population, is most likely to be impacted by confinement and changes in nuclear envelope tension (24, 45, 46), and might be at play in the regulation we describe herein”.

      As mentioned in the results section, “the distance between the cell membrane and the nuclear envelope was significantly reduced with confinement (Fig. 1D, Fig. S1B) and accompanied by the relocalization of the contractility machinery (Phosphorylated Myosin Light Chain (p-MLC) staining) from above the nucleus to the side, indicating a cortex rearrangement (Fig. S1C)”. For the revised version, we plan to investigate if such relocalization is accompanied by a change in the nuclear to cytoplasmic ratio using the p-MLC and nuclei immunostaining performed at 2h and 24h under the entire range of confinement investigated.

      (2) A key approach used and interpreted by the investigators is an assessment of the folding of the "inner lamin envelope", which they derive from an image analysis routine of lamin staining that they developed and argue reflects "nuclear envelope tension". I am not convinced of the robustness of this approach or what it mechanistically reveals. It may or may not reflect the contour of the inner nuclear membrane, which (perhaps) is the most relevant to the authors' interpretation of nuclear envelope tension. Given the major contribution of this data to the model, which is based on the "unfolding" of the nuclear envelope, an orthogonal approach (e.g. electron microscopy - which one needs to truly address the high-frequency undulations of the nuclear envelope) is needed to support the larger conclusions.

      We agree with the reviewer that the precise measurement of NE surface area is challenging because of the NE folds, and that our approach provides semi-quantitative information. Higher-resolution approaches, such as 3D super-resolution, would be necessary to investigate that point in more detail. However, we want to point out that even with our limited resolution, the differences observed in lamin A/C staining are striking (Fig. 3A): while lamin folds are completely absent at 2h under strong confinement, inner lamin folds are massively observed at 24h, showing a pattern very similar to the control condition. In the revised version, we will add more representative images to show that our analysis is representative of our observations.

      (3) The authors argue that nuclear tension is lost after mitosis in the confined devices because nuclear volume has decreased. While a smaller nuclear volume might indeed translate to less compressive force from the device on the nucleus, one would imagine that the chromosomes still have to be accommodated and that confining them in a smaller volume could increase the tension. Although arguable, the potential alternative possibilities suggest that actual measurements of nuclear envelope tension are needed to robustly test the model. The authors cite the observation that blebs are less prevalent after mitosis as additional support for this model, but this is expected as nuclear envelope breakdown and reformation will "reset" the nuclear contour while the appearance of blebs at mitotic entry is essentially a "memory" of all blebs and ruptures over the entire preceding cell cycle.

      We agree with the reviewer that assessing the nuclear envelope tension would enable a better description of the underlying process. It will be the subject of a dedicated study, together with possible changes in nuclear pore size and organization, as well as the analysis of cPLA2 recruitment.

      The proposed model in the current study is for the moment simply a geometrical model. Given the simplicity of the model, the fit with our experimental points is striking.

      (4) Representative images for the pharmacological perturbations other than blebbistatin are notably absent - only the analyzed data are presented in the manuscript or the supplemental material. How these perturbations (e.g. to cPLA2) also affect the cortex is important to interpret the data given the point raised above. Orthogonal approaches would also strengthen the conclusions (for example, the statement that "nuclear adaptation observed during mitosis requires nuclear tension sensing through cPLA2" requires more evidence to be convincing - it is not sufficiently supported by the data presented). Even if this is the case, the authors acknowledge that cPLA2 is likely not the answer to the adaption observed under the lower degrees of confinement. Thus, the mechanisms underlying the adaptive changes to nuclear volume remain enigmatic.

      We thank the reviewer for this insightful comment, and we plan to add representative images for the pharmacological perturbation in the revised version of the manuscript.

      (5) One more consideration that seems to go without comment is that the cells under confinement do not appear to successfully complete cytokinesis (Fig. 5b). At a minimum this seems like a major perturbation to cell physiology and needs to be more fully discussed by the authors as playing a role in the observed changes in nuclear volume.

      We agree that in the image chosen for Fig. 5b, cytokinesis does not seem to be complete. This is not representative of the entire cell population as 80% of the cell population showed a normal phenotype under very strong confinement with no drug (Fig. 5C and 3E, as well as fig S3D for a representative large field of view). Live experiments using the FUCCI cell lines also show that cells are capable of making several complete divisions under confinement (Fig. 2). Complementary experiments under pharmacological treatments and confinement are planned to extend our analysis of such processes.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This study presents a valuable finding on the possible use of vilazodone in the management of thrombocytopenia through regulating 5-HT1A receptor signaling. The evidence supporting the claims of the authors is solid, with the combined use of computational methods and biochemical assays. The work will be of broad interest to scientists working in the field of thrombocytopenia.

      Public Review:

      Reviewer #1 (Public Review):

      Summary:

      This is well-performed research with solid results and thorough controls. The authors did a good job of finding the relationship between the 5-HT1A receptor and megakaryocytopoiesis, which demonstrated the potential of vilazodone in the management of thrombocytopenia. The paper emphasizes the regulatory mechanism of 5-HT1A receptor signaling on hematopoietic lineages, which could further advance the field of thrombocytopenia for therapeutic purposes.

      Strengths:

      This is comprehensive and detailed research using multiple methods and model systems to determine the pharmacological effects and molecular mechanisms of vilazodone. The authors conducted in vitro experiments using HEL and Meg-01 cells and in vivo experiments using Zebrafish and Kunming-irradiated mice. The experiments and bioinformatics analysis have been performed with a high degree of technical proficiency. The authors demonstrated how vilazodone binds to 5-HTR1A and regulates the SRC/MAPK pathway, which is inhibited by particular 5-HTR1A inhibitors. The authors determined this to be the mechanistic underpinning for the effects of vilazodone in promoting megakaryocyte differentiation and thrombopoiesis.

      Weaknesses:

      (1) Which database are the drug test sets and training sets for the creation of drug screening models obtained from? What criteria are used to grade the results?

      Response: Thank you for your thoughtful comment. The database was built by our laboratory. First, to build the database, we collected 39 small-molecule compounds that can promote MK differentiation or platelet formation and 691 small-molecule compounds that have no obvious effect on MK differentiation or platelet formation. Then, the Molecular Descriptors of 2 active and 15 inactive small-molecule compounds were randomly picked as the Validation set, and the data of the remaining 713 small-molecule compounds were used as the Training set. With regard to the activity evaluation criteria, the prediction score for each molecule was between 0 and 1, and the model decision was made with a threshold of 0.5. A molecule with a score above the 0.5 threshold was identified as a megakaryopoiesis inducer (1).

      Reference:

      (1) Mo Q, Zhang T, Wu J, et al. Identification of thrombopoiesis inducer based on a hybrid deep neural network model. Thromb Res. 2023;226:36-50. doi:10.1016/j.thromres.2023.04.011
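
      The decision rule and the data split described in the response above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function and variable names are hypothetical, and only the counts and the 0.5 threshold come from the response.

```python
THRESHOLD = 0.5  # decision threshold stated in the response


def is_predicted_inducer(score: float, threshold: float = THRESHOLD) -> bool:
    """Return True when a prediction score lies above the threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("prediction scores are expected to lie in [0, 1]")
    return score > threshold


# Compound counts quoted in the response.
n_active, n_inactive = 39, 691
n_validation = 2 + 15      # 2 active and 15 inactive compounds
n_training = 713

# Consistency check: both ways of counting give the same database size.
n_total = n_active + n_inactive
assert n_total == n_training + n_validation

print(n_total, is_predicted_inducer(0.73), is_predicted_inducer(0.5))
```

      A score exactly equal to 0.5 is treated here as not passing, following the wording "above the 0.5 threshold"; that tie-breaking choice is an assumption.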

      (2) What is the base of each group in Figure 3b for the survival screening of zebrafish? The positivity rate of GFP-labeled platelets is too low, as indicated by the quantity of eGFP+ cells. What gating technique was used in Figure 3e?

      Response: We are deeply grateful for the insightful feedback you have provided regarding Figure 3 and the assessment of the zebrafish model. We used 50 zebrafish embryos per group to evaluate VLZ toxicity, and we think this is a suitable and fair baseline. Our gating procedure is depicted in the resulting diagram. Since our goal was to evaluate the fluorescence intensity quantitatively, we isolated the entire zebrafish cell. The amount of eGFP+ reported for various zebrafish tissues in other literature is likewise quite low, and we are unsure of the typical eGFP+ threshold for zebrafish (1, 2). We therefore think this finding is fair, given that all groups in the experiment were processed in parallel.

      Reference:

      (1) Yang L, Wu L, Meng P, et al. Generation of a thrombopoietin-deficient thrombocytopenia model in zebrafish. J Thromb Haemost. 2022; 20(8): 1900-1909. doi:10.1111/jth.15772

      (2) Fallatah W, De Silva IW, Verbeck GF, Jagadeeswaran P. Generation of transgenic zebrafish with 2 populations of RFP- and GFP-labeled thrombocytes: analysis of their lipids. Blood Adv. 2019;3(9):1406-1415. doi:10.1182/bloodadvances.2018023960

      (3) In Figure 4C, the MPV values of each group of mice did not show significant downregulation or upregulation. The possible reasons for this should be explained.

      Response: Thank you for your thoughtful comment. Megakaryocytes build pseudopodia, which form extensions that release proplatelets into the bone marrow sinusoids. Proplatelets convert into barbell-shaped proplatelets to form platelets in an integrin αIIbβ3-mediated process (1-2). Platelet size is established by microtubule and actin-myosin-spectrin cortical forces, which determine platelet size during the vascular formation of barbell proplatelets (3). Conversion is regulated by the diameter and thickness of the peripheral microtubule coil. Proplatelets can also be formed from proplatelets in the circulation (4). Megakaryocyte ploidy correlates with platelet volume, following a direct nonlinear relationship to mean platelet volumes (5). Usually there is an equilibrium between platelet generation and clearance from the circulation (normal turnover), controlled by thrombopoietin. When healthy humans receive thrombopoietin, their platelet size decreases (6). Proplatelet formation is dynamic and influenced by platelet turnover (7), which increases upon increased platelet consumption and/or sequestration. In our study, the MPV values of each group of mice did not show significant downregulation or upregulation. From our point of view, there are several possible reasons for these results.

      (1) Radiation damage in mice may decrease the platelet count but at the same time stimulate the bone marrow to release young, larger platelets, thus keeping the MPV relatively stable.

      (2) After radiation injury, bone marrow cells were suppressed, resulting in a decrease in the number of platelets produced, but MPV remained unchanged, possibly because the direct effects of radiation on the bone marrow caused thrombocytopenia but did not necessarily alter the average platelet size.

      Reference:

      (1) Thon JN, Italiano JE. Platelet formation. Semin Hematol. 2010(3):220-226. doi: 10.1053/j.seminhematol.2010.03.005.

      (2) Larson MK, Watson SP. Regulation of proplatelet formation and platelet release by integrin alpha IIb beta3. Blood. 2006(5):1509-1514. doi: 10.1182/blood-2005-11-011957.

      (3) Thon JN, Macleod H, Begonja AJ, et al., Microtubule and cortical forces determine platelet size during vascular platelet production. Nat. Commun. 2012(3):852. doi: 10.1038/ncomms1838.

      (4) Machlus KR, Thon JN, Italiano JE Jr. Interpreting the developmental dance of the megakaryocyte: a review of the cellular and molecular processes mediating platelet formation. Br. J. Haematol. 2014(2):227-36. doi: 10.1111/bjh.12758.

      (5) Bessman JD. The relation of megakaryocyte ploidy to platelet volume. Am. J. Hematol. 1984(2):161-170. doi: 10.1002/ajh.2830160208.

      (6) Harker LA, Roskos LK, Marzec UM, et al., Effects of megakaryocyte growth and development factor on platelet production, platelet life span, and platelet function in healthy human volunteers. Blood. 2000(8):2514-2522. doi: 10.1182/blood.V95.8.2514.

      (7) Kowata S, Isogai S, Murai K, et al., Platelet demand modulates the type of intravascular protrusion of megakaryocytes in bone marrow. Thromb. Haemost. 2014(4):743-756. doi: 10.1160/TH14-02-0123.

      (4) The PPI diagram and the KEGG diagram in Figure 6 both provide a possible mechanism pathway for the anti-thrombocytopenia effect of vilazodone. How can the authors analyze the differences in their results?

      Response: We appreciate your valuable comments. PPI (Protein-Protein Interaction) refers to the interaction between proteins. Inside cells, proteins interact with each other to perform various biological functions, influencing cell signaling, metabolic pathways, the cell cycle, and more. KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database that integrates information on genomes, chemicals, and biological systems. In pharmacoinformatics, KEGG pathways are often used to understand the molecular mechanisms of specific diseases or biological processes. KEGG contains the interrelationships between genes, proteins, and metabolites, helping to reveal key nodes in biological processes. PPI information can be integrated with data from KEGG pathways, such as metabolic and signaling pathways, to gain a more comprehensive understanding of the role of protein-protein interactions in cellular processes and biological functions. For example, by analyzing nodes in the PPI network, proteins associated with a specific disease can be identified, and further examination of these proteins' locations in KEGG pathways can reveal molecular mechanisms underlying the onset and development of the disease. However, this method also has some limitations:

      Uncertainty (1): The construction of protein-protein interaction networks and drug interaction networks involves many assumptions and speculations. The edges of these networks may be based on experimental data but can also rely on bioinformatics predictions. Therefore, the accuracy of predictions is limited by the quality and reliability of the data used during network construction.

      Insufficient data (2): Despite the availability of a large amount of bioinformatics data for network construction, interactions between some proteins and drugs may still lack sufficient experimental data. This data insufficiency can result in inaccuracies in network predictions.

      Dynamics and temporal-spatial changes (3): The dynamics and temporal-spatial changes in biological systems are crucial for drug effects. Pharmacoinformatics may struggle to capture these changes as it often relies on static network representations, overlooking the temporal and dynamic nature of biological systems.

      References:

      (1) Fernando PC, Mabee PM, Zeng E. Integration of anatomy ontology data with protein-protein interaction networks improves the candidate gene prediction accuracy for anatomical entities. BMC Bioinformatics. 2020(1):442. doi: 10.1186/s12859-020-03773-2.

      (2) Zhang S, Zhao H, Ng MK. Functional module analysis for gene coexpression networks with network integration. IEEE/ACM Trans. Comput. Biol. Bioinform. 2015(5):1146-1160. doi: 10.1109/TCBB.2015.2396073.

      (3) Cinaglia P, Cannataro M. A method based on temporal embedding for the pairwise alignment of dynamic networks. Entropy (Basel). 2023(4):665. doi: 10.3390/e25040665.

      (5) 5-HTR1A protein expression is measured only in the Meg-01 cell assay. Similar quantitation through western blot is not shown in other cell models.

      Response: Your insightful criticism and recommendation to use different cell models in order to obtain a more accurate depiction of 5-HTR1A protein expression are greatly appreciated. We completely concur that using this strategy would greatly increase the validity of our research. However, establishing a primary megakaryocyte model requires specialized expertise and technical resources, which unfortunately are not readily available to us within the given timeframe. Nevertheless, we acknowledge the limitations of Meg-01 cells, which may exhibit distinct properties compared to true megakaryocytes. To mitigate this concern, we have ensured robust experimental design and rigorous data analysis to interpret our findings within the context of these model cell lines. We believe our results still provide valuable insights into megakaryocyte differentiation and address an important biological question.

      Reviewer #2 (Public Review):

      Summary:

      The authors tried to understand the mechanism of how a drug candidate, VLZ, works on a receptor, 5-HTR1A, by activating the SRC/MAPK pathway to promote the formation of platelets.

      Strengths:

      The authors used both computational and experimental methods. This definitely saves time and funds to find a useful drug candidate and its therapeutic marker in the subfield of platelets reduction in cancer patients. The authors achieved the aim of explaining the mechanism of VLZ in improving thrombocytopenia by using two cell lines and two animal models.

      Weaknesses:

      Only two cell lines, HEL and Meg-01 cells, were evaluated in this study. However, using more cell lines is really depending on the workflow and the grant situations of the current research team.

      Response: We deeply appreciate your insightful feedback and valuable suggestions regarding the use of more suitable models for studying the role of VLZ in megakaryocyte differentiation and platelet production. We fully agree that CD34+ hematopoietic stem/progenitor cells or primary megakaryocytes would provide a more accurate representation of in vitro megakaryopoiesis compared to HEL and Meg-01 cells, which possess limited potential for this process. We acknowledge that our current study did not include experiments with these preferred cell models. This is because our laboratory is still actively developing the technical expertise and resources required for establishing and maintaining primary megakaryocyte and CD34+ cell cultures. Despite the limitations of the current study, we believe the results using HEL and Meg-01 cells provide valuable preliminary insights into the potential effects of VLZ on megakaryocyte differentiation. We are actively working to overcome these limitations and plan to incorporate these more advanced models in our future investigations.

      Reviewer #1 (Recommendations For The Authors):

      I think the authors can enhance the mechanism study by developing more reliable models and methodologies. The connection to clinical research should be strengthened at the same time.

      Response: We deeply appreciate your insightful feedback and valuable suggestions regarding the use of more suitable models for studying the role of VLZ in megakaryocyte differentiation and platelet production. Despite the limitations, we are committed to expanding our research in the future by incorporating your suggestion and establishing a primary megakaryocyte model to further validate our findings and strengthen our conclusions. At the same time, we wholeheartedly concur with your suggestion to combine clinical research. Unfortunately, VLZ is not a first-line treatment for depression in China, and getting blood samples from the matching number of patients for analysis is a challenge. To give additional experimental support for the medication, we have attempted to improve the data in vivo as much as feasible, including by implementing the intervention in normal mice. Our findings should also contribute to the theoretical underpinnings of this medication and aid in its practical application.

      Reviewer #2 (Recommendations For The Authors):

      Issues the authors need to address:

      Figure 7: Why is the band intensity of GAPDH in b or e much greater than that in f, g, or h?

      Response: Thank you for your careful observation and insightful comment regarding Figure 7. Because the concentration of each batch of protein samples is different, the GAPDH band intensity is sometimes increased by a large loading volume. Other factors that may influence the GAPDH band intensity include the instrument's contrast adjustment during exposure and the use of different numbers of wells for electrophoresis. The original results of all three replicates for every western blot will be provided in the supplementary materials.

      Finally, we sincerely thank you for providing us with this opportunity to make a further revision and modification of our manuscript, and your valuable and scientific comments are useful for the great improvement of our manuscript!

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to reviewers

      We wish to thank the reviewers for the time taken to appraise the manuscript and the helpful feedback to improve it. We have taken onboard the suggested feedback and incorporated it into the revision. The findings of the revised manuscript are unchanged. Below is a point-by-point response to specific comments.

      Public reviews

      Reviewer 1

      Thank you to reviewer 1 for the thorough and insightful review of our manuscript. We are pleased that the strengths of our research, particularly the use of whole-genome bisulfite sequencing, the combination of animal and human data, and the investigation of a potential dietary intervention were recognized. We are confident that these aspects contribute significantly to the value and originality of our work.

      We acknowledge the concerns regarding the statistical rigor of the study, particularly the sample size and data analysis methods. We would like to address these points in more detail:

      Sample size: While we agree that a larger sample size would be ideal, the chosen sample size (n=4 per group) is consistent with other murine whole-genome bisulfite sequencing experiments in the field. We have carefully considered the cost-benefit trade-off in selecting this approach. In the revision we discuss the potential limitations of this sample size.

      Data analysis: We acknowledge the inconsistencies in the study reporting and have committed to improving the clarity in the revision. We carefully reviewed the concerns regarding the use of causal language and the interpretation of differences in our results. In some cases, the use of causal language is justified by the intervention study design. We also believe other explanations like stochastic variation affecting the same genomic regions in different tissues, are exceedingly unlikely from a statistical viewpoint. In the revision we have adopted a balanced approach to the language.

      Confounders: We acknowledge the importance of accounting for potential confounders such as birthweight, alcohol exposure and sex. The pups selected for genome analysis were matched for sex and on litter size as a proxy for in utero alcohol exposure. This careful selection of mice for genome analysis was intentionally guided to mitigate potential confounding.

      Statistical rigour: We acknowledge the importance of multiple testing correction in the genome-wide analysis. We used the DSS method of Feng et al (PMID: 2456180), which employs a two-step procedure for assessing the significance of a region. Instead of a single p-value for the whole DMR, DSS uses the area statistic to rank candidate regions and control the false discovery rate through shrinkage estimation methods. This approach reduces the risk of reporting false positives due to multiple testing across numerous CpG sites. It is similar in some respects to employing local FDR correction at the 0.05 level, with an additional minimum effect size threshold applied, and is particularly suited to experiments where the number of replicates is low. In the revision we have committed to improving the clarity of the reporting of statistical methods.

      Reviewer 2

      Thank you to reviewer 2 for the comprehensive and valuable feedback on our manuscript. We take your concerns about the generalizability of our findings and the interpretation of certain results seriously. We would like to address your specific criticisms in detail:

      Generalizability and Human Data: We agree that the generalizability of mouse models to human conditions has limitations. However, our study focused on understanding the early molecular alterations caused by moderate PAE, which can be more effectively modelled in a controlled environment like mice. To clarify this, we have strengthened the manuscript by emphasizing the focus on moderate PAE in the title and throughout the paper.

      Transcriptome Analysis: We recognize the importance of investigating the functional consequences of PAE-induced DMRs and agree that transcriptome analysis would be highly valuable. We are currently planning to conduct future transcriptomic studies to understand the link between DMRs and gene expression.

      Species-Specificity and DMR Enrichment: We acknowledge the likelihood of species-specific PAE effects. Our finding of enrichment of DMRs in non-coding regions was consistent with observations from the Lussier study of FASD. We agree there is further work to do and now highlight this in the discussion.

      Tissue Sample Locations: Due to technical restrictions of processing newborn mouse tissue, we are unable to enhance the manuscript with specific tissue regions sampled.

      Interpretation of Shared Genomic Regions: We appreciate your point about the alternative explanation for the shared genomic regions between brain and liver. Our interpretation is that regions identified only in the alcohol group and affected equally in both tissues are likely established stochastically (as a result of the exposure) in the early embryo and then maintained in the germ layers. We have revised the text to suggest this is the most likely explanation, and we acknowledge that a more detailed examination in more tissues would be warranted for proof.

      Additional Feedback

      Reviewer 1

      Introduction

      • Line 65 - alcohol consumption is not always preventable and these statements further increase the stigma associated with FASD. A better way to say this would be "a leading cause of neurodevelopmental impairments".

      We have implemented this suggestion in the revised manuscript.

      • The studies cited in lines 87-89 are somewhat outdated, as several more recent studies with better sample sizes have been published in recent years. I would recommend citing more recent publications in addition to these studies. Similarly, the authors should also cite Portales-Casamar et al., 2016 (Epigenetics & Chromatin) for the validation in humans, as it was the original study for those data.

      We have added a citation for the Portales-Casamar et al. (2016) study in the revised manuscript.

      • Lines 95-95 - the authors should elaborate further on the "encouraging results" from choline supplementation studies, as these details may help interpret the findings from their own study.

      In the revised manuscript, we replaced “encouraging results” with “results suggesting a high methyl donor diet (HMD) could at least partially mitigate the adverse effects of PAE on various behavioural outcomes”.

      • Minor point: DNA methylation is preferable to "methylation" alone when not referring to specific CpGs or sites, as methylation can also refer to protein or RNA methylation.

      “Methylation” has been replaced with “DNA methylation” in the revised manuscript.

      Results

      • Line 118 - HMD should be defined here.

      HMD is now defined in the revised manuscript.

      • The figures in the main manuscript and supplemental materials are not in the same order as they are presented in the text.

      We apologise for this and thank the reviewer for their attention to detail. In the revision we have corrected the order of figures to match the text.

      • It is concerning that the H20-HMD group had lower baseline weights, which could impact the findings from these analyses. Please discuss how these differences were accounted for in the study design and analyses.

      We appreciate the reviewer's concern about the lower baseline weight in the H20-HMD group. We agree that this difference could potentially affect our findings. However, we want to emphasize that total weight gain during pregnancy was statistically similar across all groups by linear mixed effect model. Additionally, all dams were within the healthy weight range for their strain. While we cannot completely rule out any potential influence of baseline weight, we believe the similarity in weight gain and the healthy range of all dams suggest that the in-utero experience of pups regarding weight-related factors was likely comparable across groups.

      • I have some concerns regarding the cutoffs used to identify the DMRs, particularly given the small N and number of tests. The authors should report the number of DMRs that meet a multiple testing threshold; if none, they should use a more stringent threshold than p<0.05, as one would expect 950,000 CpGs to meet that threshold by chance (19,000,000 CpGs x 0.05). The authors should also report the number of DMRs tested, as this will be a more appropriate benchmark for their analyses than the number of CpGs (they should also report the specific number here).

      We appreciate the reviewer's concerns regarding the DMR cut-offs. We agree that clarifying the methods and justifying our choices is crucial. Our implementation of the DSS method for defining DMRs employs a local FDR p<0.05 cut-off, with an additional delta beta threshold of 5%. We have clarified this in the methods section of the revised manuscript. We want to emphasize that the local FDR approach mitigates the concern of chance findings by adjusting for multiple comparisons across the genome. Lines 414-420 in the revised methods contain the following amended text:

      “Differentially methylated regions (DMRs) were identified within each tissue using a Bayesian hierarchical model comparing average DNA methylation ratios in each CpG site between PAE and non-PAE mice using the Wald test with smoothing, implemented in the R package DSS (46). False-discovery rate control was achieved through shrinkage estimation methods. We declared DMRs as those with a local FDR P-value < 0.05 based on the p-values of each individual CpG site in the DMR, and minimum mean effect size (delta) of 5%”
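
      The reviewer's arithmetic, and the general effect of a false-discovery-rate correction, can be sketched in a few lines of Python. This is an illustration only: it uses the Benjamini-Hochberg step-up procedure as a stand-in, not the area-statistic and shrinkage procedure implemented in DSS, and the p-values in the example are made up. The function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of tests passing an uncorrected threshold under the null."""
    return n_tests * alpha


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where the test is rejected at FDR level alpha.

    Generic Benjamini-Hochberg step-up procedure (an assumption for
    illustration; it is NOT the procedure used by DSS).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose p-value is at most (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# The reviewer's figure: about 19,000,000 CpGs tested at p < 0.05.
print(round(expected_false_positives(19_000_000, 0.05)))

# Made-up p-values: four pass p < 0.05 uncorrected, fewer survive correction.
toy_pvalues = [0.001, 0.008, 0.039, 0.041, 0.6]
print(sum(p < 0.05 for p in toy_pvalues), sum(benjamini_hochberg(toy_pvalues)))
```

      The point of the sketch is only that an uncorrected per-site threshold and an FDR-controlled threshold can give very different counts; how DSS itself aggregates sites into regions is described in the package documentation.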

      • I also have concerns about the delta cutoff for their DMRs. First, it is not clear if this cutoff is set for a single CpG or across the DMR (even then, it is not clear if this is a mean, median, max, min, etc.) Second, since the authors analyzed CpGs with 10X coverage, they can only reliably detect a delta of 0.1 (1/10 reads).

      Thank you for raising this important point. In the revision we have clarified that the effect size cutoff reflects the mean effect across CpGs within the DMR, as follows (line 418):

      “We declared DMRs as those with a local FDR P-value < 0.05 based on the p-values of each individual CpG site in the DMR, and minimum mean effect size (delta) of 5%”

      We chose the mean as it provides a comprehensive representation of the overall methylation change within the region, while ensuring all individual CpGs used in the analysis had at least 10x coverage. It is not true that we can only detect a delta of 1/10 reads: the mean effect is the relative difference in means between groups and is not dependent on the underlying sequencing depth.
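
      The arithmetic behind this reply can be made concrete with a small sketch. At 10x coverage a single CpG in a single sample can only take methylation ratios in steps of 0.1, but a mean taken over several CpGs and several animals can take much finer values. The read counts below are hypothetical (4 animals per group, 3 CpGs per region, 10 reads per CpG), and the sketch illustrates granularity only; it says nothing about sampling noise or statistical power at low coverage.

```python
from fractions import Fraction


def region_mean(samples):
    """Mean methylation ratio over all CpGs and all samples in a group.

    `samples` is a list of samples; each sample is a list of
    (methylated_reads, total_reads) tuples, one per CpG in the region.
    """
    ratios = [Fraction(meth, total) for sample in samples for meth, total in sample]
    return sum(ratios) / len(ratios)


# Hypothetical control group: every CpG is 5/10 methylated.
control = [[(5, 10), (5, 10), (5, 10)] for _ in range(4)]

# Hypothetical exposed group: each CpG is 4/10 or 5/10 methylated.
exposed = [
    [(5, 10), (4, 10), (5, 10)],
    [(4, 10), (5, 10), (4, 10)],
    [(5, 10), (5, 10), (4, 10)],
    [(4, 10), (4, 10), (5, 10)],
]

delta = region_mean(control) - region_mean(exposed)
print(float(delta))  # mean difference across the region
```

      With these made-up counts the regional delta is 1/20, i.e. 5%, even though no single CpG can resolve differences smaller than 10%.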

      • Prenatal alcohol exposure is known to impact cell type proportions in the brain, which could lead to differences in DNAm patterns. The authors should address this possibility in the discussion, as well as examine their list of DMRs to determine if they are associated with specific brain cell types. The possibility of cell type differences in the liver should also be discussed.

      We agree with the reviewer that PAE-induced alterations in cell type proportions can influence DNA methylation patterns. While isolating specific cell types in our current study's brain and liver samples was not achievable due to tissue limitations, we acknowledge this as a limitation and recognize the need for further investigations incorporating single-cell or cell type-specific approaches in the discussion.

      • It is interesting, but maybe not surprising, that more DMRs were identified in the liver compared to the brain. This finding would warrant some additional interpretation in the discussion.

      We appreciate and agree that this finding indeed warrants further interpretation. We have added the following sentence into the discussion section of the revised manuscript that provides some potential factors behind this observation.

      Line 263: “Indeed, most of the observed effects were tissue-specific, with more perturbations to the epigenome observable in liver tissue, which may reflect the liver’s specific role in metabolic detoxification of alcohol. Alternatively, cell type composition differences between brain and liver might explain differential sensitivity to alcohol’s effects”.

      • Lines 148-149 - I disagree about the enrichment of decreased DNAm in brain DMRs, as 52.6% is essentially random chance. The authors should also include a statistical test here, such as a chi-squared test, to support this statement.

      We agree that a revised interpretation is warranted. The updated manuscript has been amended as follows: “Lower DNA methylation with early moderate PAE in NC mice was more frequently observed in liver DMRs (93.5% of liver DMRs), while brain DMRs were almost equally divided between lower and higher DNA methylation with early moderate PAE (52.6% of brain DMRs had lower DNA methylation with early moderate PAE).”
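
      The kind of test the reviewer asks for can be sketched as follows. The example uses an exact two-sided binomial test against a 50/50 split rather than a chi-squared test, and the DMR counts are hypothetical because the total number of brain DMRs is not given in this response; only the 52.6% proportion comes from the text.

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k successes in n trials under p = 0.5.

    Computed as twice the smaller tail probability, capped at 1.
    """
    denom = 2 ** n
    lower = sum(comb(n, i) for i in range(0, k + 1)) / denom  # P(X <= k)
    upper = sum(comb(n, i) for i in range(k, n + 1)) / denom  # P(X >= k)
    return min(1.0, 2 * min(lower, upper))


# Hypothetical counts chosen only to match the reported 52.6% proportion.
n_brain_dmrs = 1000   # assumption, not from the manuscript
n_decreased = 526     # 52.6% of the assumed total
print(binomial_two_sided_p(n_decreased, n_brain_dmrs))
```

      The resulting p-value depends strongly on the true number of DMRs, so the sketch should be rerun with the actual counts before drawing any conclusion.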

      • Similarly, I would recommend the authors use increased/decreased DNAm, rather than hypermethylated/hypomethylation, as the latter terms are better suited to DNAm values near 100% or 0%.

      The use of hyper/hypo methylation is still considered common and well understood even for moderate changes. We agree the use of increased/decreased is more inclusive for a broader audience, so we have amended all references accordingly in the main text.

      • Lines 153-155 - please report the statistics to support these enrichment results. A permutation test would be well suited to this analysis.

      The reporting of statistics related to the enrichment test has now been amended to read “Overlap permutation tests showed liver DMRs were enriched in inter-CpG regions and non-coding intergenic regions (p < 0.05), while being depleted in all CpG regions and genic regions except 1to5kb, 3UTR and 5UTR regions, where there was no significant difference (Figure 2f).”
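
      For readers unfamiliar with overlap permutation tests, the idea can be sketched on a toy single-chromosome genome. This is a minimal illustration under stated assumptions, not the authors' pipeline: the coordinates, the uniform random placement of regions, and all function names are hypothetical, and real analyses typically restrict shuffling to mappable or covered sequence.

```python
import random


def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b = (start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]


def count_overlapping(regions, annotations):
    """Number of regions that overlap at least one annotation interval."""
    return sum(any(overlaps(r, a) for a in annotations) for r in regions)


def permutation_pvalue(regions, annotations, genome_length, n_perm=999, seed=0):
    """One-sided empirical p-value for enrichment of regions in annotations.

    Each permutation places every region at a uniformly random position
    (keeping its length) and recounts the overlaps.
    """
    rng = random.Random(seed)
    observed = count_overlapping(regions, annotations)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = []
        for start, end in regions:
            length = end - start
            new_start = rng.randrange(0, genome_length - length + 1)
            shuffled.append((new_start, new_start + length))
        if count_overlapping(shuffled, annotations) >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Toy data: 20 short regions that all fall inside one small annotated block.
toy_regions = [(i * 40, i * 40 + 30) for i in range(20)]
toy_annotations = [(0, 1000)]
print(permutation_pvalue(toy_regions, toy_annotations, genome_length=1_000_000))
```

      Depletion can be tested the same way by counting permutations with overlap at or below the observed value.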

      • Line 156 - "overwhelming enrichment" is a very strong statement considering the numbers themselves.

      We omitted “overwhelming” in the revised manuscript, which now states: “Using open chromatin assay and histone modification datasets from the ENCODE project, we found enrichment (p < 0.05) of DMRs in open chromatin regions (ATAC-seq), enhancer regions (H3K4me1), and active gene promoter regions (H3K27ac), in mouse fetal forebrain tissue and fetal liver (Table 2).”

      • Lines 165-167 - Please describe the analyses and metrics used to determine if the DNAm differences were mitigated in the HMD groups. As it stands, it is not clear if they are simply not significant, or if the delta was decreased. In terms of a figure, a scatter plot of the deltas for these DMRs would be better suited to visualizing these changes.

      To determine whether DMRs were mitigated, we applied the same statistical testing procedure to the subset of PAE DMRs in the group of mice exposed to the HM diet. The sample size is the same, and the multiple-testing burden is reduced as we did not test the entire genome. We believe our interpretation stands, although we have urged caution in the discussion as follows (line 319):

      “Another key finding from this study was that HMD mitigated some of the effects of PAE on DNA methylation. Although a plausible alternative explanation is that some of the PAE regions were not reproduced in the set of mice given the folate diet, our data are consistent with preclinical studies of choline supplementation in rodent models (34, 35, 36). Moreover, a subset of PAE regions were statistically replicated in subjects with FASD, suggestive of robust associations. Although our findings should be interpreted with caution, they collectively support the notion that alcohol induced perturbation of epigenetic regulation may occur, at least in part, through disruption of the one-carbon metabolism.”

      • Given the lenient threshold to identify DMRs, it is possible that PAE-associated DMRs are simply false positives and do not "replicate" in a different subset of animals. One way to check this would be to determine whether there are any differences between mitigated/unmitigated DMRs and the strength of their initial associations. Should the mitigated DMRs skew towards higher p-values and lower deltas, one might consider that these findings could be false positives.

      We appreciate the reviewer's concern about potential false positives due to the chosen DMR identification threshold. We reiterate the DMR calling thresholds were adjusted for local FDR; however, we acknowledge the need for further validation. We haven't observed this trend of mitigated DMRs having higher p-values and lower deltas, but we have replicated some PAE DMRs in independent human datasets and found support for their biological plausibility in the context of PAE.

      • Related to the HMD analyses, I am concerned that the EtOH-HMD group consumed less alcohol, which could manifest in the PAE-induced DMRs disappearing, unrelated to the HMD exposure. The authors should comment on whether the pups were matched for ethanol exposure and include sensitivity analyses that include ethanol level as a covariate to confirm that their results are not simply due to decreased alcohol exposure.

      We appreciate the reviewer's concern regarding the lower alcohol consumption by dams in the EtOH-HMD group and its potential impact on DMRs. We agree that consistent in utero exposure is crucial for reliable results. Even though the average alcohol consumption was lower for the EtOH-HMD group, we matched pups across treatment groups on litter size as a proxy for in utero alcohol exposure, excluding pups with significantly different exposure levels. We agree that more robust methods, including direct measurement of blood alcohol content, would improve the study. We have now incorporated this into the discussion of the revised manuscript on line 351: “Additionally, we employed an ad-libitum alcohol exposure model rather than direct dosing of dams. Although the trajectories of alcohol consumption were not statistically different between groups, this introduces more variability into alcohol exposure patterns, and might impact offspring methylation data.”

      • Lines 172 - please be more specific about the neurocognitive domains tested.

      In the revision we have included more detail about the neurocognitive domains tested (originally mentioned in the results) in the methods as follows:

      “These tests included the open field test (locomotor activity, anxiety) (38), object recognition test (locomotor activity, spatial recognition) (39), object in place test (locomotor activity, spatial recognition) (40), elevated plus maze test (locomotor activity, anxiety) (41), and two trials of the rotarod test (motor coordination, balance) (42)”

      • Line 191 - please report the tissue type used in the human study, as well as the method used to estimate cell type proportions.

      We stated in the results section that buccal swabs were used in both human cohorts.

      We added to the revised manuscript that cell type proportions were estimated using the EpiDISH R package.

      • Related to validation, it is unclear whether the human-identified DMRs were also validated in mice, or if the authors are showing their own DMRs. Please also discuss why DMRs might not have been replicated in AQUA.

      We used human data sets to validate observations from our murine model, focusing on regions identified in our early moderate PAE model. This is now explicitly stated on line 209 of the revision:

      “We undertook validation studies by examining PAE sensitive regions identified in our murine model using existing DNA methylation data from human cohorts to address the generalizability of our findings.”

      In the section entitled ‘Candidate Gene Analysis..’ we used our murine data sets to reproduce previously published associations that included regions identified in both animal and human studies. We posit that the lack of replication of our early moderate PAE regions in AQUA is explained in part by species-specific differences. In addition, considering the striking differences in effect size seen in regions that did replicate in FASD subjects, the exposure may need to be of sufficient magnitude and duration for the effects seen in brain and liver to survive reprogramming in the blood. The AQUA cohort is largely enriched for low to moderate patterns of alcohol consumption.

      • Line 197 - please provide a citation for the ethanol-sensitive regions. There are also several existing DNAm analyses in brain tissues from animal models that should be included as part of these analyses, as several have shown brain-region and sex-specific DMRs related to prenatal alcohol exposure. These contrasts might help the authors further delineate the effects of prenatal alcohol in their model and expand on current literature to explain the deficits caused by alcohol exposure.

      Our candidate gene/region selection was informed by a systematic review of previously published human and animal studies reporting associations between in utero exposure to PAE and offspring DNA methylation. We synthesized evidence across several models, tissues and methylation platforms to arrive at a core set of reproducible associations. Line 481 of the methods now includes a citation to our systematic review which details our selection criteria.

      Discussion

      • Line 211 - This is a strong statement for one hypothesis. It is also possible that different cell types have similar responses to prenatal alcohol exposure. In this scenario, perturbations need not arise before germ layer separation. The authors should soften this causal statement.

      We appreciate this point although given the genome size relative to the size of the DMRs we have detected, the chance that different cell types would respond similarly in exactly the same regions seems exceedingly rare. We posit a more likely explanation is early perturbations in the embryo are established stochastically as a result of the exposure (supported by the interventional design) and maintained in the differentiating tissues. We agree further work is needed to prove this, specifically in a wider set of tissues from multiple germ layers so we have amended the discussion as follows:

      “These perturbations may have been established stochastically because of alcohol exposure in the early embryo and maintained in the differentiating tissue. Further analysis in different germ layer tissues is required to formally establish this.”

      • Lines 222-224 - I completely agree with this statement. However, the authors had the opportunity to examine dosage effects in their model as they measured alcohol-levels from the dams. At the very least, I would recommend sensitivity analyses in their DMRs to assess whether alcohol level/dosage influences their results.

      Although this is a great suggestion to improve the manuscript, we did not have the opportunity to examine dosage effects by design, as we selected mice for genome analysis with matched exposure patterns. It would be fascinating to conduct a sensitivity analysis.

      Methods:

      • Please include the lysis protocol.

      Thank you for picking up this error in our reporting. We have now included the following details in the methods which improve the reproducibility of this study: “Ten milligrams of tissue were collected from each liver and brain and lysed in Chemagic RNA Tissue10 Kit special H96 extraction buffer”.

      • Please include the total reads for each sample and details of the QC pipeline, including filtering flags, quality metrics, and genome build.

      Thank you for suggesting improvements to our reporting which improve the reproducibility of this study. We have included a new supplementary table of sequencing statistics and details of the quality metrics. Please note the genome build is explicitly stated in the methods already.

      • Please make your code publicly available to ensure that these analyses can be replicated.

      Thank you for this suggestion. A data availability statement has now been included in the revision, and code will be made available upon request.

      • Why were Y chromosome reads included in the dataset?

      Y chromosomal reads were not included in the DMR analysis. Amended “We filtered the X chromosomal reads” to “We filtered the sex chromosomal reads” in revised manuscript.

      • Please provide the number of total CpGs available for analysis.

      Added sentence into results section of revised manuscript: “A total of 21,842,961 CpG sites were initially available for analysis.” We also clarified that the ~19,000,000 CpGs were analysed following coverage filtering.

      • Please provide the parameters for the DMR analysis and report how the p-values and deltas were calculated.

      We have addressed this in our responses to previous comments.

      • The supplemental materials for the human data are missing.

      Thank you for picking up this oversight. The revision now includes an additional data supplement which details the analysis of the human data sets for interested readers.

      Tables and figures

      • Table 1. It is not clear how the DMRs for this table were selected. The exact p-values and FDR should also be reported in this table. The number of CpGs in these DMRs should also be reported.

      Table 1 includes select DMRs that were consistently detected in both brain and liver tissue. These are particularly of interest as they represent regions highly sensitive to alcohol exposure. We agree that exact reporting of p-values would be ideal. Instead of a single p-value for the whole DMR, DSS uses the area statistic to rank candidate regions and control the false discovery rate (FDR) through shrinkage estimation methods. In the revision we have now included region size and number of CpGs in table 1.

      • Table 3. Please include p-values for the DMR analyses.

      As above we report the area-statistic which is an equivalent measure to assess evidence for differential methylation.

      • Figure 2 (Figure 4 in revised manuscript). Please report the N for these analyses. It also seems that the pairwise t-tests were only compared to the H2O-NC, which does not provide much insight into the PAE group. The relevance of the sexP analysis to the present manuscript is also unclear.

      Figure 2 is now Figure 4 in the revision and the sample size has been included in the figure legend. We compared all groups to the control group (H2O-NC) as we aimed to determine any differences in intervention groups from the control.

      We apologise for the lack of clarity around the ‘sex P’ terminology. This refers to the p-value for the main effect of sex on the behavioural outcome. We agree it lacks relevance since the regression models were adjusted for sex. In the revision we have updated the methods as follows (line 426) and removed references to sex P:

      “To examine the effect of alcohol exposure on behavioural outcomes we used linear regression with alcohol group (binary) as the main predictor adjusted for diet and sex.”

      • Figure 3ef (Figure 2ef in revised manuscript). It is unclear how the random regions were generated. A permutation test would be relevant to determine whether there are any actual enrichment differences.

      As stated in methods section: “DMRs were then tested for enrichment within specific genic and CpG regions of the mouse genome, compared to a randomly generated set of regions in the mouse genome generated with resampleRegions in regioneR, with equivalent means and standard deviations.”
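
      The comparison against randomly resampled regions can be read as an empirical enrichment test. The following Python sketch only illustrates that idea and is not the regioneR implementation: the function name, the boolean "overlap" flags and the add-one correction are all assumptions made for this example.

```python
import random

def resampling_enrichment(observed_flags, universe_flags, n_resamples=2000, seed=0):
    """Empirical one-sided p-value for enrichment of a feature among observed regions.

    observed_flags: one bool per observed region (e.g. a DMR overlaps a CpG island).
    universe_flags: one bool per region in the pool that random sets are drawn from.
    Both inputs are hypothetical stand-ins for real overlap annotations.
    """
    rng = random.Random(seed)
    k = len(observed_flags)
    observed = sum(observed_flags)
    at_least = 0
    for _ in range(n_resamples):
        # Draw a random set of the same size and count its overlaps.
        draw = rng.sample(universe_flags, k)
        if sum(draw) >= observed:
            at_least += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return (at_least + 1) / (n_resamples + 1)
```

      A small p-value here would indicate that the observed regions overlap the feature more often than equally sized random draws from the pool do.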

      • Figure 5. Please include the gene names for these DMRs, as well as their genomic locations. It would also be relevant to annotate these plots with the max, min, and mean delta between groups.

      Thank you, we considered this however the DMRs are not in genes so we cannot apply a gene label. The locations are reported on the x-axis and the statistics are shown in Table 3.

      • Figure S1b and S2c- It is quite worrisome that the PAE-HMD group drank less throughout pregnancy than their PAE counterparts. Please discuss how this was addressed in the analyses.

      We appreciate the reviewer's concern regarding the lower alcohol consumption in the PAE-HMD group and its potential impact on DMRs. We agree that consistent in-utero exposure is crucial for reliable results. Although the total amount of liquid consumed over pregnancy was lower in this group, they started with a lower baseline and the trajectory was not statistically different compared to other groups.

      We have now incorporated this into the discussion section of the revised manuscript on line 336: “Additionally, we employed an ad-libitum alcohol exposure model rather than direct dosing of dams. Although the trajectories of alcohol consumption were not statistically different between groups, this introduces more variability into alcohol exposure patterns, and might impact offspring methylation data.”

      • Figure S1cd. See my comments about Figure 2.

      Suggested changes have been incorporated.

      • Figure S2d. it is not clear to what the statistics presented in this panel refer. Please clarify and discuss the implications of dietary intake differences on your findings.

      Added sentence to caption in revised manuscript: “Statistical analysis involved linear mixed-effects regression comparing trajectories of treatment groups to H2O-NC baseline control group.”

      • Figure S3. See my comments about Figure 2.

      Suggested changes have been incorporated.

      • Figure S4. I am confused by the color legend, as it seems both colors are PAE. I also do not see how any regions show increased or decreased DNAm in PAE based on this plot (also no statistics are presented to support these conclusions).

      The plot is intended to show there are no gross changes in methylation when averaged across all CpGs within different regulatory genomic contexts. Statistics are not included as it is intuitive from the plot that the means are the same. We have updated the figure legend, which now reads:

      “Figure S4. No evidence for global disruption of methylation by PAE. The figure shows methylation levels averaged across CpGs in different regulatory genomic contexts. Neither brain tissue (A & B), nor liver tissue (C & D) were grossly affected by PAE exposure (blue bars). Bars represent means and standard deviation.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      We greatly appreciate the editor and reviewers’ careful and professional assessment of this manuscript. We are delighted with the reviewers’ instructive comments and suggestions. We have tried to address the raised points comprehensively. The reviewers’ scrutiny has helped us immensely to discuss and present our work extensively and properly. We are grateful for the reviewers’ efforts and insights. The detailed responses are listed here.

      Recommendations for the authors

      (1) The intuition behind the model is not properly explained, i.e., the derivation of Eqs. 1-2 and the biological meaning of the AA/OO logic modes. A different notation could be helpful.

      We thank the reviewers for this comment, and agree that the interpretation of our model in the manuscript was indeed in need of improvement. We have incorporated this suggestion into the manuscript. For clarity, we have substituted AND-AND/OR-OR for the original AA/OO notation, and hope that the new notation is helpful for interpreting our work.

      In general, considering the diverse audience including those with experimental background, we feel that it is essential to present this manuscript in a more digestible manner. We therefore retain the entire derivation of Eqs. 1-2 in the supplementary method. We have added a qualitative introduction to model derivation and molecular biological significance underlying different logic motifs (AND-AND/OR-OR) in the revised manuscript. Please refer to Page 5 of the revised manuscript, lines 161-167 (see below).

      “X and Y are TFs in the CIS network. n1 and n2 are the coefficients of molecular cooperation. k1-k3 in Eq1 and k4-k6 in Eq2 represent the relative probabilities for possible configurations of binding of TFs and CREs (Fig2.A). d1 and d2 are degradation rates of X and Y, respectively. Here, we considered a total of four CRE configurations as shown in Figure 2A (i.e., TFs bind to the corresponding CREs or not, 2^2 = 4). Accordingly, depending on the transcription rates (i.e., r0x, r1, r2, r3 in Eq1, similarly in Eq2) of each configuration, we can model the dynamics of TFs in the Shea-Ackers formalism[1, 2].

      Thus, the distinct logic operations (AND/OR) of two inputs (e.g., activation by X itself and inhibition by Y) can be further implemented by assigning the corresponding profile of transcription rates to the four configurations (Fig2.A). From the perspective of molecular biology, the regulatory logics embody the complicated nature of TF regulation, in that TFs function in a context-dependent manner. Considering the CIS network, when X and Y bind their respective CREs concurrently, whether the expression of the target gene is turned on or off depends on the regulatory logic (specifically, off in the AND logic and on in the OR logic; Fig2.A). Notably, instead of exploring the different logics of one certain gene[3, 4], we focus on different combinations of regulatory logics because dynamics in cell fate decisions are generally orchestrated by GRNs with multiple TFs.”
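
      To make the role of the four configurations concrete, the Python sketch below writes one possible Shea-Ackers-style production term as a weighted average of configuration-specific transcription rates. It is an assumed functional form, not a reproduction of Eqs. 1-2: the function names, the parameter values and all rate entries except the both-bound one (off for AND, on for OR, as stated in the text) are illustrative guesses.

```python
def production_rate(x, y, k, r, n1=2, n2=2):
    """Average transcription rate over four CRE configurations (assumed form).

    k = (k1, k2, k3): relative weights of the X-bound, Y-bound and both-bound states.
    r = (r0, r1, r2, r3): rates of the unbound, X-bound, Y-bound and both-bound states.
    """
    k1, k2, k3 = k
    wx, wy = x ** n1, y ** n2
    weights = (1.0, k1 * wx, k2 * wy, k3 * wx * wy)
    return sum(w * ri for w, ri in zip(weights, r)) / sum(weights)

# Illustrative rate profiles for gene X (self-activated by X, repressed by Y).
# Only the last entry is taken from the text; the others are assumptions.
AND_RATES = (0.0, 1.0, 0.0, 0.0)  # both bound -> expression off
OR_RATES = (0.0, 1.0, 0.0, 1.0)   # both bound -> expression on

def dxdt(x, y, k, r, d1=1.0):
    """Production minus first-order degradation of X (d1 is an assumed value)."""
    return production_rate(x, y, k, r) - d1 * x
```

      With both factors abundant, the both-bound configuration dominates the average, so the two profiles give opposite outputs even though every other parameter is shared.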

      (2) More clearly specify the used parameters and how these are chosen. This would be helpful to get a more quantitative grasp of the conditions that they compare.

      We appreciate the reviewers pointing out unspecified parts in the main text. We have now included related discussion in the revised manuscript. Please refer to Page 5 of the revised manuscript, lines 179-181 (“Benchmarking the Boolean models with different logic motifs (Fig2.B), we reproduced the geometry of the attractor basin in the continuous models resembling those represented by corresponding Boolean models (Fig2.C; see Methods).”).

      We would like to highlight that the Boolean models with different logic motifs (Fig. 2B) explicitly display the difference of state spaces (i.e., attractor basin). Moreover, as the focus of this work is on the role of regulatory logics in cell fate decisions, we consider it rational to specify the geometry of the landscape based on the hint from Boolean models. Therefore, we reason that it is intuitive and reliable to assign parameter values by qualitatively mapping our ODE models (Eqs. 1-2) to the corresponding Boolean models (refer to the statement in our original manuscript, Page 5, lines 162-163, “With appropriate parameters, we are able to reproduce the Boolean-like attractor basin in the continuous models”). In producing Figures 2-5, parameters were set heuristically, without a systematic search. However, to draw general conclusions, like the "trade-offs between progression and accuracy" and the presence of the fully-connected stage, we sampled a substantial number of parameter sets to ensure statistically robust findings.

      (3) Include the explanation of how the nullclines and basins shown in the figures (e.g., Fig. 2C, Fig. 4C, Fig. 4F, etc.) are calculated.

      We thank the reviewers for this suggestion. We have incorporated this into the legend of corresponding figures when first mentioned in the main text. Please refer to Page 7 of the revised manuscript, lines 217-223 (see below).

      “Fig2.C:

      (C) State spaces of the AND-AND (top panel) and OR-OR (bottom panel) motifs in ODE models. Dark and red lines represent the nullclines of the two TFs. Stable steady states (SSSs) are denoted as orange dots. Unstable steady states (USSs) are denoted as white dots. Each axis represents the concentration of one transcription factor, in arbitrary units. Blue, green and purple areas in state spaces indicate attractor basins representing LX, S and LY, respectively. The color of each point in state space was assigned according to the attractor it finally enters under the deterministic models (Eq1, Eq2). These annotations were used for the following Figures 3-7.”
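
      The basin coloring described in this legend amounts to integrating the deterministic model from each grid point and recording which attractor the trajectory reaches. A minimal Python sketch of that procedure follows; the vector field is a toy bistable stand-in with known attractors, not Eqs. 1-2, and the forward-Euler settings and all names are assumptions.

```python
import math

def toy_field(x, y):
    # Stand-in system with attractors at (-1, 0) and (1, 0); NOT the authors' model.
    return x - x ** 3, -y

def final_attractor(x0, y0, field, attractors, dt=0.01, steps=5000):
    """Integrate with forward Euler, then return the label of the nearest attractor."""
    x, y = x0, y0
    for _ in range(steps):
        dx, dy = field(x, y)
        x, y = x + dt * dx, y + dt * dy
    return min(attractors, key=lambda name: math.dist((x, y), attractors[name]))

# Hypothetical attractor labels for the toy system.
ATTRACTORS = {"left": (-1.0, 0.0), "right": (1.0, 0.0)}
```

      Calling `final_attractor` on every point of a grid and mapping each label to a color would give a basin map of the kind shown in the figure.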

      (4) Clarity on the decisions in the work is needed. For example, the "introduction" of asymmetry of the noise levels (as stated in line 215) appears completely arbitrary. The reason behind it can be guessed in the following paragraph, but the reader shouldn't have to guess.

      We agree entirely with the reviewers’ comment. Indeed, this should have been stated more explicitly. The motivation for incorporating asymmetry in the noise levels stems from our endeavor to mimic the inherent biological variability in gene expression within a cell population. We have adjusted the manuscript to better convey the motivation for investigating asymmetric noise level. Please refer to Page 8 of the revised manuscript, lines 237-238 (“In biological systems, it is unlikely that the noise level of different genes is kept perfectly the same.”).

      (5) Arbitrary and/or out-of-context jargon is used throughout the manuscript, making it hard to read and follow what the authors mean in some cases. For example, "temporal fully-connected stage" is used for the first time in line 290, and the term is not explained either in the main text or in the manuscript. Similarly, the reference to a Boolean-like and Boolean model (line 163 and Figure 1) without clarifying if this is just an analogy or if a formal model is built, nor the utility and implications of this comparison. Another problem related to jargon occurs on line 291, where the authors talk about "parameter sensibility", but such analysis (as it is normally understood in the field) is never performed; the authors perform a parameter exploration and make some general conclusions about the parameter space, but that is different than a parameter sensitivity analysis.

      We thank the reviewers for this comment, as it has prompted us to better clarify our manuscript. We have reviewed the manuscript and made the necessary adjustments to improve its clarity. We do hope that this revision meets the reviewers’ expectations on the clarity and comprehensiveness of our analysis.

      Regarding the term "temporal fully-connected stage", we realized that it was slightly vague and in need of improvement. Instead, we now employ “transitory fully-connected stage” in the revised manuscript to underline the short emergence of this particular stage. Please refer to Page 11 of the revised manuscript, line 323.

      We thank the reviewers for pointing out the lack of clarity concerning the Boolean models. We have now amended the manuscript to make this implicit expression explicit. Please refer to Page 5 of the revised manuscript, lines 179-181 (“Benchmarking the Boolean models with different logic motifs (Fig2.B; see Methods), we reproduced the geometry of the attractor basin in the continuous models resembling those represented by corresponding Boolean models (Fig2.C; see Methods).”). Specifically, we employed the Boolean models (Fig.2B) as the reference to assist us to heuristically evaluate the applicability of used parameters in the ODE models. Therefore, the Boolean models are built formally, and corresponding updated rules are listed in Fig.2A (refer to the middle row in the table called “Logic Function”, now also noted in the legend of Fig.2B, Page 7, lines 213-214). Nevertheless, we do utilize the analogy between the attractor basins from Boolean models and ODE models (refer to Fig.2B-C). Accordingly, we used the term “Boolean-like” to describe the landscape presented by the continuous models (Eqs. 1-2; refer to the statement in our original manuscript, Page 5, lines 162-163, “With appropriate parameters, we are able to reproduce the Boolean-like attractor basin in the continuous models”).

      We appreciate the reviewers for this valuable comment, and agree that the usage of “parameter sensibility” was in need of adjustment. We have now amended the manuscript. Please refer to Page 10 of the revised manuscript, lines 318-321 (see below).

      “To demonstrate generality, we globally screened 6,213 groups of parameter sets under the AND-AND motif, and this logic-dependent intermediate stage can be observed for 82.7% of them (see Methods; Table S1), indicating little dependence on a particular parameter setting (1.8% in the OR-OR motif).”

      (6) Probably related just to the language clarity (i.e., the abuse of jargon), but we don't understand the conclusion on lines 296-298.

      We thank the reviewers for this comment. We have adjusted the manuscript accordingly. Please refer to Page 11 of the revised manuscript, lines 323-327 (see below). And we hope that the reviewers agree with our attempt at mapping into the particular stage in cell fate decisions from the point of landscape.

      “Furthermore, this transitory fully-connected stage locates between the fate-undetermined stage (Fig4.C top panel) and fate-determined stage (Fig4.C 3rd panel), comparable to the initiation (or activation) stage before the lineage commitment in experimental observations [5-7]. Therefore, we suspected that the robust fully-connected stage in the AND-AND motif may correspond to a specific period in cell fate decisions.”

      (7) The so-called "solution landscape" in Figure 4E needs to be better explained.

      We thank the reviewers for this comment. We have introduced the concept of solution landscape, which is a pathway map consisting of all stationary points and their connections, in lines 196-198 of the revised manuscript (see below).

      “Furthermore, we introduced the solution landscape method. The solution landscape is a pathway map consisting of all stationary points and their connections, which can describe different cell states and the transition paths between them [82-84].”

      In Figure 4E, we added detailed explanation of the solution landscape for the AND-AND motif. Specifically, it describes a hierarchical structure including one 2-saddle (yellow triangle), three 1-saddles (crimson X-cross sign), and three attractors (green dot). The layer of 1-saddles is represented by a blue translucent plane, and the bottom layer is the flow field diagram. The connections from 2-saddle to 1-saddles and from 1-saddles to the attractors are represented by red and blue lines, respectively. The arrow and color of the heatmap correspond to the flow direction and the length of the acceleration at each point in the state space.
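
      The hierarchy of 2-saddles, 1-saddles and attractors can be illustrated by counting the unstable directions at each stationary point. The Python sketch below does this for a 2x2 Jacobian using its trace and determinant; treating the index as the number of eigenvalues with positive real part is an assumption about how the stationary points are classified, and the function and label names are hypothetical.

```python
import cmath

def saddle_index(jacobian):
    """Number of eigenvalues with positive real part for a 2x2 Jacobian [[a, b], [c, d]].

    Degenerate cases (an eigenvalue with zero real part) are not treated specially.
    """
    (a, b), (c, d) = jacobian
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    eigenvalues = ((trace + root) / 2, (trace - root) / 2)
    return sum(1 for ev in eigenvalues if ev.real > 0)

# Hypothetical labels matching the terms used in the response.
LABELS = {0: "attractor", 1: "1-saddle", 2: "2-saddle"}
```

      For example, `LABELS[saddle_index([[1, 0], [0, -1]])]` labels a point with one unstable direction as a 1-saddle.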

      (8) Table S1 is not properly annotated, and then it is impossible to interpret how it supports the observations in the paragraph in lines 342-342.

      We appreciate the reviewers’ useful feedback. We have refined the annotations of all tables in our manuscript (Table S1-3). Please refer to “Supplementary Table” in resubmitted files.

      Specifically, we randomly collected 6,231 sets of parameters for the AND-AND motif and 6,682 sets for the OR-OR motif (k1-k6 in Eq1 and Eq2; refer to Page 6 of the revised supplementary method, see below).

      “First, to collect parameter sets with 3 SSSs, we used Latin hypercube sampling (LHS) to screen k-series parameters symmetrically (i.e., k1 = k4, k2 = k5, k3 = k6) ranging from 0.001 to 5 both in the AND-AND and OR-OR motifs. We ultimately collected 6,231 sets for the AND-AND motif and 6,682 sets for the OR-OR motifs (Table S1).”
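
      The sampling step in this passage can be sketched in a few lines of Python. The example implements a basic Latin hypercube with the standard library and mirrors k1-k3 onto k4-k6; it does not include the subsequent filtering for sets with 3 SSSs, and the function names, seed and sample count are assumptions.

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Return n_samples points; each dimension is stratified into n_samples equal bins."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each bin, then shuffle the bin order per dimension.
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(unit)
        columns.append([low + u * (high - low) for u in unit])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

def symmetric_k_sets(n_samples, low=0.001, high=5.0, seed=0):
    """Sample (k1, k2, k3) and mirror them so that k4 = k1, k5 = k2, k6 = k3."""
    return [k + k for k in latin_hypercube(n_samples, [(low, high)] * 3, seed)]
```

      Each returned tuple is ordered (k1, k2, k3, k4, k5, k6); a steady-state count would then be applied to each candidate to keep only the tristable sets.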

      To analyze the sequence of vanishing SSSs, we further filtered for parameter sets in which 2 SSSs remained as ux increased (corresponding to Eq3 in the revised manuscript, Page 10, line 293). This yielded a collection of 6,207 sets for the AND-AND motif and 6,634 sets for the OR-OR motif. Based on these parameter settings, we checked whether the observations (refer to Page 13, lines 377-378, “The distinct sequences of attractor basin disappearance as ux increasing can be viewed as a trade-off between progression and accuracy.”) are artifacts of a particular parameter choice.

      (9) The flow in Section 5 needs to be reorganised. For instance, it is not clear which question the authors are addressing in line 395, or how the proposed approach answers the question stated in lines 381-382.

      We greatly thank the reviewers for pointing this out, and acknowledge that Section 5 was in need of improvement. We have now amended the manuscript to make this implicit understanding explicit. Please refer to Page 15 of the revised manuscript, lines 426-430 (see below).

      “In prior sections, we systematically investigated two logic motifs under the noise- and signal-driven modes in silico. With various combinations of logic motifs and driving forces, features about fate-decision behaviors were characterized by computational models. Next, we questioned whether observations in computation can be mapped into real biological systems. And how to discern different logic motifs and driving modes is a prerequisite for answering this question.

      To this end, we first evaluated the performance of different models, specifically in simulating the process of stem cells differentiating towards LX (Fig6.A).”

      (10) There are two important weak points for the successful classification of the regulatory logic of real gene expression data as presented in the manuscript: (1) the small number of time-points in the datasets and clear peaks in gene expression heterogeneity cannot be identified, and (2) it is not always clear whether cell differentiation really exclusively relies on a CIS network, and which genes constitute it. These limitations should be solved or at least discussed in the manuscript.

      We thank the reviewer for this comment. First, we agree entirely that analysis of datasets with more time points will be more amenable to identifying the trends of gene expression variation. We have made a concerted effort towards searching for such datasets, but unfortunately, there are not many such datasets publicly available. Specifically, to apply our computational framework, the datasets of our interest need to fulfill the following three characteristics: (i) sampling at multiple time points (as many as possible); (ii) to illustrate/validate our findings clearly and representatively, we would like the cell fate decisions in the biological systems to follow the classical binary tree-like pattern. i.e., there is one stem cell fate (or progenitor) and two downstream cell fates in the systems; (iii) the core GRN circuits for orchestrating the fate-decision processes have been experimentally confirmed (at least clearly supported). We have also extended the discussion to include above points to explicitly note the limitations regarding the used datasets. Please refer to Page 25 of the revised manuscript, lines 762-766 (see below).

      “The gene expression datasets analyzed here are only available for a limited number of time points. Though they meet the need for discerning trends, it is evident that the application to the datasets with more time points will yield clearer and less ambiguous changing trends to support the conclusions of this paper more generally.”

      Regarding the second point, we do acknowledge that the CIS network may not always be the core module for every fate-decision case (but to our knowledge, this can be assumed in many cases, especially in the binary tree-like pattern). For applicability and potential relevance to our intended readership, we developed the models and drew our conclusions primarily based on the CIS topology for its representativeness. We intend to incorporate diverse topologies (like mutual activation with self-activation, Feed-Forward Loop, etc.) in the computational framework presented here in the near future. Additionally, we have incorporated this point into the discussion in the revised manuscript. Please refer to Page 25 of the revised manuscript, lines 766-769 (see below).

      “Notwithstanding the fact that the CIS network is prevalent in fate-decision programs, there are other topologies of networks that serve important roles in the cell-state transitions, like feed-forward loop, etc. The framework presented in this work should further incorporate diverse network motifs in the future.”

      As noted by the reviewers, even given the CIS network, we may not be sure which genes constitute it in some cases. We agree that further extension of our framework to mining key regulators is an interesting question. We also note that we have become very enthusiastic about recent work that shows how to nominate core factors from high-throughput data[8, 9]. Of note, in the last section of our manuscript titled “The chemical-induced reprogramming of human erythroblasts (EBs) to induced megakaryocytes (iMKs) is the signal-driven fate decisions with an OR-OR-like motif”, we leveraged patterns of temporal expression variance to filter out key regulators (Fig7.F and H). We thus underline the potential of mining genes comprising core GRN circuits through expression variance. Nevertheless, as the focus of the present paper is on the role of regulatory logic in cell fate decisions, we feel it is beyond the scope of the present article to continue the development of our results on this point. Instead, we have included a discussion of the case in which the genes comprising the CIS network are not defined. Please refer to Page 23 of the revised manuscript, lines 685-687 (see below).

      “Notably, if the genes constituting the CIS network are not specified, we can conversely leverage the patterns of temporal expression variance to nominate key regulators in a model-guided manner.”

      (11) The models used in Figure S5 are never clearly described.

      We thank the reviewers for pointing this out. We have now introduced the settings of the models used in Figure S5 more clearly in the legend (see below).

      Two logic motifs with the noise-driven mode (FigS5.A, see below):

      Author response image 1.

      “Initial values were identical with attractor of S fate in Figure 2C (SSSs in green attractor basins). Simulation was performed 1000 times for each pseudo-time point, with each temporal state (from left to right) recorded as a dot on the plot. Top panel: Noise level of X (σx) is set to 0.21, and σy is 0.09. Bottom panel: Noise level of Y (σy) is set to 0.21, and σx is 0.09. Red arrow represents the direction of fate transitions of S to LX. Other than adding a white noise, parameters were identical with those in Figure 2C.”

      Two logic motifs with the signal-driven mode (FigS5.B, see below):

      Author response image 2.

      “Initial values were identical with attractor of S fate in Figure 2C (SSSs in green attractor basins). Top panel: Noise level of X (σx) and Y (σy) are both set to 0.06. Simulation was performed 1000 times, with each final state recorded as a dot on the plot. Parameter ux switched from 0 to 0.09 (0, 0.045, 0.09, from left to right). Bottom panel: Noise level of X (σx) and Y (σy) are both set to 0.05. Simulation was performed 1000 times, with each final state recorded as a dot on the plot. Parameter ux switched from 0 to 0.24 (0, 0.12, 0.24, from left to right). Red arrow represents the direction of fate transitions of S to LX. Other model’s parameters were identical with those in Figure 2C.”
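
      Repeated stochastic simulations of this kind are commonly run with an Euler-Maruyama scheme. The Python sketch below shows the idea under stated assumptions: additive white noise, clipping at zero, the step size, and a simple relaxation drift that stands in for Eqs. 1-2 are all choices made for this example, while the σ values and the 1000 repeats are taken from the legend.

```python
import math
import random

def euler_maruyama(drift, state, sigmas, dt=0.01, steps=1000, rng=None):
    """Integrate dS = drift(S) dt + sigma dW with independent noise per variable."""
    if rng is None:
        rng = random.Random(0)
    x, y = state
    sx, sy = sigmas
    for _ in range(steps):
        fx, fy = drift(x, y)
        x += fx * dt + sx * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        y += fy * dt + sy * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Concentrations cannot be negative; clipping at zero is an assumption.
        x, y = max(x, 0.0), max(y, 0.0)
    return x, y

def relax(x, y):
    # Stand-in drift pulling both variables toward 1; NOT the authors' Eqs. 1-2.
    return 1.0 - x, 1.0 - y

def simulate_population(n_cells=1000, sigmas=(0.21, 0.09), seed=0):
    """Run n_cells independent trajectories and return their final states."""
    rng = random.Random(seed)
    return [euler_maruyama(relax, (1.0, 1.0), sigmas, rng=rng) for _ in range(n_cells)]
```

      Each returned pair corresponds to one dot on a plot of final states; a larger σx than σy spreads the population more widely along the X axis.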

      (12) Up until Section 5, "noise levels" have been used to refer to an input/parameter in the model. Here it is assumed as an emergent property. Are the authors talking about the variance in expression (e.g., see line 398)? Is it defined as the coefficient of variation? Clarity is essential to interpret the observations in this section, e.g., "different driving modes change in the patterns of noise rather than expression levels" (lines 399-400).

      We greatly appreciate the reviewers pointing out this ambiguity. The term “noise level” was indeed used to refer to the strength of the noise in the models in Sections 1-4. To classify different logic motifs under the two driving forces, we needed a practical metric that can be quantified from data, and we found population-level gene expression variance (i.e., “noise level” in line 398), defined as the coefficient of variation, to be useful. For clarity, we decided to substitute “expression variance” for “noise level” in Sections 5-6. We have amended the manuscript accordingly, and hope this revision will be helpful for interpreting our results. Please refer to Page 15 of the revised manuscript.
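
      As a concrete reading of this metric, the Python sketch below computes the coefficient of variation of one gene across cells at each time point. The input layout (a mapping from time point to per-cell expression values) and the use of the population standard deviation are assumptions made for this example.

```python
import statistics

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined for zero mean")
    return statistics.pstdev(values) / mean

def cv_time_course(expression_by_time):
    """Map {time_point: [expression per cell]} to {time_point: CV}, in time order."""
    return {t: coefficient_of_variation(v) for t, v in sorted(expression_by_time.items())}
```

      The resulting time course is the quantity whose trend (monotonic or nonmonotonic) would be compared between simulations and data.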

      (13) "Pulse-like behaviour" is used in an arbitrary way, not as it is normally used in the field. Moreover, we consider this jargon expression does not contribute to the understanding of the paper. (The authors probably meant "discrete transitions" vs "gradual transitions".)

      We appreciate the reviewers’ valuable feedback regarding our use of the term “Pulse-like behavior”. We agree with the reviewers’ statement, and acknowledge that terminology of noise level’s patterns between different driving modes (noise-driven vs signal-driven; refer to Section 5 in our manuscript) was in need of improvement.

      After careful consideration, we decided to adopt the terms “monotonic transitions” and “nonmonotonic transitions” to describe the trends of noise level, underlining in a more contrastive way the distinct temporal noise patterns that the two driving forces produce in cell fate decisions. We anticipate that the revised terminology will be beneficial for interpreting our work. Please refer to Page 15 of the revised manuscript.

      (14) The temporal resolution of the scRNAseq datasets that the authors used is too low to unambiguously distinguish a discrete pattern of gene expression heterogeneity from a rising profile. This limitation needs to be at least acknowledged in the text. Alternatively, the authors might want to identify more recent datasets with higher time resolution.

      We appreciate the reviewers’ insightful suggestions. We agree that analysis of datasets with higher time resolution would identify the trends of gene expression variation less ambiguously. We have made a concerted effort towards searching for such datasets, but unfortunately, there are not many such datasets publicly available. Specifically, to apply our computational framework, the datasets of our interest need to fulfill the following three characteristics: (i) sampling at multiple time points (as many as possible); (ii) to illustrate/validate our findings clearly and representatively, we would like the cell fate decisions in the biological systems to follow the classical binary tree-like pattern, i.e., there is one stem cell fate (or progenitor) and two downstream cell fates in the systems; (iii) the core GRN circuits for orchestrating the fate-decision processes have been experimentally confirmed (at least clearly supported). Nevertheless, we recognize this limitation should be mentioned in the paper. So, we have also extended the discussion to include the above points. Please refer to Page 25 of the revised manuscript, lines 762-766 (see below).

      “The gene expression datasets analyzed here are only available for a limited number of time points. Though they meet the need for discerning trends, it is evident that the application to the datasets with more time points will yield clearer and less ambiguous changing trends to support the conclusions of this paper more generally.”

      (15) In the case of embryonic stem cell differentiation, an additional complication is that this protocol yields heterogeneous cell type mixtures, whereas the authors' simulations usually are designed to give differentiation towards a single cell type. This difference makes it questionable to compare measures of gene expression heterogeneity between simulations and the experimental system in order to infer regulatory logic.

      We thank the reviewers for this valuable comment and realize that we were not clear enough in the manuscript regarding the case of embryogenesis. In the biological system devised by Semrau et al[10], mouse embryonic stem cells (mESCs) differentiate into two lineages simultaneously, just as mentioned by the reviewers. We noticed this additional complication and performed other simulations in two logic motifs with increasing noise level of gene X and Y, as presented in Fig.S6E (see below).

      Author response image 3.

      “(E) Time courses of the coefficient of variation in expression levels of X and Y genes in silico during differentiation under the noise-driven mode. Initial values were set to the attractors of S fate in Figure 2C (SSSs in green attractor basins). Top panel: Noise level of X (σx) and Y (σy) are both set to 0.14. Bottom panel: Noise level of X (σx) and Y (σy) are both set to 0.1. Stochastic simulation was performed 1000 times for each pseudo-time point.”

      Given the noise-driven mode, we further used the expression pattern of the Gbx2-Tbx3 circuit to heuristically infer the logic motif.
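As an illustration of the quantity plotted in the panel described above, the coefficient of variation (CV) at each pseudo-time point can be computed across repeated stochastic runs. The following is a minimal Python sketch, not the authors' simulation code: `toy_trajectory` is a hypothetical stand-in (a noisy random walk) for one run of the actual model, and all function names are illustrative assumptions.

```python
import random
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        return float("nan")  # CV is undefined for a zero mean
    return statistics.pstdev(values) / mean


def cv_time_course(trajectories):
    """One CV per time point, computed across replicate trajectories."""
    n_points = len(trajectories[0])
    return [
        coefficient_of_variation([traj[t] for traj in trajectories])
        for t in range(n_points)
    ]


def toy_trajectory(rng, n_points=5, sigma=0.1):
    """Hypothetical stand-in for one stochastic run (NOT the authors' model):
    expression starts at 1.0 and takes Gaussian steps, floored at zero."""
    level, series = 1.0, []
    for _ in range(n_points):
        series.append(level)
        level = max(0.0, level + rng.gauss(0.0, sigma))
    return series


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    runs = [toy_trajectory(rng) for _ in range(1000)]
    for t, cv in enumerate(cv_time_course(runs)):
        print(f"pseudo-time {t}: CV = {cv:.3f}")
```

Replacing `toy_trajectory` with runs of the real stochastic model, one per replicate, would give a time course of the kind described in the legend; the choice of population rather than sample standard deviation is also an assumption of this sketch.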

      (16) In contrast to the hematopoiesis example, the authors do not focus on a specific gene regulatory circuit with the ESC dataset. How their approach is possible on genome-wide data needs to be discussed.

      We thank the reviewers for this comment. Indeed, the core GRN orchestrating the fate-decision process reported by Semrau et al. [10] is not fully elucidated. Here we focus on the Gbx2-Tbx3 circuit (Fig.6H, Fig.S6D). These two TFs were selected from 22 candidate TFs and proposed as potential key regulators in the original paper [10]. On this point, we therefore followed the original paper.

      With regard to extending the approach to biological systems without a specified gene regulatory circuit, we have added a discussion of the case in which the genes comprising the CIS network are not defined. Please refer to Page 23 of the revised manuscript, lines 685-687 (see below).

      “Notably, if the genes constituting the CIS network are not specified, we can conversely leverage the patterns of temporal expression variance to nominate key regulators in a model-guided manner.”

      (17) [In supplemental material, pp.1] Possible typo: "In our word, we considered a GRN comprised...".

      Thanks for spotting this typo. We have amended it in the revised supplemental method (refer to Page 1 of the revised supplementary method).

      (18) [In supplemental material, pp.1] In Eqs. (1), the notation for the function HX([X]) implies that HX only depends on X, leaving the combinatorial regulation out. HX([X],[Y]) would be more general and accurate.

      Thanks for pointing this out. We have incorporated this suggestion into the manuscript. Please refer to Page 1 of the revised supplementary method.
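To illustrate why an input function of the form H_X([X],[Y]) is the more general notation, the sketch below writes a regulatory function for gene X that depends on both factors. It is a hypothetical Python example under stated assumptions, not the equations of the supplementary method: X (activator) and Y (repressor) are assumed to bind the promoter independently with Hill-type weights and exponent n, and the names `promoter_weights`, `h_and` and `h_or` are invented here. The AND-type rule counts only the state with activator bound and repressor absent as active; the OR-type rule counts every state except repressor-only as active (treating the empty promoter as active is itself an assumption).

```python
def promoter_weights(x, y, k_x=1.0, k_y=1.0, n=2):
    """Statistical weights of the four promoter states, assuming
    independent binding of activator X and repressor Y (an assumption)."""
    a = (x / k_x) ** n  # activator-bound weight
    r = (y / k_y) ** n  # repressor-bound weight
    return {"empty": 1.0, "act_only": a, "rep_only": r, "both": a * r}


def h_and(x, y, **kwargs):
    """AND-type input: active only when X is bound and Y is not."""
    w = promoter_weights(x, y, **kwargs)
    return w["act_only"] / sum(w.values())


def h_or(x, y, **kwargs):
    """OR-type input: active unless Y is the only bound species."""
    w = promoter_weights(x, y, **kwargs)
    return 1.0 - w["rep_only"] / sum(w.values())


if __name__ == "__main__":
    # The same activator level gives different outputs as Y changes,
    # which a single-argument H_X([X]) could not express.
    for y in (0.0, 1.0, 2.0):
        print(f"[Y]={y}: AND={h_and(1.0, y):.3f}  OR={h_or(1.0, y):.3f}")
```

With the default parameters, both functions change with [Y] at fixed [X], which is the combinatorial dependence that the two-argument notation makes explicit.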

      (19) [In supplemental material, pp.1] There are several works that have shown that the Hill coefficient is rarely representative of the number of binding elements. The model can be more general. See, for example, «Santillán, Moisés. "On the Use of the Hill Functions in Mathematical Models of Gene Regulatory Networks." Mathematical Modelling of Natural Phenomena 3, no. 2 (October 22, 2008): 85-97. https://doi.org/10.1051/mmnp:2008056.» and «Nam, Kee-Myoung, Rosa Martinez-Corral, and Jeremy Gunawardena. "The Linear Framework: Using Graph Theory to Reveal the Algebra and Thermodynamics of Biomolecular Systems." Interface Focus 12, no. 4 (June 10, 2022): 20220013. https://doi.org/10.1098/rsfs.2022.0013.»;

      We thank the reviewer for drawing our attention to this and highlighting the above works. Indeed, this is important information to include in the manuscript. We have incorporated this suggestion into the revised supplemental method (refer to Page 1 of the revised supplementary method). These references have now been included in the revised supplemental method (refer to references [2]-[3]).

      (20) [Minor] The configuration labels can be confusing, especially the AA, which is rather an AND NOT gate.

      We thank the reviewers for this comment. For clarity, we have replaced the original labels AA/OO with AND-AND/OR-OR, and hope that the new notation makes our work easier to interpret.

      (21) [Minor] Very low printing quality in Figure 1.

      Thanks for the feedback regarding the printing quality of Figure 1. We have made the necessary adjustments to improve its quality. We have also ensured that all other figures in the manuscript meet the required standards.

      (22) [Minor] We suggest including a quantitative scale for the bias in Fig. 3E.

      Thanks, we have incorporated this suggestion into the manuscript.

      (23) [Recommendation] Authors could also evaluate the cell fate decision processes as mutations or other perturbations affect a regulatory network.

      We thank the reviewers for this valuable recommendation. We agree that including further cases would be helpful, especially mutation-driven, disease-related fate-decision processes such as neutropenia in chemotherapy. However, given the considerable effort required to find appropriate datasets, we have decided, after careful consideration, not to make this change.

      (24) [Recommendation] The authors could include some discussion of the likely impact of the work on the field and the utility of the methods and data to the community. For example, understanding the fluidity of the epigenetic landscape and the regulatory forces behind cell fate decisions can be of great importance in designing synthetic gene regulatory circuits.

      We greatly appreciate the reviewers pointing this out. In the original manuscript, we intentionally limited the length of the discussion to keep the story focused. We thank the reviewers for their insightful suggestions regarding the content of the discussion and have incorporated them into the revised manuscript. Please refer to Page 25, lines 751-757 (see below).

      “Recently, synthetic biology has realized the insertion of the CIS network in mammalian cells. One of the prerequisites for recapitulating the complex dynamics of fate transitions in synthetic biology is systematical understanding of the role of GRNs and driving forces in differentiation. And the logic motifs are the essential and indispensable elements in GRNs. Our work also provides a blueprint for designing logic motifs with particular functions. We are also interested in validating the conclusions drawn from our models in a synthetic biology system.”

      In addition, a longstanding question of interest to us in cell fate decisions is what accounts for the distinctive development of different species, such as human and mouse. Beyond protein-coding sequences, regulatory interactions between genes (i.e., activation and inhibition) are also conserved, as reported in a recent multi-species cell atlas study [11], and it is generally acknowledged that gene regulatory networks (GRNs) orchestrate fate-decision processes. That is, conserved regulatory programs imply a conserved topology of core GRNs. Thus, the logic of regulation, as another vital element of GRNs, naturally comes into the spotlight (related to the introduction, lines 99-120 of the revised manuscript). Nevertheless, to our knowledge, regulatory logic in cell fate decisions has received only scant attention. We hope that our elucidation of the role of logic motifs in cell fate decisions will attract more inquiries in the community into the regulatory logic of GRNs.

      Public reviews

      In this manuscript, Xue and colleagues investigate the fundamental aspects of cellular fate decisions and differentiation, focusing on the dynamic behaviour of gene regulatory networks. It explores the debate between static (noise-driven) and dynamic (signal-driven) perspectives within Waddington's epigenetic landscape, highlighting the essential role of gene regulatory networks in this process. The authors propose an integrated analysis of fate-decision modes and gene regulatory networks, using the Cross-Inhibition with Self-activation (CIS) network as a model. Through mathematical modelling, they differentiate two logic modes and their effect on cell fate decisions: one that requires both the presence of an activator and the absence of a repressor (AA configuration), and one where transcription occurs as long as the repressor is not the only species on the promoter (OO configuration).

      The authors establish a relationship between noise profiles, logic-motifs, and fate-decision modes, showing that defining any two of these properties allows the inference of the third. They also identify, under the signal-driven mode, two fundamental patterns of cell fate decisions: either prioritising progression or accuracy in the differentiation process. The authors apply this analysis to available high-throughput datasets of cell fate decisions in hematopoiesis and embryogenesis, proposing the underlying driving force in each case and utilising the observed noise patterns to nominate key regulators.

      The paper makes a substantial contribution by rigorously evaluating assumptions in gene regulatory network modelling. Notably, it extensively compares two model configurations based on different integration logic, illuminating the consequences of these assumptions in a clear, understandable manner. The practical simulation results effectively bridge theoretical models with real biological systems, adding relevance to the study's insights. With its potential to enhance our understanding of gene regulatory networks across biological processes, the paper holds promise. Its implications extend practically to synthetic circuit design, impacting biotechnology. The conclusions stand out, addressing cell fate decisions and noise's role in gene networks, contributing significantly to our understanding. Moreover, the adaptable approach proposed offers versatility for broader applications in diverse scenarios, solidifying its relevance beyond its current scope.

      We thank the reviewers for their enthusiasm for our work, and appreciate the professional, insightful and encouraging assessment.

      However, the manuscript in its current form also has some important weaknesses, including the lack of clarity in the text and the questionable generality of specific observations.

      We thank the reviewers for this comment. We have reviewed the manuscript and made the necessary adjustments to improve its clarity. We do hope that this revision meets the reviewers’ expectations on the clarity and comprehensiveness of our analysis.

      For instance, even when focusing on the CIS network, the effect of alternative model implementations is not discussed. Notably, the input signals are only considered as an additive effect over the differential equations, while signals can potentially affect each of the individual processes.

      We agree with the reviewers’ comment that signals may act at each level of the central dogma, including transcription, translation, etc. We have added a section titled “limitation of this study” to the revised manuscript that addresses this point and explicitly states the potential limitations of our models. Please refer to Page 25 of the revised manuscript, lines 769-771 (see below).

      “In addition, for simplicity and intuition, we here considered signals as uncoupled and additive effects in ODE models, due to feasible mapping in real biological systems, such as ectopic overexpression.”

      The proposed model allows for a continuum of interactions/competition between transcription factors, yet only very restrictive scenarios are explored (strict AND/OR logic operations).

      We thank the reviewers for this comment, and appreciate them sharing the potential for further generalization of our framework. Indeed, in addition to logic operations, our framework can be applied to all two-node circuits (3^4 = 81 in total), including mutual activation with self-activation. As the focus of this work is to illustrate the role of logic motifs in cell fate decisions, we mainly concentrated on two classical, intuitive and representative (at least to us) logic operations, AND and OR, in the context of the CIS network. Even so, we already have four combinations to consider (two logic motifs and two driving forces), and we feel that the scenarios currently included are sufficient to demonstrate the role of logic motifs. Hence, after careful consideration, we decided not to explore further logic operations in this work. Instead, we have added a section titled “limitation of this study” to the revised manuscript. Please refer to Page 25 of the revised manuscript, lines 760-762.

      “Although our framework enables the investigation of more logic motifs, we chose two classical and symmetrical logic combinations for our analysis. Future work should involve more logic gates like XOR and explore asymmetrical logic motifs like AND-OR.”
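The figure of 81 two-node circuits mentioned in the response is consistent with counting four directed edges (two self-regulations and two cross-regulations), each of which can be an activation, an inhibition, or absent. The short Python sketch below enumerates them under that assumption; the edge labels and the sign encoding (+1, -1, 0) are illustrative choices, not notation taken from the manuscript.

```python
from itertools import product

# Four possible directed edges in a two-node circuit (labels are illustrative).
EDGES = ("X->X", "X->Y", "Y->X", "Y->Y")
# Assumed encoding: +1 activation, -1 inhibition, 0 no regulation.
SIGNS = (-1, 0, 1)


def two_node_circuits():
    """Enumerate every assignment of a sign to each of the four edges."""
    return [dict(zip(EDGES, signs)) for signs in product(SIGNS, repeat=len(EDGES))]


# Cross-Inhibition with Self-activation (CIS) network.
CIS = {"X->X": 1, "X->Y": -1, "Y->X": -1, "Y->Y": 1}
# Mutual activation with self-activation, the other topology named in the text.
MUTUAL_ACTIVATION = {"X->X": 1, "X->Y": 1, "Y->X": 1, "Y->Y": 1}

if __name__ == "__main__":
    circuits = two_node_circuits()
    print(len(circuits))          # number of enumerated topologies
    print(CIS in circuits)        # the CIS network is one of them
```

This enumeration covers topology only; each topology can additionally be combined with different logic rules (for example AND or OR) at each node, which is the dimension the response concentrates on.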

      Moreover, how the model parameters are chosen throughout the paper is not clear. Similarly, the concentration and times are not clearly specified, making their comparison to experimental data troublesome.

      We thank the reviewers for this comment. We have revised the manuscript to explain how the parameters in our model are specified. Please refer to Page 5 of the revised manuscript, lines 179-181 (“Benchmarking the Boolean models with different logic motifs (Fig2.B; see Methods), we reproduced the geometry of the attractor basin in the continuous models resembling those represented by corresponding Boolean models (Fig2.C; see Methods).”). In terms of concentration and time, we acknowledge that their units are arbitrary compared to a real experimental system. We have now noted this point in the legends of the corresponding figures (Fig2.C, Fig3.B&D, Fig6.B-C, Fig7.E).

      We would like to highlight that our entire work is organized in a model-driven (also called top-down) fashion. We did not fine-tune the parameter sets used in our model to match the experimental data specifically. Indeed, parameter estimation is a longstanding challenge in computational biology, since experimental datasets are usually insufficient to specify the parameters of a dynamical model. In general, fitting therefore requires additional assumptions, such as non-Markov processes [12, 13], which may lead to artifacts. Thus, we decided to draw qualitative conclusions (e.g., trends over time) from a quantitative model with sampling of parameter sets. Hence, we intentionally did not tailor our models to fit different datasets (i.e., all models used in our work share the same basic parameter settings), and mapped them onto real biological systems in a top-down manner.

      Regarding clarity, how the general model (equations 1-2) transforms into the specific cases evaluated in the paper is not clearly stated in the main text, nor are the positive and negative effects of individual transcription factors adequately explained. Similarly, in the main text and Figure 2, the authors refer to a Boolean model. However, they do not clearly explain how this relates to the differential equation model, nor its relevance to understanding the paper.

      We thank the reviewers for this comment, as it has prompted us to better clarify our manuscript. We have revised the manuscript accordingly to improve its clarity.

      Additionally, the term "noise levels" is generally used to refer to noise introduced in the "noise-driven" analysis (i.e., as an input or parameter in the models). Nonetheless, it is later claimed to be evaluated as an intrinsic property of the network (likely referring to expression level variability measured by the coefficient of variation).

      We greatly appreciate the reviewers pointing out this ambiguity. The term “noise level” was indeed used to refer to the strength of the noise in the models in Sections 1-4. To classify different logic motifs under the two driving forces, we needed a practical metric that can be quantified from data, and we found that population-level gene expression variance (i.e., “noise level” in line 398), defined as the coefficient of variation, is useful.

      For clarity, we have decided to substitute “expression variance” for “noise level” in Sections 5-6. We have amended the manuscript accordingly.

      Finally, some jargon is introduced without sufficient context about its meaning (e.g., "temporal fully-connected stage").

      Regarding the term "temporal fully-connected stage", we realized that it was somewhat vague and in need of improvement. We now use “transitory fully-connected stage” in the revised manuscript to underline the brief emergence of this particular stage. Please refer to Pages 10-11 of the revised manuscript, lines 316-327 (see below).

      “Notably, in the AND-AND motif we observed a brief intermediated stage before S attractor disappears, where all three fates are directly interconnected (Fig4.C 2nd panel and D 2nd panel, Fig.4E). To manifest the generality, we globally screened 6,213 groups of parameter sets under the AND-AND motif, and this logic-dependent intermediated stage can be observed for 82.7% of them (see Methods; Table S1), indicating little dependence on particular parameter setting (1.8% in the OR-OR motif). Unlike the indirect attractor adjacency structure mediated by S attractor (Fig2.D), the solution landscape with fully-connected structure facilitates transitions between any two pairs of fates. Furthermore, this transitory fully-connected stage locates between the fate-undetermined stage (Fig4.C top panel) and fate-determined stage (Fig4.C 3rd panel), comparable to the initiation (or activation) stage before the lineage commitment in experimental observations [5-7]. Therefore, we suspected that the robust fully-connected stage in the AND-AND motif may correspond to a specific period in cell fate decisions.”

      Additionally, proper discussion of previous work is also missing. For instance, the dynamics of the CIS network investigated by the authors have been extensively characterised (see e.g., Huang et al., Dev Biol, 2007), and how the author's results compare to this previous work should be discussed. In particular, the central assumptions behind the derivation of the model proposed in the manuscript must be assessed in the context of previous work.

      Thanks for pointing this out. We have extended the discussion to include the above points. We have also discussed and cited the work of Huang et al. mentioned above. Please refer to Page 22, lines 644-647 in the revised manuscript (see below).

      “One of the most representative work is that Huang et al. [14] modeled the bifurcation in hematopoiesis to reveal the lineage commitment quantitatively. Compared to simply modularizing activation or inhibition effect by employing Hill function in previous work, our models reconsidered the multiple regulations from the level of TF-CRE binding.”

      References

      (1) Ackers, G.K., A.D. Johnson, and M.A. Shea, Quantitative model for gene regulation by lambda phage repressor. Proc Natl Acad Sci U S A, 1982. 79(4): p. 1129.

      (2) Shea, M.A. and G.K. Ackers, The OR control system of bacteriophage lambda: A physical-chemical model for gene regulation. Journal of Molecular Biology, 1985. 181(2): p. 211-230.

      (3) Hunziker, A., et al., Genetic flexibility of regulatory networks. Proc Natl Acad Sci U S A, 2010. 107(29): p. 12998-3003.

      (4) Kittisopikul, M. and G.M. Suel, Biological role of noise encoded in a genetic network motif. Proc Natl Acad Sci U S A, 2010. 107(30): p. 13300-5.

      (5) Brand, M. and E. Morrissey, Single-cell fate decisions of bipotential hematopoietic progenitors. Curr Opin Hematol, 2020. 27(4): p. 232-240.

      (6) Zhang, Y., et al., Hematopoietic Hierarchy - An Updated Roadmap. Trends Cell Biol, 2018. 28(12): p. 976-986.

      (7) Arinobu, Y., et al., Reciprocal activation of GATA-1 and PU.1 marks initial specification of hematopoietic stem cells into myeloerythroid and myelolymphoid lineages. Cell Stem Cell, 2007. 1(4): p. 416-27.

      (8) Kamimoto, K., et al., Dissecting cell identity via network inference and in silico gene perturbation. Nature, 2023. 614(7949): p. 742-751.

      (9) Hammelman, J., et al., Ranking reprogramming factors for cell differentiation. Nat Methods, 2022. 19(7): p. 812-822.

      (10) Semrau, S., et al., Dynamics of lineage commitment revealed by single-cell transcriptomics of differentiating embryonic stem cells. Nat Commun, 2017. 8(1): p. 1096.

      (11) Li, J., et al., Deep learning of cross-species single-cell landscapes identifies conserved regulatory programs underlying cell types. Nature Genetics, 2022. 54(11): p. 1711-1720.

      (12) Stumpf, P.S., F. Arai, and B.D. MacArthur, Modeling Stem Cell Fates using Non-Markov Processes. Cell Stem Cell, 2021. 28(2): p. 187-190.

      (13) Stumpf, P.S., et al., Stem Cell Differentiation as a Non-Markov Stochastic Process. Cell Syst, 2017. 5(3): p. 268-282 e7.

      (14) Huang, S., et al., Bifurcation dynamics in lineage-commitment in bipotent progenitor cells. Dev Biol, 2007. 305(2): p. 695-713.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work presents some valuable information regarding the molecular mechanisms controlling the regeneration of pancreatic beta cells following induced cell ablation. However, the study lacks the critical lineage tracing result to support the conclusion about the origin of the regenerated beta cells. The results of the pharmacological manipulation of CaN signaling are also incomplete. In particular, these manipulations are not cell-specific, which makes them difficult to interpret; a genetic approach is therefore recommended.

      Public Reviews:

      Reviewer #1 (Public Review):

      Induction of beta cell regeneration is a promising approach for the treatment of diabetes. In this study, Massoz et al. identified calcineurin (CaN) as a new potential modulator of beta cell regeneration using zebrafish as a model. They also showed that calcineurin (CaN) works together with Notch signaling to promote beta cell regeneration. Overall, the paper is well organized and technically sound. However, some of the evidence seems too weak to support the conclusions.

      Reviewer #2 (Public Review):

      This work started with transcriptomic profiling of ductal cells to identify the upregulation of calcineurin in the zebrafish after beta-cell ablation. By suppressing calcineurin with its chemical inhibitor cyclosporin A and expressing a constitutively active form of calcineurin ubiquitously or specifically in ductal cells, the authors found that inhibited calcineurin activity promoted beta-cell regeneration transiently while ectopic calcineurin activity hindered beta-cell regeneration in the pancreatic tail. They also showed similar effects in the basal state but only when it was within a particular permissive window of Notch activity. To further investigate the roles of calcineurin in the ductal cells, the authors demonstrated that calcineurin inhibition additionally induced the proliferation of the ductal cells in the regenerative context or under a limited level of Notch activity. Interestingly, the enhanced proliferation was followed by a depletion of ductal cells, suggesting that calcineurin inhibition would exhaust the ductal cells. Based on the data, the authors proposed a very attractive and intriguing model of the role of calcineurin in maintaining the balance of the progenitor proliferation and the endocrine differentiation. However, the conclusions of this paper are only partially supported by the data as some evidence from the data remains suggestive.

      (1) In the transcriptomic profiling, genes differentially regulated in the ablated adults could be solely due to the chemical effects of metronidazole instead of the beta-cell ablation. A control group without ins:NTR-mCherry but treated with metronidazole is necessary to exclude the side effects of metronidazole.

      We believe that it is unlikely that the differential regulation observed is due to metronidazole rather than to beta cell loss. This experimental strategy has proven successful in well-published studies identifying regulators of beta cell regeneration in zebrafish larvae. Importantly, the candidates identified in these studies were subsequently functionally validated in mammalian models (Lu et al. 2016, Karampelias 2021). Moreover, in our study, we also used another chemical compound, nifurpirinol (Bergemann et al., 2018), to ablate the beta cells. Regardless of whether we employed metronidazole or nifurpirinol for beta cell ablation, our results consistently indicate a notable involvement of calcineurin. Of note, nifurpirinol is commonly used in fishkeeping, with no reported toxicity to the overall health of the fish.

      (2) Although it has been shown that the pancreatic duct is a major source of the secondary islets in the pancreatic tail in previous studies, there is no direct evidence showing the cyclosporin A-induced cells share the source in this manuscript. Without any proper lineage tracing work, the origin of those cyclosporin A-induced cells cannot be concluded.

      Our experimental setting is similar to the one described in Ninov et al. 2013, where lineage tracing experiments demonstrate an increase of beta cell formation in the pancreatic tail originating from the pancreatic ducts. In our study, we performed the same experiment with the addition of CsA and showed more ductal cell proliferation (Figure 5G) followed by a 19% increase of beta cell regeneration compared to nonregenerative conditions (Figure 2B). It is unlikely that the additional 19% of regenerated beta cells under CaN inhibition come from a different source than the initial 68%.

      On the other hand, the acinar cells cannot be considered another source of regenerated beta cells, as they are not able to form beta cells unless they are artificially reprogrammed (Maddison et al., 2012). Therefore, the only other potential source of regenerated beta cells is the endocrine compartment. However, at the stage where we performed beta cell ablation, there are no endocrine cells in the pancreatic tail. Moreover, there is no evidence that secondary islets could come from the principal islet; they are tightly associated with the ducts and differentiate from ductal cells (Mi et al., 2023).

      Importantly, we demonstrated that overexpression of CaN specifically in the pancreatic ducts prevents beta cell regeneration. The effect of CaN is therefore intrinsic to the ducts. Moreover, we showed that CsA increases beta cell formation when Notch signalling is repressed. Given that Notch signalling is known to act on the ductal cell population, this again strongly suggests that CsA enhances beta cell formation from the ducts.

      All of this evidence strongly supports the notion that the cyclosporin-induced beta cells originate from the ductal cells.

      (3) It is interesting to see an increase of beta cells in the primary islet after cyclosporin A treatment (Supplemental Fig 2B). However, it remains unclear if their formation shares the same mechanism with the newly formed beta cells in the pancreatic tail.

      There are indeed several sources of beta cell regeneration in the primary islet. However, a recent study showed that the contribution of alpha cells to regeneration is minor and that the main contributors are ductal and sst1.1 cells (Mi et al., 2023). In our previous publication, we indeed showed that a major source of beta cells in the principal islet is the delta 1.1 cell population. Those sst1.1 cells begin to express insulin and are therefore named ‘bihormonal’ (Carril et al., 2022). We tested whether this population is impacted by CsA treatment and show below that CsA does not affect bihormonal cell formation (Figure 2D supplemental). These new results suggest that the CsA-mediated increase of beta cells in the principal islet arises from the ductal cells, as observed in the tail. These results were added to the manuscript as Figure 2D supplemental.

      Author response image 1.

      Tg (sst1.1:GFP); Tg (ins:NTR*-mCherry) larvae were treated at 3dpf with NFP 4µM to induce beta cell ablation. Then larvae were treated with CsA 1µM from 4 to 6 dpf (or ctl with DMSO), prior to fixation and analysis of bi-hormonal cells in the principal islet at 6dpf.

      (4) The conclusion of the effect of cyclosporin A on the endocrine progenitors (Line 175) is not convincing because the data cannot distinguish the endocrine progenitors from the insulin-expressing cells. Indeed, Figure 2E shows that neurod1+ cells are fewer than ins+ cells (Figure 2D) in the pancreatic tail at 10 dpt, suggesting that all or at least the majority of neurod1+ cells are already ins+.

      The neurod1+ cell population indeed includes both endocrine progenitor cells and differentiated endocrine cells. However, we would like to point out that the timing of the analysis is essential to reach our conclusion. When we treat with CsA, we show an increase of neurod1+ cells already at 4dpt. At this time point, no hormone-producing cells can yet be detected (Figure 2E). Those additional neurod1+ cells are therefore endocrine progenitors and not beta cells. This result shows that CaN inhibition induces pro-endocrine cell formation in regenerative conditions.

      At 10dpt, the neurod1+ cell population includes beta cells as well as endocrine progenitor cells. We agree that the way the data are presented in Figures 2D and 2E can be confusing. These two figures come from two separate experiments; the number of beta cells in Figure 2D therefore cannot be compared to the number of neurod1+ cells in Figure 2E. Indeed, from one experiment to another, the efficiency and rate of regeneration can vary, independently of calcineurin. To clarify, we added the number of beta cells regenerated in the experiment of Figure 2E (see Author response image 2, in red). As you can see, in this experiment regeneration was a bit slower than usual.

      Author response image 2.

      Tg (neurod1:GFP); Tg (ins:NTR*-mCherry) larvae were treated at 3dpf with NFP 4µM to induce beta cell ablation. Then larvae were treated with CsA 1µM from 4 to 6 dpf (or ctl with DMSO), prior to fixation and analysis of GFP+ cells (in grey, pink, dark grey and green), and mCherry+ cells for the condition ablated + CsA in red from 2 to 10 dpf.

      (5) Figure 5D shows a significant loss of nkx6.1+ cells in the combined treatment group but there is no direct evidence showing this was a result of differentiation as the authors suggested. This cell loss also outnumbered the increase in ins+ cells (Figure 4D). The cell fates of these lost cells are still undetermined, and the authors did not demonstrate if apoptosis could be a reason of the cell loss.

      Firstly, as can be seen on the graphs, we encountered very high variability between individuals within the same condition. We decided to show this variability by presenting the raw data. This high variability could partially explain the differences that you underline. Moreover, we would like to point out that, independently of CaN inhibition, the progenitor loss (nkx6.1+ cells) outnumbers the gain of beta cells. Indeed, on average there is a loss of 29% (41 GFP+) of the nkx6.1+ cells and a gain of only 6 beta cells after Notch inhibitory treatment, the other progenitor cells being differentiated into other endocrine cell types (pro-endocrine, alpha, delta). In the combined treatment (Notch and CaN inhibitors), we decreased the number of progenitor cells by 50%, i.e., 21% (20 cells) more than without CaN inhibitor. However, we increased the number of regenerated beta cells twofold (from 6 to 12 cells). In brief, the substantial progenitor cell loss could be explained by precocious differentiation into pro-endocrine and endocrine cell types. It is therefore normal that the number of regenerated beta cells does not match the number of progenitor cells lost, in the presence or absence of CaN inhibition.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      (1) The evidence that proliferating ductal cells differentiate into beta cells is weak. The authors should use lineage tracing, or immunostaining for other marker genes, to confirm this.

      The experiment in Figure 5A-D is a short-term tracing experiment and should have been presented as such in the manuscript. After LY411575 (Notch inhibitor) and CsA treatments at 3dpf, we exposed the larvae to EdU at 4dpf for 8 hours (Figure 5A). We showed that EdU is incorporated in dividing ductal cells at 4dpf (Figure 5C) and that 2 days later there are newly formed beta cells that are EdU+ (see Author response image 3). To reinforce our conclusion, the image below will be added to the manuscript.

      Author response image 3.

      Tg (nkx6.1:GFP); Tg (ins:NTR*-mCherry) larvae were treated at 3 dpf with both CsA 1µM and LY411575 5µM. At 4 dpf, the larvae were exposed to EdU 4mM for 8 hours, before analysis at 6 dpf.

      (2) To inhibition of CaN and Notch pathway, they just used the pharmacological approaches, genetical approaches should be used to get stronger evidence.

      We employed two distinct inhibitors specifically targeting calcineurin (CsA and FK506) for CaN inhibition. While these inhibitors have distinct chemical structures and potential non-specific effects, they both yield the same result of increased beta cell formation under Notch repression (see Figure 4D and Figure 4B in the supplementary data). This convergence of outcomes strongly suggests that the observed effect is primarily attributable to the specific inhibition of calcineurin.

      Furthermore, we complemented our inhibitor-based approach with a genetic strategy involving CaN overexpression (see Figure 3). Notably, the overactivation of CaN resulted in a reduction of beta cell regeneration. Given that this genetic approach generated an effect contrary to that achieved with the inhibitors, it provides robust support for our model, which postulates that calcineurin plays a critical role in the regulation of beta cell regeneration (see Figure 3, panels C-E).

      As for Notch inhibition, previously published data from our laboratory compared the effects of a Notch inhibitor (LY411575) and genetic approaches (mib mutant and transgenic line) on pro-endocrine cell (ascl1b+) and ductal cell (nkx6.1+) formation. This study showed that both the Notch inhibitor (LY411575) and Notch repression using genetic approaches produce the same effect: an induction of pro-endocrine cell formation. Since the specificity of this inhibitor has been validated (Ghaye et al., 2015), we did not consider a genetic approach necessary.

      (3) The most enriched pathways among the up-regulated genes were DNA replication and cell cycle, which suggested that these genes are more important for the duct cell proliferation, how is Calcineurin related to these pathways, such as regulating the genes important for proliferation?

      The transcriptomic data presented in this manuscript suggest that the ductal cells undergo a strong proliferative response after beta cell ablation. This is in accordance with our experimental data showing activation of ductal proliferation after beta cell ablation (Ghaye et al., 2015) and data from this manuscript (Figure 1 I-J).

      Calcineurin is a well-known regulator of the cell cycle, and can either promote or repress the cell cycle depending on the cell type. For example, stressing the cell provokes an entry of calcium and subsequently CaN activation, which results in cell cycle arrest (Leech et al. 2020). Nevertheless, depending on the cell type, CaN can be either necessary or deleterious to cell proliferation (Goshima et al. 2019; Masaki and Shimada 2022). The intriguing dual role of CaN in the cell cycle is well illustrated in β cell regeneration. While CaN should be repressed to enable ductal progenitor amplification and subsequent endocrine differentiation, CaN is then necessary for β cell function and for their replication (Dai et al. 2017; Heit et al. 2006). Moreover, CaN is related to cellular senescence and CaN function is important for proper fin regeneration in zebrafish.

      (4) It is hard to understand why they pick up the pathway of cellular senescence signature for the duct cell progenitor neogenesis? Moreover, among these senescence genes, many genes are cell cycle regulators.

      In response to beta cell ablation, the ductal cells undergo a strong proliferative response, as shown in our previous data (Ghaye et al., 2015). It was therefore not surprising that many differentially expressed genes are cell cycle regulators. On the other hand, the cellular senescence signature was surprising. Indeed, senescence is usually associated with cell cycle arrest and aging. However, recent studies showed that cellular senescence is required for proper development and regeneration. We therefore wanted to investigate this pathway and more particularly the function of calcineurin, which can either promote or repress the cell cycle in different cell types (see comment above).

      (5) The RNA-seq data obtained from adult fish, while the authors use larvae to explore the CaN functions, it may have different conclusion using adult fish. Moreover, it is unclear whether the CaN increased when the beta cell ablated in young larvae.

      We decided to first perform functional experiments in the larvae, as this model enables the quantification of beta cell regeneration from the ducts in the pancreatic tail. However, to validate our results in non-developmental stages, we performed experiments in juveniles (2 months old) and adults. CsA treatments in juvenile zebrafish recapitulated the same results as in larvae (Figure 2B and Figure 6A-C). Moreover, we showed that CaN overactivation delayed glycemia recovery after ablation in adults (Figure 6D-E), which is in accordance with an impaired regeneration. Altogether, these results strongly suggest that CaN acts as a regulator of beta cell regeneration in both the juvenile/adult and larval stages.

      Concerning the expression of CaN in the zebrafish larvae, we tried to detect the level of CaN in the different experimental conditions by in situ hybridization. However, we were not able to detect it using this technique. We also tried immunostaining with an anti-phospho-nfact3 ser165 polyclonal antibody (Invitrogen), but this antibody does not seem to work in zebrafish. Finally, we tried to sort ductal cells at the larval stage to perform a transcriptomic analysis, but we were unable to collect enough ductal cells to proceed further. Indeed, our staining experiment showed that there are only around 150 ductal cells (nkx6.1+, Figure 5D) at this stage.

      (6) The beta cell regeneration in the young larvae usually recovers within ~ 5 days in principle islet. Please also show the beta cell number (PI) during the beta cell recovery after ablation.

      We did show beta cell regeneration in the principal islet in Figure 2A-B supplemental. While new beta cells appear quickly in this islet (Carril, Massoz, Dupont et al., 2023), the principal islet has not yet fully recovered at 5 dpt.

      (7) Since the studies did not show the CaN level in Fig.3, it is hard to know that the CaN is exactly expressed.

      In Figure 3B, using Tg(hsp70:GFP-CaNCA), it is indeed not possible to see CaN expression at 10 dpt, as the heat shocks induce CaNCA overexpression only transiently. However, the transient expression was detected live shortly after the heat shocks. On the other hand, in the transgenic line Tg(UAS:GFP-CaNCA); Tg(cftr:Gal4), GFP-CaNCA is continuously expressed, allowing us to show CaNCA expression in the pancreatic ducts (Figure 3).

      (8) In Fig.6 D and 6E, did these drug treatments change the glucose level in nonablated fish?

      As you can see below, the CaN inhibitor CsA does not affect the glycemia of the fish in non-regenerative conditions.

      Author response image 4.

      Glycemia of non-ablated fish, 3 days after drug treatment.

      (9) The logic of writing in Results is very hard to understand.

      We proofread the paper in an effort to clarify it.

      Minor concerns,

      (1) Make a scheme for ablation and RNA-seq, and indicate the age of the fish used in Fig. 1.

      We added the scheme in Figure 1 supplemental.

      (2) In Fig. 1G, two arrows indicated mCherry+ cells is hard to see in the non-ablated fish.

      One arrow was indeed misplaced; we moved it and tried to improve the intensity of the red signal. However, the cells are indeed small and can be difficult to see.

      (3) In Fig.6, it is hard to know that the arrows indicated islets are small islets (up to 5 cells), how they compared with big islets and defined as small islet. Moreover, some of these islets are almost invisible.

      We now show a close-up of a portion of the pancreatic tail and indicate the beta cells with arrows only in this picture, to enhance clarity.

      Reviewer #2 (Recommendations For The Authors):

      (1) This manuscript needs more proofreading and polishing to increase its readability.

      We proofread the manuscript and changed some paragraphs for clarity.

      (2) The extensive use of words like "modulate" or "regulate" sometimes makes the text ambiguous as the effect is not stated directly and clearly.

      We rewrote some parts of the text and tried to avoid using “regulate” as often.

      However, as we used both repression and over-activation of CaN, we still use words such as “regulate” to state general conclusions on the function of CaN.

      (3) The list of individual differentially regulated genes after the beta-cell ablation in the RNAseq seems missing. This list could be interesting and helpful for other researchers.

      We added it.

      (4) In Figure 1D, "modulated" genes are shown but were they all upregulated like those in Figure 1A? The modulation should be indicated more clearly (e.g. up- or down-regulated) in the figure. The authors can use different colours to illustrate that.

      Done.

      (5) Is Figure 2D showing the same data extracted from Figure 2B? Does Figure 2D add any information to the data?

      No, it does not add data. We added Figure 2D for a better visualisation of the increase at 10 dpt.

      (6) In the y-axis of Figure 3E, it should be "mCherry".

      It already is. We checked all the axes again to be sure they are correct.

      (7) Line 219, "Figure 4E supplemental" instead of "Figure 4D supplemental"

      Done.

      (8) Line 266, "ablated juveniles" instead of "ablated larvae"

      Done. Thank you for noticing these mistakes.

      (9) In Figure 6A, many mCherry+ cells are hardly visible and there are some greyish white signals in the images that are supposed to show the mCherry channel only. What are those grey signals?

      There is no grey channel in the picture. We improved the overall quality of these pictures and now show a close-up to improve the figure.

      (10) In Figure 6D and 6E, CaNCA overexpression had a significant effect on the glycemia. But did the overexpression affect the beta cell formation or regeneration?

      We showed that CaNCA overexpression did not affect beta cell formation in the absence of regeneration in the larvae (Figure 3E). Moreover, it does not affect the glycemia of the fish in non-regenerative conditions (Author response image 5). As for regenerative conditions, CaN overexpression decreased the regeneration in the larvae (Figure 3E).

      Author response image 5.

      Glycemia of Tg(UAS:GFP-CaNCA); Tg(cftr:Gal4) fish, overexpressing CaNCA, compared to control fish, in non-regenerative conditions.

      (11) The role of calcineurin seems transient (e.g. Figure 2B and 4E) and does not play a significant role in long term. It would be interesting to see if long-term/repeated treatments of calcineurin inhibitors and overexpression/knockout of important members of calcineurin signaling would affect the pool of progenitors in long term.

      We were also interested in the long-term consequences of CaN overexpression. Our overexpression tool Tg(UAS:CaNCA) allows us to address this question, as CaN is permanently overexpressed. We assessed the structure of the ducts and the number of beta cells in transgenic larvae and did not see any defects of the ducts, whether in a regenerative context or not. On the other hand, we showed in this manuscript that the CaN effect is specific to regenerative conditions. As a consequence, it is unlikely that repeated treatments long after the ablation would continue to affect beta cell formation and the progenitor pool.

    1. Author Response

      eLife assessment

      We appreciate the assessment carried out by the editorial team at eLife. Therefore, we plan to review the methods section in order to make the statistical analysis more comprehensible for each of the displayed figures.

      Public reviews

      Reviewer 1

      We would like to express our gratitude to Reviewer 1 for providing a thorough summary of our work and highlighting its strengths. With regard to the weaknesses, we are committed to improving the manuscript by making the necessary changes. First, we will specify the exact p-value in all cases.

      Regarding the discussion section, we acknowledge the feedback regarding its potential confusion. In line with the reviewer's suggestion, we will reduce the literature review and highlight our findings.

      Finally, for the preprint we did not include confounders such as HIV infection and ethnicity, as our study population did not exhibit viral infections and comprised only Hispanic individuals. We will describe the study population more thoroughly and address these characteristics explicitly in both the methods section and the initial part of the results.

      Reviewer 2

      We thank reviewer 2 for the comments. Although it is true that several papers have described the role of the microbiome in COVID-19 severity, we firmly believe that our current work stands out.

      There is not much information related to this association in Mediterranean countries, especially in the south of Spain. In addition, most studies only describe microbiota composition in stool or nasopharyngeal samples separately, without investigating any potential relationships between them as we do.

      (1) We agree with the reviewer that the sample size is limited. We faced the challenge of collecting the samples during the peak of the COVID-19 pandemic, when doctors and nurses were overwhelmed and not always available to carry out patient recruitment following the inclusion criteria. Despite these constraints, we ensured that all included samples met our specified inclusion criteria and were from subjects with confirmed symptomatology.

      In addition, our main goal was to identify whether severity of the disease could be assessed through microbiota composition. Therefore we did not include a healthy group. Despite not having a large N, our results should be reproducible as they are supported by statistical analysis.

      (2) We thank the reviewer for this comment. Since our original sentence may have lacked clarity, we intend to modify it to ensure it conveys the intended meaning more effectively.

      Nonetheless, we remain confident in the significance of our findings. Not only have we found a correlation between microbiota and COVID severity, but we have also described how specific bacteria from each condition are associated with key biochemical parameters of clinical COVID infection.

      (3) We appreciate the feedback provided by the reviewer. In this case, we performed 16S analysis due to its cost-effectiveness compared to metagenomic approaches. Furthermore, 16S analysis has undergone refinements that ensure comprehensive coverage and depth, along with standardized analysis protocols. Unlike 16S, metagenomic approaches lack software tools such as QIIME that facilitate standardization of analysis, which reduces the reproducibility of results.

      (4) We sincerely appreciate this insightful suggestion. Since simply listing associations between both microbiomes and COVID-19 severity may not be enough, we intend to discuss in our discussion how microbiota composition may be linked to the mechanisms underlying COVID-19 pathogenesis.

      (5) We are grateful for the constructive criticism and intend to rewrite our abstract to enhance clarity. Additionally, we will thoroughly review all figures and their descriptions to ensure accuracy and comprehensibility.

      Reviewer 3

      We acknowledge the annotations made by reviewer 3 and are committed to addressing all identified weaknesses to enhance the quality of our work. Our idea is to modify the methods section and figures to make them easier to understand.

      Specifically, in the case of Figure 1, we recognize an error in the description of the Bray-Curtis test. We appreciate the comment and will make the necessary changes. Moreover, there is another observation related to the description of Figure 1. We are going to modify it to improve accuracy.
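
      For readers unfamiliar with the measure named above, the following is a minimal illustrative sketch of the Bray-Curtis dissimilarity between two samples. It is not the authors' analysis pipeline; the function name and the example counts are hypothetical, and the inputs are assumed to be non-negative abundance vectors over the same taxa in the same order.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors.

    Returns 0.0 for identical samples and 1.0 for samples sharing no taxa.
    """
    if len(u) != len(v):
        raise ValueError("both samples must cover the same taxa")
    total = sum(u) + sum(v)
    if total == 0:
        # Two empty samples: treat as identical (an assumption of this sketch).
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / total


# Hypothetical read counts for three taxa in two samples.
sample_a = [10, 0, 5]
sample_b = [5, 5, 5]
print(bray_curtis(sample_a, sample_b))
```

      The value is a pairwise dissimilarity, not a statistical test; group differences are usually assessed by a separate test applied to the resulting distance matrix.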

      For Figure 2, we are planning to add a supplementary table showing the abundance of the detected genera. Nevertheless, we will also update the manuscript text to clarify how we obtained this result. Regarding the clarification about "1% abundance," we want to emphasize that we are referring to relative abundance, where 1 represents 100%. To avoid confusion, we will explicitly state this in both the methods section and the figure descriptions. In addition, it is true that the statistical test employed for the analysis is not mentioned in the figure description, and we recognize that the image may be difficult to interpret. Therefore, we will modify the text and add a supplementary table displaying the abundance and p values.
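
      As an illustration of the relative-abundance scale described above, here is a minimal sketch that converts raw counts to fractions between 0 and 1 and applies a cutoff. The genus names, counts, and function names are hypothetical, and the sketch assumes the "1% abundance" cutoff corresponds to 0.01 on the 0-to-1 scale.

```python
def relative_abundance(counts):
    """Convert raw counts per taxon into fractions that sum to 1 (1 = 100%)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def above_threshold(counts, threshold=0.01):
    """Keep taxa whose relative abundance is at least `threshold` (0.01 = 1%)."""
    rel = relative_abundance(counts)
    return {taxon: frac for taxon, frac in rel.items() if frac >= threshold}


# Hypothetical counts for one sample.
sample = {"Genus_A": 980, "Genus_B": 15, "Genus_C": 5}
print(above_threshold(sample))
```

      Here Genus_C has a relative abundance of 0.005 (0.5%) and is dropped, while Genus_B at 0.015 (1.5%) is kept.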

      Furthermore, we agree with the reviewer's suggestion to investigate whether the bacteria identified as potential biomarkers for each condition are specific to their respective severity index or if there is a threshold. Thus, we will reanalyze the data and include a supplementary table with the abundance of each biomarker for each condition. We will also place greater emphasis on these results in our discussion.

      Finally, in response to the reviewer's suggestion, we are going to go through the nasopharyngeal-fecal axis part in the discussion. It is well described that COVID-19 induces a dysbiosis in both microbiomes.

      Consequently, we understand that the ratio we have described could be an interesting tool for assessing COVID severity development as it considers alterations in both environments. However, we acknowledge that there may be room for improvement in clarifying the significance of this intriguing finding and its implications.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This comprehensive study provides valuable information on the cooperation of Ikaros with Foxp3 to establish and regulate a major portion of the epigenome and transcriptome of T-regulatory cells. However, the characterization is incomplete in that incontrovertible evidence that these are intrinsic features regulating biological function and not outcomes of the inflammatory micro-environment of the genetically manipulated mice is missing.

      Public Reviews:

      This study investigates the role of Ikaros, a zinc finger family transcription factor related to Helios and Eos, in T-regulatory (Treg) cell functionality in mice. Through genome-wide association studies and chromatin accessibility studies, the authors find that Ikaros shares similar binding sites to Foxp3. Ikaros cooperates with Foxp3 to establish a major portion of the Treg epigenome and transcriptome. Ikaros-deficient Treg exhibits Th1-like gene expression with abnormal expression of IL-2, IFNg, TNFa, and factors involved in Wnt and Notch signaling. Further, two models of inflammatory/ autoimmune diseases - Inflammatory Bowel Disease (IBD) and organ transplantation - are employed to examine the functional role of Ikaros in Treg-mediated immune suppression. The authors provide a detailed analysis of the epigenome and transcriptome of Ikaros-deficient Treg cells.

      These studies establish Ikaros as a factor required in Treg for tolerance and the control of inflammatory immune responses. The data are of high quality. Overall, the study is well organized, and reports new data consolidating mechanistic aspects of Foxp3 mediated gene expression program in Treg cells.

      Strengths:

      The authors have performed biochemical studies focusing on mechanistic aspects of molecular functions of the Foxp3-mediated gene expression program and complemented these with functional experiments using two models of autoimmune diseases, thereby strengthening the study. The studies are comprehensive at both the cellular and molecular levels. The manuscript is well organized and presents a plethora of data regarding the transcriptomic landscape of these cells.

      Response: We thank the reviewers for their careful review and feedback on our manuscript. We appreciate that the reviewers and editors recognize the strength and comprehensive nature of our in vivo, cellular, biochemical, and genome-wide molecular studies, which are well-organized in the manuscript. The acknowledgment of the complementary functional experiments in two models of inflammatory disease is also encouraging.

      Weakness:

      The authors claim that the mice have no pathologic signs of autoimmune disease even at a relatively old age, yet mice have an increased number of activated CD4+ T cells and T-follicular helper cells (even at the age of 6 weeks) as well as reduced naïve T-cells. Thus, immune homeostasis is perturbed in these mice even at a young age and the effect of inflammatory microenvironments on cellular functions cannot be ruled out. Further, clear conclusions from the genome-wide studies are lacking.

      Response: We agree with the reviewers' comment regarding the absence of overt autoimmune pathologies in Ikzf1-fl/fl-Foxp3-Cre+ mice, despite the increased frequency of activated CD4+ T cells, TFH cells, and apparent perturbation of lymphocyte homeostasis, even at a young age. It is noteworthy that while Ikaros is implicated in various autoimmune diseases, our specific mouse model, in which Ikaros expression is lost only in Tregs, may not lead to a strong autoimmune phenotype in part due to the controlled environment of an extra-clean, pathogen-free animal facility. This aligns with a related study by Ana et al (2019, J. Immunol: doi:10.4049/jimmunol.1801270) in Ikzf1-fl/fl-dLck-Cre+ mice with loss of Ikaros expression in all mature CD4+ T cells, including Tregs, that exhibit no overt signs of autoimmune disease. Moreover, our transcriptomic studies reveal that increased expression of inflammatory genes in Ikzf1-deficient Treg is coupled with the simultaneous upregulation of genes with positive roles in Treg function. This balance suggests a compensatory mechanism within Ikaros-deficient Tregs that maintains their suppressive function until encountering an inflammatory immune challenge, which eventually leads to loss of Treg suppressive function in Treg-specific Ikaros-deficient mice. Our studies clearly show that Ikaros has cell-intrinsic effects in Treg that also lead to cell-extrinsic effects mediated by secreted factors that are likewise regulated by Ikaros. This can be said about the function of any transcription factor in any cell type. Our data clearly support the conclusion from the genome-wide studies that Ikaros plays a major role in establishing the active chromatin landscape, gene expression profile, and function of regulatory T cells in mice.

      The following recommendations consolidate the views of the three reviewers of the manuscript.

      The experiments suggested and, in some instances, fresh analysis, are thought necessary, so that the evidence of Ikaros-Foxp3 interactions regulating T-regulatory cell biology is comprehensive and solid. We hope the comments are useful to strengthen the comprehensive analysis reported in this submission.

      The primary concern is that the indications of inflammation in the mice (see points 1 & 2 below) do not reflect in the experiments or consequent conclusions. The gap in the data should be addressed by testing these interactions in an appropriate context for which suggestions are included.

      Please note that the title of the manuscript may be modified to reflect the use of mice as the system of study for this work.

      (1) The evidence of inflammation (increased CD4 and T follicular cells) reported in the work requires new experiments to rigorously examine the relationship between Ikaros and Foxp3 to rule out the possible impact of the (inflammatory) microenvironment of the mice (Please see: Zemmour et al., Nat. Immunology 22, 607, 2021). Two possible experimental systems in mice are suggested.

      a) The use of heterozygous female mice, which should be phenotypically normal due to the presence of 50% normal Treg. Or,

      b) The generation of bone chimeras between wild-type and deficient mice using congenic markers.

      Response: We agree that immune dysregulation that develops in the mice with age or during an inflammatory insult due to loss of Ikaros function in the Treg lineage is an important part of the phenotype of the animals. Our studies show that loss of Ikaros function in Treg influences the gene expression program such that Treg now produce inflammatory cytokines and ligands capable of engaging receptors expressed on Treg and other cells. This likely results in autocrine and paracrine signaling that induces further metabolic and gene expression differences not observed in wild-type mice. Indeed, we report in the manuscript that a sizable fraction of the differentially expressed genes do not appear to be direct Ikaros targets, but rather are downstream of Ikaros target genes such as Il2, Ifng, Notch, and Wnt. The mosaic experiments suggested will be a useful topic of future studies. Importantly, we argue that no gene expression study involving modulation of transcription factor activity in an organism- or cell-based system can be designed to measure only the direct effects of that transcription factor in a manner isolated from any indirect, downstream effects on the expression of other genes. We suggest that our current data remain highly valuable, as they reveal real and relevant biology in physiologic in vivo systems that do not depend upon the use of heterologous models. The fact that loss of Ikaros has an effect not only on its direct targets, but on gene programs driven in turn by the indirect effects of Ikaros-regulated factors, has been acknowledged in the manuscript.

      (2) Figs. 7 and S5 show accumulation of CD4 cells (activated, memory, Tfh, Tfr) in LNs and spleens of the Ikaros KO over time. This is accompanied by elevated Igs but without overt autoimmune disease. KO Tregs had equivalent suppressive activity as WT Tregs against WT Teff in vitro. However, Teff from KO mice were resistant to the suppressive effects of WT or KO Tregs. The authors interpret this as due to the increased percentage of memory cells within the KO Teffs, although they did not formally prove this point. Figs. 9 and S6 show that Ikaros KO mice are unable to be tolerized for cardiac allograft survival using two different standard tolerogenic regimens. The rejecting allografts are accompanied by increased T-cell infiltration and upregulation of inflammatory genes. The authors suggest there is increased alloantibody, but alloantibody does not seem to have been measured.

      Response: We are currently exploring in more detail the dysregulation of humoral immunity in the Ikzf1-deficient Treg model and plan to report these results in a future study.

      (3) Linked to the above, a comparison of the chromatin occupancy of Ikaros in resting and activated Tregs would inform on whether and how Ikaros occupancy changes with the activation status of Tregs. Since the authors use in vitro stimulation for RNAseq and ATAC seq, ChIP seq analyses under these matching conditions will greatly add to the quality of the study. Since "Foxp3-dependent", ie. differential gene expression in the Foxp3GFPKO cells (PMID: 17220874) gene expression has been shown to be not entirely the same as Treg signature (i.e. gene expression or Tregs compared to Tnv), it will be worth correlating Ikaros, Foxp3 co-occupied genes and the corresponding fate of their expression with Foxp3-dependent and independent Treg signature gene sets.

      Response: The prior study by Gavin et al. referred to above used duplicate samples instead of the standard three or more replicates required for a robust differential analysis of gene expression. The two samples in this study are variable, and no statistically significant differential gene expression was found between the experimental groups when we subjected these data to current analysis methods. For this reason, we have elected not to compare these prior data with our current data, which are robust, reproducible, and analyzed using current statistical methods. Furthermore, the mice used for the prior study develop a fatal inflammatory disease (scurfy) and therefore the Treg examined in this study would be subject to a much stronger extrinsic inflammatory environment than the Treg in our study, as our mice show no overt disease even with age.

      Further, the consequence of the cooperation between the two transcription factors that can be inferred from the experiments in the study remains unclear. It is suggested that the authors could first consider the ChIP seq data from Foxp3, Ikaros co- and differentially occupied genes, and then correlate with the ATAC seq and gene expression data to comment on the consequence of this cooperation.

      Response: We find that Ikaros binding at a given region has a strong effect on accessibility, as reported in the manuscript, but that Foxp3 occupancy has less consequence, consistent with a prior study suggesting that Foxp3 largely utilizes the open chromatin landscape already present in the conventional CD4 T cell lineage (PMID:23021222). Our data suggest that the dominant effect of Ikaros on Foxp3 is at the level of chromatin occupancy.

      (4) In the comparative analyses of Ikaros and Foxp3 co-occupied regions and gene expression outcome, the authors mention "A total of 4423 Foxp3 binding sites were detected in the open chromatin landscape of wild-type Treg (Supplementary Table 9), and this ChIP-seq signal was enriched at accessible Foxp3 motifs." It is unclear whether the authors focused on the ATAC seq data and only examined the open chromatin regions for this analysis. In that case, it is unclear why. More so because the Ikaros footprint is more apparent in regions where accessibility is reduced upon deletion of Ikaros.

      Response: Foxp3 has been shown to bind primarily at open chromatin shared between Tconv and Treg, unlike the pioneer activity of other Fox family members (PMID: 23021222, bioRxiv https://www.biorxiv.org/content/10.1101/2023.10.06.561228v2.full.pdf). Consistent with this, we found the majority of peaks were in open chromatin. The motif analysis is quantitative, not binary, and takes into account Foxp3 binding sites at regions considered open in either condition, which is why we can see enrichment of Foxp3 motifs at sites going from more open to less open in the absence of Ikaros.

      (5) Comments on figures:

      The authors use MFI repeatedly in many of the figures for quantitation of antigen expression. This is misleading as several of the target antigens are normally expressed on a subpopulation of cells, e.g., Eos. Percent positive and MFI would be more relevant. Cytokine production should be presented by intracellular staining (e.g., IL-2, IFNg) as Elisa data does not allow one to determine the percentage of abnormally producing cells.

      Response: We show both ICS and ELISA in this paper, preferring ELISA because it is much more quantitative than ICS.
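
      To make the distinction raised in this comment concrete, the sketch below computes percent positive alongside fluorescence intensity summaries for a marker expressed on only a subpopulation. It is illustrative only: the intensities, the gate value, and the function name are hypothetical, and MFI is taken here as the median fluorescence intensity (some analyses use the mean instead).

```python
from statistics import median


def summarize_marker(intensities, gate):
    """Return (percent positive, MFI of all cells, MFI of positive cells).

    `gate` is the intensity above which a cell counts as marker-positive.
    """
    if not intensities:
        raise ValueError("no cells to summarize")
    positive = [x for x in intensities if x > gate]
    pct_positive = 100 * len(positive) / len(intensities)
    mfi_all = median(intensities)
    # MFI of the positive gate is undefined when no cell passes the gate.
    mfi_positive = median(positive) if positive else None
    return pct_positive, mfi_all, mfi_positive


# Hypothetical intensities: 3 of 10 cells express the marker brightly.
cells = [10, 12, 11, 9, 500, 520, 13, 8, 10, 480]
print(summarize_marker(cells, gate=100))
```

      In this toy population the MFI of all cells barely registers the bright subset, which is why reporting percent positive together with the MFI of the positive gate is more informative for markers expressed on a subpopulation.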

      Suppl. Fig. 1c - the panels do not correspond precisely to the legend or the text. At least one panel is missing. In Supp fig 1c, the authors plotted effector Tregs, which are by definition CD62LloCD44hi, but the Y axis says CD44hiCD62Lhi. Is this a typo? Also on page 4, describing this data the authors mentioned Tfr, but the data is not shown in the Supp fig 1c.

      Response: We thank the reviewer for catching these mistakes. We have corrected the typo in the figure panel for Supplementary Figure 1c. Follicular Treg data are indeed presented in Figure 7h, not Supplementary Figure 1, and we have corrected the text.

      Fig. 2, which lists the different categories of differentially expressed genes, it will be helpful if the authors add two columns indicating fold change and FDR values.

      Response: These values are included in Table S1.

      Fig. 3c, the resolution of the histograms in the inset should be enhanced.

      Fig. 3d, a histogram of representative CTV dilution plots, and an explanation of how the quantifications were done may be included.

      Fig. 3e - not well labeled. Are these fold changes? Enrichments? Number of gene elements within the GO term that are affected? Something else?

      Fig. 3f - presented out of sequence. The data are a little hard to understand as the color scale is so subtle and the colors so close to one another that it is not entirely clear which gene expressions are increased vs decreased. Other than the simple statement that the Ikaros KO causes numerous changes, there does not seem to be a more consistent message from this data panel.

      Fig. 4a, in addition to the bar graphs, it will be better to show the plots in a histogram, gated on Foxp3+ Tregs in WT and KO groups, with representative MFI indicated on top. The resolution of the scatter plots in this figure, as well as some others throughout the manuscript, may be improved. Please increase the resolution wherever necessary.

      Fig. 4b should include representative plots for cytokine production gated in Tconv (CD4+Foxp3-) cells.

      Figs. 5a-h, S2-3a-d, and Suppl. Tables S4-8 show a comprehensive ATAC-seq and ChIP-seq analysis of genes and chromatin occupied or regulated by Ikaros, comparing Tconv vs Treg, stimulated vs naïve, and WT vs KO cells. It is a comprehensive tour-de-force analysis, again showing the major effects of Ikaros on the entire Treg landscape of gene regulation.

      Fig. S5h-j should be explained or labeled in more detail. The fonts are too small to read, even at 200% magnification; and the cell and gene comparisons are not entirely clear.

      Supp. Fig. S3e is not referred to in the text.

      Fig. S4a is very difficult to read; the font and plotted points are too small.

      Response: We have improved the clarity of the figures where necessary. We also indicate in the figure legends that full gene lists are to be found in the supplementary tables.

      Page 8, "Regions that exhibit reduced accessibility in Ikzf1 cko compared to wild-type Treg are enriched for the binding motif for Ikaros and the motif for TCF1 (Figure 5g).... ". Is this Fig. 5i or 5g?

      Response: This statement is correct and is referring to data depicted in Figure 5g.

      In Fig 6e, Flag-Ik7 is not visible in any of the inputs. The co-IP between Foxp3 and Runx1 (presumably a positive control) is not efficient in this experimental condition. Co-IP experiments performed in primary cells upon retroviral transduction of the tagged proteins to confirm observations in cell lines are suggested.

      Response: Runx1 is shown to co-precipitate with Foxp3 as expected, although the band is not intense, and the data depicted are representative of 3 experiments. Ik7 was included in this transient transfection experiment as a redundant control, and the referee is correct that Ik7 did not express well in this experiment and cannot be seen in this exposure. We showed these blots intact in the spirit of not digitally altering the data, and because the low Ik7 expression did not impact our ability to demonstrate specific co-precipitation of Foxp3 with full-length Ikaros (Ik1). The images include nearly the entire mini-blots, and we have added molecular weight markers for clarity. As indicated in the legend, the cytokine and ChIP data in 6f are from a separate model of retrovirally Foxp3/Ik7-transduced T cells that we and others have used in multiple prior studies (e.g. Thomas JI 2007, Thomas JI 2010). The interpretability of these experiments is not impacted by the transient transfection data from figure 6e. It should be noted that a prior study by Rudra et al. that is cited and referred to in the manuscript used a similar approach to also establish that Foxp3 and Ikaros form a complex in cells.

      In Fig 6f, the authors state that Foxp3 overexpression in CD4 cells results in promoter occupancy of both IL2 and IFNg, however, data shows only IL2. Also in 6f, Foxp3 overexpression reduces IL2 and IFNg secretion, measured by ELISA, which is recovered by IkDN. However, the effect of Foxp3 along with WT Ikaros (which should not modulate, and if anything, further repress IL2, IFNg production) is not shown.

      Response: The reviewer is correct that ectopic expression of Ikaros leads to repression of cytokine gene expression, which we and others have shown in prior studies. Because the focus of this study was on loss of Ikaros function in Treg, we did not elect to overexpress full-length Ikaros. However, we completely agree that Ikaros GOF in Treg is an important topic for future studies.

      Fig. 7e-g, how is %suppression calculated? Can representative CTV dilution plots for the suppression assays be shown?

      Response: Cell division was quantified as described previously (see ref 50), and percent suppression represents the reduction in cell division measured by Tconv in the presence of Treg compared to in the absence of Treg. This has been clarified in the methods section.
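
      The definition of percent suppression given above can be sketched as a short calculation. This is only an illustration under stated assumptions: the function name and the example division values are hypothetical, and the underlying division metric (described in ref 50) is not reproduced here.

```python
def percent_suppression(division_without_treg: float, division_with_treg: float) -> float:
    """Percent reduction in Tconv division when Treg are present.

    Both arguments are assumed to be the same division readout
    (for example, a value derived from CTV dilution), measured
    without and with Treg in the culture.
    """
    if division_without_treg <= 0:
        raise ValueError("division without Treg must be positive")
    return 100.0 * (1.0 - division_with_treg / division_without_treg)


# Hypothetical readouts, not data from the study.
print(percent_suppression(2.0, 0.5))
```

      A value of 0 means Treg had no effect on Tconv division, and a value of 100 means division was abolished.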

      In Fig 8 and the supplementary figures the representative colon pictures (Fig. S6a-c) do not show convincing differences in colon morphology even though all the other histology and clinical parameters are clear. Are the figures mislabeled?

      In Fig 8c-e and other histology figures scale bars should be shown.

      Fig. 8c-e, the Alcian blue staining among the groups appears similar; perhaps this is due to the low power magnification.

      Response: We have edited this figure for clarity.

      Additional comments:

      Fig 10 is explained in the discussion section for the first time. The authors may want to consider including this when introducing Ikzf1 ChIPseq data for the first time in the study.

      Response: The reviewer raises a valid point but we have elected to retain the current organizational structure of the manuscript.

      A more complete characterization of the activated conventional cells including both CD4+ and CD8+ T cells for cytokine production during aging may be considered, as it is highly likely that abnormalities in cytokine production will be observed.

      Response: We agree and are planning additional such experiments in future studies focusing on in vivo models of tolerance.

      The failure of suppression of T cell proliferation which the authors claim is due to the presence of activated memory T cells can be better documented by using naive responder cells from the cKO mice.

      Response: We agree and are planning additional such experiments in a future study focusing on further aspects of cellular immunobiology impacted by Ikaros, but we will give preference to in vivo models of tolerance in such studies.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (recommendations for the authors):

      Additional suggestions for improvement are noted below:

      (1) Additional 1. Lns 261-262, as well as abstract: The term 'aerobic fermentation' is not accurate in the context of this manuscript. This terminology should be reserved for conditions where lactate production is observed under optimal aerobic conditions. This is not the case in this study. More lactate was observed in the agr mutant only when cells were grown under microaerobic conditions, where some level of fermentation would be expected to be active (esp. if nitrate is not provided in media).

      We modified the text by deleting reference to the “aerobic” fermentation as suggested by the reviewer:

      Line 93 (abstract): “Deletion of agr increased both respiration and aerobic fermentation but decreased ATP levels and growth, suggesting that Δagr cells assume a hyperactive metabolic state in response to reduced metabolic efficiency.”

      Line 184: “Collectively, these data suggest that Δagr increases respiration and aerobic fermentation to compensate for low metabolic efficiency.”

      (2) Additionally, the authors' statement, 'The tendency of Δagr cells to forgo the additional ATP yield from acetate production in favor of NAD+-generating lactate (23, 24) underscores the importance of redox balance in Δagr cells,' appears contradictory to the data presented in Fig 5, where the Δagr mutant demonstrates an approximately threefold increase in acetate production during exponential growth compared to the wild-type strain. A clarification or adjustment in the manuscript may be necessary to ensure consistency and accurate interpretation.

      In glucose-fermenting S. aureus, pyruvate can serve as an electron acceptor, generating lactate via lactate dehydrogenases. Acetyl-CoA production proceeds via the pyruvate formate-lyase reaction, which converts pyruvate to formate rather than CO2 and thus does not consume oxidized NAD+. Thus, at a general level, the tendency of fermenting cells to forgo the additional ATP yield from acetate production in favor of NAD+-generating ethanol synthesis underscores the importance of redox balance when respiration is suboptimal. This is especially true for fermenting Δagr strains, as evidenced by increased lactate production compared to their relatively ATP replete wild-type parental strains. However, in the interest of clarity, we removed the sentence in question, because it is not necessary and potentially confusing, and because the additional context it requires would detract from the manuscript by disrupting its sense of narrative and brevity.

      (3) Ln 277-285: There still are errors in how this paragraph is worded. What the authors stated in the 'response to the reviewers' (question 13) and the changes they made in the text are different. Here again, the response to question 13 suggested the following, "Collectively, these observations suggest that a surge in NADH production and reductive stress in the Δagr strain induces a burst in respiration, but levels of NADH are saturating, thereby driving fermentation in the presence of oxygen." That bit of it where the authors suggest that fermentation was activated because NADH was saturating is only true under microaerobic conditions and not under oxygen rich conditions.

      Reviewer #1 (comment under Review): Data presented in Figure 5 suggest the opposite - a surge in NADH accumulation leading to a decrease in the NAD/NADH ratio, rather than a surge in the 'consumption' of NADH. Clarifying this point in the manuscript would ensure accurate representation of the findings.

      Responses to Comments 3 and a comment in the Review have been combined.

      Line 280: We thank the Reviewer for their attention to detail in picking up our error in response to question 13 related to the difference in the revised text and “response to reviewers”. We modified the text accordingly.

      Microaerobic conditions and “consumption”: We have modified the wording and fixed the error with respect to “consumption” as pointed out by the reviewer (strikethrough/underlined):

      Line 285: “Collectively, these observations suggest that a surge in NADH consumption accumulation and reductive stress in the Δagr strain induces a burst in respiration, but levels of NADH are saturating, thereby driving fermentation under microaerobic conditions in the presence of oxygen.”

      Reviewer #2 (recommendations for the authors):

      (1) The authors are requested to revise 'we expected a lower NAD+/NADH' in line 280 to 'we expected a higher NAD+/NADH.' Additionally, what was the glucose concentration in TSB media?

      NAD+/NADH: We thank the Reviewer for their attention to detail in picking up our error. Our response to Reviewer 1, Comment 3 above addresses this issue.

      Glucose: We modified the Methods as suggested.

    1. Author Response

      eLife assessment

      This study demonstrates mRNA-specific regulation of translation by subunits of the eukaryotic initiation factor complex 3 (eIF3) using convincing methods, data, and analyses. The investigations have generated important information that will be of interest to biologists studying translation regulation. However, the physiological significance of the gene expression changes that were observed is not clear.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Herrmannova et al explore changes in translation upon individual depletion of three subunits of the eIF3 complex (d, e, and f) in mammalian cells. The authors provide a detailed analysis of regulated transcripts, followed by validation by RT-qPCR and/or Western blot of targets of interest, as well as GO and KEGG pathway analysis. The authors confirm prior observations that eIF3, despite being a general translation initiation factor, functions in mRNA-specific regulation, and that eIF3 is important for translation re-initiation. They show that the global effects of eIF3e and eIF3d depletion on translation and cell growth are concordant. Their results support and extend previous reports suggesting that both factors control the translation of 5'TOP mRNAs. Interestingly, they identify MAPK pathway components as a group of targets coordinately regulated by eIF3 d/e. The authors also discuss discrepancies with other reports analyzing eIF3e function.

      We would like to note that the first sentence contains a typo; the correct expression is: “…of three subunits of the eIF3 complex (d, e, and h) in mammalian cells”.

      Strengths:

      Altogether, a solid analysis of eIF3 d/e/h-mediated translation regulation of specific transcripts. The data will be useful for scientists working in the Translation field.

      Weaknesses:

      The authors could have explored in more detail some of their novel observations, as well as their impact on cell behavior.

      Many experiments are on-going in this direction. The original plan was to map all the effects in general and in as much detail as possible to select a few of them for future long-term projects.

      Reviewer #2 (Public Review):

      Summary:

      mRNA translation regulation permits cells to rapidly adapt to diverse stimuli by fine-tuning gene expression. Specifically, the 13-subunit eukaryotic initiation factor 3 (eIF3) complex is critical for translation initiation as it aids in 48S PIC assembly to allow for ribosome scanning. In addition, eIF3 has been shown to drive transcript-specific translation by binding mRNA 5' cap structures through the eIF3d subunit. Dysregulation of eIF3 has been implicated in oncogenesis, however the precise eIF3 subunit contributions are unclear. Here, Herrmannová et al. aim to investigate how eIF3 subcomplexes, generated by knockdown (KD) of either eIF3e, eIF3d, or eIF3h, affect the global translatome. Using Ribo-seq and RNA-seq, the authors identified a large number of genes that exhibit altered translation efficiency upon eIF3d/e KD, while translation defects upon eIF3h KD were mild. eIF3d/e KD share multiple dysregulated transcripts, perhaps due to both subcomplexes lacking eIF3d. Both eIF3d/e KD increase the translation efficiency (TE) of transcripts encoding lysosomal, ER, and ribosomal proteins. This suggests a role of eIF3 in ribosome biogenesis and protein quality control. Many transcripts encoding ribosomal proteins harbor a TOP motif, and eIF3d KD and eIF3e KD cells exhibit a striking induction of these TOP-modified transcripts. On the other hand, eIF3d KD and eIF3e KD lead to a reduction of MAPK/ERK pathway proteins. Despite this downregulation, eIF3d KD and eIF3e KD activate MAPK/ERK signaling as ERK1/2 and c-Jun phosphorylation were induced. Finally, in all three knockdowns, MDM2 and ATF4 protein levels are reduced. This is notable because MDM2 and ATF4 both contain short uORFs upstream of the start codon, and further support a role of eIF3 in reinitiation. Altogether, Herrmannová et al. have gained key insights into precise eIF3-mediated translational control as it relates to key signaling pathways implicated in cancer.

      Strengths:

      The authors have provided a comprehensive set of data to analyze RNA and ribosome footprinting upon perturbation of eIF3d, eIF3e, and eIF3h. As described above in the summary, these data present many interesting starting points for understanding additional roles of the eIF3 complex and specific subunits in translational control.

      Weaknesses:

      • The differences between eIF3e and eIF3d knockdown are difficult to reconcile, especially since eIF3e knockdown leads to a reduction in eIF3d levels.

      We agree and discuss this problem thoroughly in the corresponding section of our study.

      • The paper would be strengthened by experiments directly testing what RNA determinants allow for transcript-specific translation regulation by the eIF3 complex. This would allow the paper to be less descriptive.

      We carried out a bioinformatic analysis of specific RNA determinants, which is presented as the last chapter of our study. A detailed, transcript-specific analysis of these determinants is underway; however, we consider it beyond the scope of this article.

      • The paper would have more biological relevance if eIF3 subunits were perturbed to mimic naturally occurring situations where eIF3 is dysregulated. For example, eIF3e is aberrantly upregulated in certain cancers, and therefore an overexpression and profiling experiment would have been more relevant than a knockdown experiment.

      This is indeed true and so far we have generated several stable cell lines individually overexpressing selected eIF3 subunits implicated in the observed cancer phenotypes. However, this is a completely different project of one of our PhD students, which will be published as a comprehensive study when completed.

      Reviewer #3 (Public Review):

      Summary:

      In this article, Herrmannová et al. catalog the changes in ribosome association with mRNAs when the eukaryotic translation initiation factor 3 is disrupted by knocking down subunits of the multisubunit protein. They find that RNAs relying on TOP motifs for translation, such as ribosomal protein RNAs, and RNAs encoding proteins that modify other proteins in the ER or components of the lysosome are upregulated. In contrast, RNAs encoding components of MAP kinase cascades are downregulated when subunits of eIF3 are knocked down.

      Strengths:

      The authors use ribosome profiling of well-characterized mutants lacking subunits of eIF3 and assess the changes in translation that take place. They supplement the ribosome association studies with western blotting to determine protein level changes of affected transcripts. They analyze what is being encoded by the transcripts undergoing translation changes, which is important for understanding more broadly how translation initiation factor levels affect cancer cell translatomes.

      Weaknesses:

      (1) The data are presented as a catalog of effects, and the paper would be strengthened if there were a clear model tying the various effects together or linking individual subunit knockdown to cancerous phenotypes. It is unclear what the hypothesis is for cells having more MAPK activity with less of the MAPK proteins being translated, so the main findings of the paper become observational without context.

      As the signaling pathways are very complex and there is frequent crosstalk among them (c-Jun can be activated by the ERK pathway as well as the JNK pathway, activated ERKs can phosphorylate many different transcription factors, etc.), we opted not to investigate the reported results any further in this study. As mentioned above, we have several ongoing, long-term projects aiming to elucidate the consequences of the observed changes in protein levels as well as in the phosphorylation status of the MAPK pathway constituents. The take-home message of the present study is that eIF3 subunits (d and e) have control over the expression of many proteins involved in the MAPK/ERK pathway and that there is an independent effect (already present in the downregulation of eIF3h, which does not affect the MAPK protein expression) that leads to activation of the ERK pathway, which may be a direct consequence of compromised eIF3 function in general.

      (2) The conclusions drawn are presented as very generalized other than in the last paragraph, but the experiments were only done in HeLa cells. Since conclusions are being made about how translation changes affect MAP kinase signaling and there is mention in the abstract that dysregulation of these subunits is observed in cancer, at least one other cell line would need to be analyzed to provide evidence that the effects of subunit knockdown aren't cell-line specific.

      There are several notes emphasizing that the data presented in this study were obtained only in HeLa cells. We agree that further research in other cell lines will be needed to confirm that what we observed is a general phenomenon. Nonetheless, as noted in the discussion, other reports have already been published strongly indicating that this phenomenon is not unique to HeLa cells (Li et al., 2021, PMID:34520790, HTR-8/SVneo cells). We will review our conclusions and further clarify that our results so far only apply to HeLa cells.

      (3) It is also unclear how replicates were performed and how many replicates were performed for several experiments. Biological replicates are mentioned, but what the authors did for biological replicates isn't defined and the description of the collection of cells for polysome/ribosome footprint/RNA seq samples makes it unclear whether the "biological replicates" are samples from separate transfections (true biological replicates) or different aliquots or wells from a single transfection (technical replicates) being run over a separate gradient. If using technical replicates, the data comparing the effects of knocking down D vs E vs H subunits are substantially weakened because subunit-specific differences could be the result of non-specific events that occurred in a transfection. It's also notable that while the pooled siRNAs will increase the potency of knockdown, it is possible that one or more of the siRNAs could have off-target effects, and analyzing individual siRNAs would be better for ensuring effects are specific.

      We can reassure this reviewer that our Ribo-seq and RNA-Seq libraries were prepared from true biological replicates, grown, and transfected at different times. In fact, for each biological replicate, we used a new aliquot of cells from cryostock from the same batch and transfected the cells with the same passage number only. Multiple biological replicates were grown and all underwent a series of control experiments (polysomes, qPCR, western blot) as described in the article. Based on the results, 3 samples were selected for Ribo-Seq library preparation and 4 for RNA-Seq. We decided to add a fourth replicate for RNA-Seq to increase the data robustness, because RNA-Seq is used to normalize FPs to calculate TE, which was our main metric analyzed in this article.
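
      The normalization described above, in which ribosome footprints (FPs) are scaled by RNA-Seq abundance to obtain translation efficiency (TE), can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the pseudocount, and the example counts are hypothetical, and the inputs are assumed to be already normalized for library size.

```python
import math


def translation_efficiency(fp_count: float, rna_count: float, pseudocount: float = 1.0) -> float:
    """TE of one transcript: footprint abundance divided by mRNA abundance.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for transcripts with no RNA-Seq reads.
    """
    return (fp_count + pseudocount) / (rna_count + pseudocount)


def log2_te_change(fp_kd: float, rna_kd: float, fp_ctrl: float, rna_ctrl: float,
                   pseudocount: float = 1.0) -> float:
    """log2 ratio of TE in a knockdown sample relative to the control sample."""
    te_kd = translation_efficiency(fp_kd, rna_kd, pseudocount)
    te_ctrl = translation_efficiency(fp_ctrl, rna_ctrl, pseudocount)
    return math.log2(te_kd / te_ctrl)


# Hypothetical counts: footprints double while mRNA abundance is unchanged.
print(log2_te_change(fp_kd=199, rna_kd=99, fp_ctrl=99, rna_ctrl=99))
```

      Because the RNA-Seq value sits in the denominator of every TE estimate, noise in RNA-Seq propagates directly into TE, which is consistent with the stated reason for adding a fourth RNA-Seq replicate.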

      As for the usage of the siRNA pool from Dharmacon/Horizon – our current article builds on our previous studies (Wagner et al. 2014 PMID: 24912683; Wagner et al. 2016 PMID: 27924037 and Herrmannová et al. 2020 PMID: 31863585), where we thoroughly characterized the effects of downregulation of individual eIF3 subunits on the growth, translation, composition and stability of eIF3 complex and on the 43S preinitiation complex assembly and subsequent mRNA recruitment. In all of these studies, we used the same siRNA pools, the same cells and the same transfection protocol; therefore, we are convinced that our results are as coherent and reproducible as can possibly be. We have never noticed any off-target effects. Moreover, the ON-TARGETplus siRNA technology we employed uses a patented modification pattern that reduces the incidence of off-targets by up to 90% compared to unmodified siRNA (see the supplier's website for more information).

      (4) Many of the changes in protein levels reported by Western are subtle. Data from all western blots making claims of quantitative differences should really be quantified relative to nontreated over-loading control or total protein quantified from the gel, and presented with a degree of error from biological replicates to make conclusions about differences in protein levels between samples.

      Generally speaking, we agree with the reviewer’s opinion. In the original version of our study, we felt that it was not necessary to perform a quantification analysis to support our conclusions as it was not important whether a given protein was downregulated to, for example, 60% or 70%, as long as its amount was visibly reduced. The main message resided in the general trend, i.e. that the whole pathway is affected in a similar way. Nevertheless, in order to properly address this criticism, we will provide quantifications in the revised paper.
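
      The kind of quantification requested here, band intensity relative to a loading control and reported with error across biological replicates, could look like the sketch below. It is illustrative only: the function name and the intensity values are hypothetical, and the study's actual quantification procedure is not described in this response.

```python
from statistics import mean, stdev


def relative_levels(target_kd, loading_kd, target_ctrl, loading_ctrl):
    """Per-replicate protein level in knockdown relative to control.

    Each argument is a list of band intensities, one entry per biological
    replicate. The target band is first divided by its loading control,
    then the knockdown ratio is divided by the matching control ratio.
    """
    return [
        (t_kd / l_kd) / (t_c / l_c)
        for t_kd, l_kd, t_c, l_c in zip(target_kd, loading_kd, target_ctrl, loading_ctrl)
    ]


# Hypothetical intensities from three biological replicates.
levels = relative_levels(
    target_kd=[50, 60, 70],
    loading_kd=[100, 100, 100],
    target_ctrl=[100, 100, 100],
    loading_ctrl=[100, 100, 100],
)
print(f"relative level: {mean(levels):.2f} +/- {stdev(levels):.2f} (n={len(levels)})")
```

      Reporting the mean with a standard deviation across replicates makes it possible to judge whether a visibly reduced band reflects a reproducible change.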

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors observed a decline in autophagy and proteasome activity in the context of Milton knockdown. Through proteomic analysis, they identified an increase in the protein levels of eIF2β, subsequently pinpointing a novel interaction within eIF subunits where eIF2β contributes to the reduction of eIF2α phosphorylation levels. Furthermore, they demonstrated that overexpression of eIF2β suppresses autophagy and leads to diminished motor function. It was also shown that in a heterozygous mutant background of eIF2β, Milton knockdown could be rescued. This work represents a novel and significant contribution to the field, revealing for the first time that the loss of mitochondria from axons can lead to impaired autophagy function via eIF2β, potentially influencing the acceleration of aging. To further support the authors' claims, several improvements are necessary, particularly in the methods of quantification and the points that should be demonstrated quantitatively. It is crucial to investigate the correlation between aging and the proteins eIF2β and eIF2α.

      Thank you so much for your comments. We will further investigate the correlation between aging and the proteins eIF2β and eIF2α and include the results in the revised version.

      Reviewer #2 (Public Review):

      In the manuscript, the authors aimed to elucidate the molecular mechanism that explains neurodegeneration caused by the depletion of axonal mitochondria. In Drosophila, starting with siRNA depletion of Milton and Miro, the authors attempted to demonstrate that the depletion of axonal mitochondria induces the defect in autophagy. From proteome analyses, the authors hypothesized that autophagy is impacted by the abundance of eIF2β and the phosphorylation of eIF2α. The authors followed up the proteome analyses by testing the effects of eIF2β overexpression and depletion on autophagy. With the results from those experiments, the authors proposed a novel role of eIF2β in proteostasis that underlies neurodegeneration derived from the depletion of axonal mitochondria.

      The manuscript has several weaknesses. The reader should take extra care while reading this manuscript and when acknowledging the findings and the model in this manuscript.

      The defect in autophagy by the depletion of axonal mitochondria is one of the main claims in the paper. The authors should work more on describing their results of LC3-II/LC3-I ratio, as there are multiple ways to interpret the LC3 blotting for the autophagy assessment. Lysosomal defects result in the accumulation of LC3-II thus the LC3-II/LC3-I ratio gets higher. On the other hand, the defect in the early steps of autophagosome formation could result in a lower LC3-II/LC3-I ratio. From the results of the actual blotting, the LC3-I abundance is the source of the major difference for all conditions (Milton RNAi and eIF2β overexpression and depletion). In the text, the authors simply state the observation of their LC3 blotting. The manuscript lacks an explanation of how to evaluate the LC3-II/LC3-I ratio. Also, the manuscript lacks an elaboration on what the results of the LC3 blotting indicate about the state of autophagy by the depletion of axonal mitochondria.

      We agree with the reviewer that multiple ways exist to interpret the LC3 blotting for the autophagy assessment. Thus, we analyzed the levels of p62, an autophagy substrate, and found that milton knockdown caused elevated levels of p62 (Figure 2B). Together, these results suggest that autophagic degradation is lowered.

      Another main point of the paper is that the up-regulation of eIF2β caused by depleting the axonal mitochondria leads to the proteostasis crisis. This claim is formed by the findings from the proteome analyses. The authors should have presented their proteomic data with a much more thorough presentation and explanation. As in the experiment scheme shown in Figure 4A, the authors did two proteome analyses: one from the 7-day-old sample and the other from the 21-day-old sample. The manuscript only shows a plot of the result from the 7-day-old sample, but not that of the result from the 21-day-old sample. For the 21-day-old sample, the authors only provided data in the supplemental table, in which the abundance ratio of eIF2β from the 21-day-old sample is 0.753, meaning eIF2β is depleted in the 21-day-old sample. The authors should have explained the impact of the eIF2β depletion in the 21-day-old sample, so the reader could fully understand the authors' interpretation of the role of eIF2β on proteostasis.

      Thank you for your comments. We will include more analyses of the proteomic data in the next version of our manuscript. In this study, we aimed to elucidate the mechanisms by which depletion of axonal mitochondria induces proteostasis disruption prematurely. Thus, we did not investigate the roles of differentially expressed proteins in proteostasis at 21 days of age in milton knockdown. Aging disrupts proteostasis via multiple pathways: eIF2β levels may be lowered by feedback of earlier changes or via interaction with other age-related changes at 21 days of age. We will include more discussion in the next version of our manuscript.

      The manuscript contains several weaknesses in its data and explanation regarding translation.

      (1) The authors are likely misunderstanding the effect of phosphorylation of eIF2α on translation. The P-eIF2α is inhibitory for translation initiation. However, the authors seem to be mistaken that the down-regulation of P-eIF2α inhibits translation.

      Thank you for your comment. We understand that the phosphorylation of eIF2α is inhibitory for translation initiation, as we described on page 9, lines 312-314. We propose a model in which the autophagic defects caused by milton knockdown are mediated by upregulation of eIF2β; however, we are not arguing that the translational suppression in milton knockdown is caused by a reduction in p-eIF2α. We found that milton knockdown causes an increase in eIF2β, and overexpression of eIF2β copied phenotypes of milton knockdown such as autophagic defects (Figure 5 and 6). We also found that the increase in eIF2β reduces the level of p-eIF2α (Supplemental Figure 2); thus, the reduction in eIF2α phosphorylation in milton knockdown may be caused by an increase in eIF2β. However, the effects of upregulation of eIF2β on the function of the eIF2 complex are not fully understood. The translational suppression in milton knockdown may be caused by disruption of the eIF2 complex, while it is also possible that it is mediated by a function of eIF2β that is yet to be determined, or by pathways other than eIF2. We will include more details in the revised version.

      (2) The result of polysome profiling in Figure 4H is implausible. With a 10%-25% sucrose density gradient, polysomes are not expected to be observed. The authors should have used a gradient with much denser sucrose, such as 10-50%.

      Thank you for pointing this out. We apologize for the mistake. The gradient was actually 10-50%, and we described it incorrectly. We will correct it in the revised version.

      (3) Also regarding the polysome profiling, according to the methods section, the authors seem to have fractionated ultra-centrifuged samples from top to bottom and then measured A260 with a plate reader. In that case, the authors should have provided a line plot with individual data points, not the smoothly connected ones in the manuscript.

      Thank you for pointing this out. We will replace the graph.

      (4) For both the results from polysome profiling and puromycin incorporation (Figure 4H and I), the differences between control siRNA and Milton siRNA are subtle, if not nonexistent. This might arise from the lack of spatial resolution in their experiment, as the authors used head lysate for these data but the ratio of Phospho-eIF2α/eIF2α only changes in the axons, based on their results in Figure 4E-G. The authors could have attempted to capture the spatial resolution for the axonal translation to see the difference between control siRNA and Milton siRNA.

      Thank you for your comment. A new set of experiments with technical challenges will be required to capture the spatial resolution for the axonal translation. We will work on it and hope to achieve it in the future.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Recommendations For The Authors):

      I would like to thank the authors for their comments. However, my request for additional experiments to consolidate this manuscript and text changes have not been addressed (point 1 and point 2), which I believe are essential for completion of this manuscript.

      The reviewer raised the question about the relevant substrates of PARG in S-phase cells (point 1). As we explained in our previous response, the most important substrate of PARG is PARP1, since we observed increased chromatin-associated PARP1 and PARylated PARP1 in cells with PARG depletion. Moreover, PARP1 or PARP1/2 depletion rescued cell lethality caused by PARG depletion. These data strongly suggest that PARP1 is the major substrate of PARG in S phase cells. Of course, PARG may have additional substrates. In the future, we will perform proteomics experiments as suggested by this reviewer to identify additional PARG substrates, which may reveal new roles of PARG in S phase progression.

      The reviewer also suggested that we reorganize our manuscript (point 2). However, we prefer to keep the manuscript as it is, since this is how the project evolved. Another reason is that we would like to share with readers the challenge of validating KO cells. This is an important lesson we learned from this study. We hope that this will raise awareness of the hypomorphic mutant cells we often use to draw conclusions about gene functions and/or genetic interactions. We understand that the current flow of our manuscript may cause some confusion. To avoid it, we included additional explanations at the beginning of this manuscript to alert readers that our initial KO cells may not be complete PARG KO cells, i.e. they may have residual PARG activity. We also included additional discussion of this important point in the Discussion section.

      Moreover, WB analysis of PARG KO clones is inconclusive, as the additional prominent band at 50 kDa could be a degradation product. The authors should check PARG levels and localization by IF, which allows detection of intact proteins and their cellular localizations, since the shorter isoform should be localized in the cytosol. WB with PARG isoforms is missing important information regarding the Mw of the PARG constructs and Mw labels of the western blots, which makes it difficult to evaluate these data and compare them to the KO. Ideally, KO and PARG isoform samples should all be on one gel for proper comparison with different antibodies.

      We appreciate the concerns raised by this reviewer. We agree that the additional prominent band at 50 kDa could be a degradation product. As we explained in our previous response, despite using several PARG antibodies, we could not draw a clear conclusion as to which functional isoforms or truncated forms were expressed in our PARG KO cells.

      Immunostaining experiments may not be more conclusive, since IF experiments rely on the same antibodies for recognizing endogenous PARG. Additionally, even if a protein mainly localizes in the cytosol, we cannot exclude the possibility that a small fraction of this protein may localize in nuclei and have nuclear functions.

      Instead, as we presented in our manuscript, we used a biochemical assay to measure PARG activity in cell lysate and showed that our initial PARG KO cells still have residual PARG activity. However, we could not detect any PARG activity in our complete/conditional PARG KO cells (cKO cells; these cells can only survive in the presence of PARP inhibitor). These data strongly suggest that PARG is essential for cell survival.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The author should evaluate the possibility of naturally occurring arrhythmia due to the geometry of the tissues, by using voltage or calcium dye.

      Answer: We thank the reviewer for this suggestion. We have performed new experiments using a voltage-sensitive fluorescent dye (i.e. FluoVolt), with data reported in the new Figure 4 and the new results section “arrhythmia analysis”. Briefly, we found that our ring-shaped tissues are compatible with live fluorescence imaging. We were then able to show that our cardiac tissues beat regularly, without naturally occurring arrhythmias or extra beats. We could not detect any re-entrant waves in our tissues within the limits set by the speed of our camera. A specific paragraph has also been added to the discussion.

      (2) There is only 50% survival after 20 days of culture in the optimized seeding group. Is there any way to improve it? The tissues had two compartments, cardiac and fibroblast-rich regions, where fibroblasts are responsible for maintaining the attachment to the glass slides. Do the cardiac rings detach from the glass slides and roll up? The SD of the force measurement is a quarter of the value, which is not ideal with such a high replicate number.

      Answer: This paper reports seminal data that will serve as a foundation for further use of the platform. We are currently expanding to other cell lines with improvement in survival (see https://insight.jci.org/articles/view/161356). We confirm that the rings do not detach. The pillar was specifically designed to avoid this (see Figure 1B).

      As the platform utilizes imaging analysis to derive contractile dynamics, calibration should be done based on the angle and the distance of the camera lens to the individual tissues to reduce the error. On the other hand, how reproducible are the pillars? It is highly recommended to mechanically evaluate the consistency of the hydrogel-based pillars across different wells and within the wells to understand the variance.

      Answer: We propose a system and a measurement method that do not need calibration. Contraction amplitude is expressed as a ratio between the contracted / relaxed areas (See figure 3 A). There is thus no influence of the distance of the camera lens.
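      The scale-independence argument in this answer can be illustrated with a minimal sketch. It assumes, hypothetically, that each tissue is segmented as a polygon outline in pixel coordinates; the function names and the square outlines below are illustrative only and are not taken from the authors' analysis pipeline. Changing the camera distance is modeled here as a uniform rescaling of both outlines, which multiplies both areas by the same factor and so leaves their ratio unchanged.

```python
def polygon_area(points):
    """Area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def contraction_ratio(relaxed_outline, contracted_outline):
    """Contracted area divided by relaxed area; a dimensionless ratio."""
    return polygon_area(contracted_outline) / polygon_area(relaxed_outline)


def scale(points, factor):
    """Uniformly rescale an outline, as a change in magnification would."""
    return [(x * factor, y * factor) for x, y in points]


if __name__ == "__main__":
    # Hypothetical outlines (pixel units), for illustration only.
    relaxed = [(0, 0), (10, 0), (10, 10), (0, 10)]
    contracted = [(0, 0), (9, 0), (9, 9), (0, 9)]
    print(contraction_ratio(relaxed, contracted))
    # Same tissue imaged at a different magnification: the ratio is unchanged.
    print(contraction_ratio(scale(relaxed, 2.5), scale(contracted, 2.5)))
```

      The sketch only covers uniform rescaling; it does not model lens distortion or out-of-plane motion, which are outside what the answer above claims.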

      In order to evaluate the consistency of the mechanical properties of the hydrogel, we reproduced the experiment pictured in Figure 1-Supplement 1, and measured the Young’s modulus of three different gel solutions on different days. In the three experiments performed, we found values of 10.0-12.2 kPa, resulting in a final average value of 11.2 (+/- 0.6) kPa, consistent with the value reported in the article. We are therefore confident that the mechanical properties are consistent across and within wells. More extensive mechanical characterization of the molded gels would require access to an Atomic Force Microscope (AFM), and is being considered for future work.

      The author should address the longevity and reproducibility issues, by working on the calibration of camera lens position/distance to tissues and further optimizing the seeding conditions with hydrogels such as collagen or fibrin, and/or making sure the PEG gels have high reproducibility and consistency.

      Answer: This paper reports seminal data that will serve as a foundation for further use of the platform. This platform (including the design, approach and choice of polymers) allows fast and reproducible formation of a large number of cardiac tissues (up to 21 per well in a 96-well format, meaning a potential total of about 2,000 tissues) with a limited number of cells.

      (3) The evaluation of the arrhythmia should be more extensively explained and demonstrated.

      Answer: See answer to comment 1.

      (4) The results of isoproterenol should be checked, as non-paced tissues should have increased beating frequency with increasing dosages. Dofetilide does not typically have a negative inotropic effect on the tissues. Please check the cell viability before and after dosing.

      Answer: We agree with this reviewer in principle. However, we have repeated the experiments and we confirm our results, i.e. increasing concentrations of isoproterenol induced a trend towards an increase in the contraction force and significantly increased contraction and relaxation speeds without a change in the beat rate (Figure 5C). We do not have a definitive explanation for this observation. Our hypothesis is that this increase in contraction and relaxation speeds induced by isoproterenol is translated, on average in our study, into an increase in contractile force rather than into an increase in contraction frequency. This may depend on the cell line used, and is very well illustrated in a recent paper from Mannhardt and colleagues (Stem Cell Reports. 2020; 15(4):983–998). Of the 10 different cell lines tested in engineered heart tissues, all show an increase in contraction and relaxation speeds after isoproterenol administration, but this is translated either into an increase in contractile force (4 cell lines) or into a shortening of the beat (3 cell lines), and only 2 cell lines show an increase in both parameters. Indeed, since iPSC-CMs are immature cardiac cells, it is rare to obtain a positive force-frequency relationship without any maturation medium or mechanical or electrical training. We agree that above a concentration of 10 nM, dofetilide shows cardiotoxicity in our tissues, as the tissues completely stop beating.

      Reviewer #2 (Recommendations For The Authors):

      In addition to the general comments in the public review, I have the following specific suggestions to the authors, that would help improve the manuscript.

      (1) Please describe the protocol for preparation of cardiac rings (shown in Figure 1C) in more detail. In particular, please describe how the tissues were transferred from the mold into the 96-well plate and how are they positioned and characterized during the study.

      Answer: There is no transfer of the tissues, as they form directly in the well, which is pre-equipped with the molded PEG gel (see Figure 1B and methods section). The in situ analysis is a strong asset of this platform.

      (2) Please clarify the timepoints in this study. The overall schematic in Figure 1C shows that the rings were formed on day 22 and then studied for 14 days, while Figure 2B shows data over 20 days following seeding, and Figure 3 shows data 14 days after seeding. It appears that these were separate studies (optimization of myocyte/fibroblast ratio followed by the main study).

      Answer: Figure 1C shows the timeline including the cardiomyocyte differentiation. hiPSC-CMs are indeed seeded in the wells 22 days after starting the differentiation, which represents Day 0 for tissue formation. We apologize for the confusion.

      (3) Please explain if the number of rings per well (Figure 2) was used as the only criterion for selecting the myocyte/fibroblast ratio, and if so, why. Were these rings also characterized for their structural and contractile properties?

      Answer: Figure 2 supplement 1 reports the contractility data for the different tested ratios, and shows no differences. The number of generated ring-shaped tissues was indeed the only criterion retained.

      (4) Please provide rationale for using the dermal rather than cardiac fibroblasts.

      Answer: We had previous experience generating EHTs using dermal fibroblasts which are easier to obtain commercially. Our approach could in theory also work using cardiac fibroblasts, which we have not tested in the present study.

      (5) Figure 2 panels C-E show an interesting segregation of cardiomyocytes into a thin cylindrical layer that does not appear to contain fibroblasts and a shorter and thicker cylinder containing fibroblasts mixed with occasional myocytes. Please specify at which time point this structure forms, and how does it change over time in culture? At which time point were the images taken? It would be helpful to include serial images taken over 1-14 days of study.

      Answer: We thank the reviewer for this interesting comment. We have performed additional immunostainings (reported in Figure 2 supplement 3) on tissues at Day 1 and Day 7 after seeding. The segregation appears within the first 7 days. It appears that 1 day after seeding the fibroblasts are not yet attached, although the cardiac fiber has already started to form. Seven days after seeding, fibroblasts are fully spread and attached, and the contractile ring is formed and well-aligned. Brightfield images are reported in Figure 1E.

      (6) In the cardiomyocyte region (Figure 2D) the cells staining for troponin seem to be only at the surfaces. The thickness of the layer is only about 30-40 µm, so one would assume that cell viability was not an issue. Please specify and discuss the composition of this region.

      Answer: We agree, but we think this is a technical issue: at the center of the tissue, tissue thickness limits laser penetration, whereas at the surface (inner or outer), the laser penetrates easily between the tissue and the PEG. Moreover, the zoomed view of the tissue in Figure 2 Supplement 2 shows staining inside the cardiac fiber, which simply appears weaker due to tissue thickness.

      (7) Please also discuss segregation in terms of possible causes and the implications of apparently very limited contact between the two cell types, i.e., how representative is this two-region morphology of native heart tissue. Also, it would be interesting to know how the segregation has changed with the change in myocyte/fibroblast ratio.

      Answer: We are not sure that contact is very limited, as the use of fibroblasts is critical to ensure the formation of tissues (i.e. no tissues can be formed without fibroblasts). We agree that these ring-shaped cardiac tissues are not especially representative of native heart tissue in terms of interactions between several cell types. They were developed as a surrogate for physiopathological and pharmacological experiments (see a recent application in https://insight.jci.org/articles/view/161356).

      (8) There is interest and demonstrated ability to culture engineered cardiac tissues over longer periods of time. Please comment what was the rationale for selecting 14-day culture and if the system allows longer culture durations.

      Answer: In line with this comment, we have studied the contractile parameters of our rings 28 days after seeding and compared to their contractile parameters at D14. We found a slight increase for all the parameters, which is significant for the maximum contraction speed. Nevertheless, the data is much more variable and the number of tissues is lower (29 for D14 against 17 for D28). Therefore, we demonstrated that long-term culture of our tissues is possible, however not yet optimized. Hence, the following physiological and pharmacological tests have been done at D14.

      (9) Figure 3 documents the development of contractile parameters over 14 days of culture. Would it be possible to replace the arbitrary units with the actual values? Also, would it be possible to include the corresponding images of the rings taken at the same time points, to show the associated changes in ring morphologies.

      Answer: Contraction amplitude is expressed as a ratio between the contracted / relaxed areas (See figure 3 A): it is a ratio, thus without unit. Corresponding images can be seen in Figure 1 E.

      (10) The measured contraction stress, strain, and the speeds of contraction and relaxation improve from day 1 to day 7 and then plateau (Figure 3, Supplemental Figure 3). Please discuss this result.

      Answer: The new immunostainings performed on tissues at Day 1 and Day 7 show the progressive alignment of the cardiomyocytes and the muscular fibers, with an almost complete organization at Day 7.

      (11) The beating frequency does not appear to markedly change over time, while Figure 3B shows strong statistical significance (***) throughout the 14-day period. Please check/confirm.

      Answer: We confirm this result.

      (12) Please comment on the lack of effect of isoproterenol on beating frequency.

      Answer: We agree with this reviewer in principle. However, we have repeated the experiments and we confirm our results, i.e. increasing concentrations of isoproterenol induced a trend towards an increase in the contraction force and significantly increased contraction and relaxation speeds without a change in the beat rate (Figure 5C). We do not have a definitive explanation for this observation. Our hypothesis is that this increase in contraction and relaxation speeds induced by isoproterenol is translated, on average in our study, into an increase in contractile force rather than into an increase in contraction frequency. This may depend on the cell line used, and is very well illustrated in a recent paper from Mannhardt and colleagues (Stem Cell Reports. 2020; 15(4):983–998). Of the 10 different cell lines tested in engineered heart tissues, all show an increase in contraction and relaxation speeds after isoproterenol administration, but this is translated either into an increase in contractile force (4 cell lines) or into a shortening of the beat (3 cell lines), and only 2 cell lines show an increase in both parameters. Indeed, since iPSC-CMs are immature cardiac cells, it is rare to obtain a positive force-frequency relationship without any maturation medium or mechanical or electrical training.

      (13) Please compare the contractile function of cardiac tissues measured in this study with data reported for other iPSC-derived tissue models.

      Answer: A specific paragraph in the discussion addresses this point.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews

      We thank the reviewers for their insightful comments and helpful suggestions that allowed us to improve the manuscript.

      Reviewer #1:

      Thermogenic adipocyte activity associates with cardiometabolic health in humans but declines with age. Identifying the underlying mechanisms of this decline is therefore highly important.

      To address this task, Holman and co-authors investigated the effects of two major determinants of thermogenic activity: cold, which induces thermogenic de novo differentiation as well as conversion of dormant thermogenic inguinal adipocytes; and aging, which strongly reduces thermogenic activity. The authors study young and middle-aged mice at thermoneutrality and following cold exposure.

      Using lineage tracing, the authors conclude that the older group produces fewer thermogenic adipocytes from progenitor differentiation. However, they found no differences in thermogenic differentiation capacity between the age groups when progenitors are isolated and differentiated in vitro. This finding is consistent with previous findings in humans, demonstrating that progenitor cells derived from dormant perirenal brown fat of humans differentiate into thermogenic adipocytes in vitro. Taken together, this underscores that age-related changes in the microenvironment rather than autonomous alterations in the ASPCs explain the age-related decline in thermogenic capacity. This is an important finding in terms of identifying new approaches to switch dormant adipocytes into an active thermogenic phenotype.

      To gain insight into the age-related changes, the authors use single cell and single nuclei RNA sequencing mapping of their two age groups, comparing thermoneutral and cold conditions between the two groups. Interestingly, where the literature previously demonstrated that de novo lipogenesis (DNL) occurs in relation to thermogenic activation, the authors show that DNL in fact is activated in a white adipocyte cell type, whereas the beige thermogenic adipocytes form a separate cluster.

      Considering recent findings, that adipose tissue contains several subtypes of ASPCs and adipocytes, mapping the changes at single cell resolution following cold intervention provides an important contribution to the field, in particular as an older group with limited thermogenic adaptation is analyzed in parallel with a younger, more responsive group. This model also allowed for detection of microenvironment as a determining factor of thermogenic response.

      The use of only two time points (young and middle-aged) along the aging continuum limits the conclusions that can be made on aging as the only driver of the observed differences between the groups. It should for example be noted that the older mice had higher weights and larger fat depots, thus the phenotype is complex and this should be taken into consideration when interpreting the data.

      In conclusion, this study provides an important resource for further studies on how to reactivate dormant thermogenic fat and potentially improve metabolic health.

      (1) The authors claim "Aging impairs cold-induced beige adipogenesis and adipocyte metabolic reprogramming". It is previously established in humans that aging strongly associates with a decline in thermogenic capacity. With this in mind, it is easy to accept that the reduced browning observed in the older group is due to age. However, the older group also has larger adipose depots, which can also be a confounding factor. I, therefore, recommend bringing this into the discussion and putting more focus on the complexity of the phenotype. For example, it could be discussed whether de novo lipogenesis is lower because the adipocytes of older mice are already filled with more lipids. Additional time points along the aging continuum would be needed to make a strong conclusion about age as the determinant, but even so, aging is complex and further definitions and discussion would be needed.

      We agree with the reviewer regarding the confounding effect of body weight changes. We have added a paragraph to the discussion (pasted below) to comment on the complexity of the phenotype and the contributing role of linked changes in body weight/composition.

      “Aging is a complex process, and unsurprisingly, many pathways have been linked to the aging-related decline in beiging capacity. For example, increased adipose cell senescence, impaired mitochondrial function, elevated PDGF signaling and dysregulated immune cell activity during aging diminish beige fat formation (Benvie et al., 2023; Berry et al., 2017; Goldberg et al., 2021; Nguyen et al., 2021). Of note, older mice exhibit higher body and fat mass, which is associated with metabolic dysfunction and reduced beige fat development. While the effects of aging and altered body composition are difficult to separate, previous studies suggest that the beiging deficit in aged mice is not solely attributable to changes in body weight (Rogers et al., 2012). Further studies, including additional time points across the aging continuum may help clarify the role of aging and ascertain when beiging capacity decreases.”

      (2) The study would gain from more comparisons to existing human studies and discussion of the translational potential of the findings. For example, how do the adipocyte subtypes identified in the current study translate to subtypes identified in human adipose tissue (e.g. Emont et al.)?

      We analyzed the human adipose tissue atlas from Emont et al. 2022 (PMID: 35296864). We did not find any obvious homologous human adipocyte subtypes. However, this and other available human single cell studies have not investigated the effects of cold exposure on white adipose tissue depots, which may be necessary to reveal DNL-high and especially beige adipocytes.

      (3) The group has contributed multiple studies demonstrating that Prdm16 is a major inducer of a thermogenic phenotype, and the literature shows that Prdm16 promotes a thermogenic phenotype in favour of a fibrogenic aging phenotype. It would therefore be interesting to see how Prdm16 is regulated in the current data set, across adipocyte subtypes, age groups and temperature conditions.

      We thank the reviewer for this comment. Previous studies showed that PRDM16 protein and not mRNA levels are downregulated during aging (Wang et al., 2019, Cell Metab, PMID: 31155495; Wang et al., 2022, Nature, PMID: 35978186). Consistent with this, we did not observe an aging-associated reduction in Prdm16 mRNA levels in adipocytes in our dataset. We did observe enrichment of Prdm16 mRNA levels in beige adipocytes relative to other adipocyte clusters. We included these data in Fig. 5F.

      (4) In Figure 1, it is difficult to understand why the 6 weeks cold exposure is not shown in relation to the thermoneutrality, 3 days and 2-week cold exposure? It would be useful to have this in the same graph relating the levels and showing all four marker genes for all time points.

      These experiments were done at different times using separate groups of mice. We have now clarified this in the figure legend.

      (5) The older mice had larger inguinal fat depots, suggesting more lipids stored. The morphology of adipose tissue has previously been shown to be modulated by cold acclimation and is also the main similarity between brown adipose tissue in adult humans and young mice beige adipose tissue. Fig S2b suggests smaller adipocytes in the young group. It would also be useful, for comparison to published data, if authors show tissue sections with H&E of their model.

      Good point. We added panels showing H&E staining of serial iWAT sections, showing changes in tissue morphology across age and temperature conditions (Figure S1F).

      (6) The authors use t-tests to compare the differences induced by e.g. cold or min vs max cell culture media etc., within each age group. However, in my opinion, a two-way ANOVA with post-tests would be more informative, as this would allow for testing the effects of the two age categories on any quantitative variable and for addressing whether there is an interaction between the categories.

      Following the reviewer’s recommendation, we applied two-way ANOVA with a Tukey correction for multiple comparisons for categorical comparisons with different age groups and conditions. P values from all significant multiple comparison tests are now included within the methods section.
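      As a rough illustration of what a two-way ANOVA with an interaction term computes, the following minimal sketch partitions the variance of a balanced two-factor design (for example, age by temperature) into main effects, interaction, and residual error, and reports F statistics. It is not the authors' analysis: the function name, the factor labels, and the numbers in the usage example are hypothetical placeholders, p-values are not computed, and the Tukey correction for multiple comparisons mentioned above is not implemented here. In practice a statistics package would normally be used.

```python
from itertools import product
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction.

    cells maps (level_a, level_b) -> list of replicate values; every cell
    must hold the same number of replicates (at least two).
    Returns {effect: (sum_of_squares, degrees_of_freedom, F)}.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if n < 2 or any(len(v) != n for v in cells.values()):
        raise ValueError("balanced design with >= 2 replicates per cell required")

    grand = mean(x for v in cells.values() for x in v)
    mean_a = {a: mean(x for b in b_levels for x in cells[(a, b)]) for a in a_levels}
    mean_b = {b: mean(x for a in a_levels for x in cells[(a, b)]) for b in b_levels}
    mean_cell = {key: mean(v) for key, v in cells.items()}

    # Sums of squares for each source of variation.
    ss_a = n * len(b_levels) * sum((mean_a[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((mean_b[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (mean_cell[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
        for a, b in product(a_levels, b_levels)
    )
    ss_err = sum((x - mean_cell[key]) ** 2 for key, v in cells.items() for x in v)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab = df_a * df_b
    df_err = len(a_levels) * len(b_levels) * (n - 1)
    ms_err = ss_err / df_err

    return {
        "factor_a": (ss_a, df_a, (ss_a / df_a) / ms_err),
        "factor_b": (ss_b, df_b, (ss_b / df_b) / ms_err),
        "interaction": (ss_ab, df_ab, (ss_ab / df_ab) / ms_err),
    }


if __name__ == "__main__":
    # Placeholder numbers for illustration only; not data from the study.
    example = {
        ("young", "TN"): [1.0, 2.0, 3.0],
        ("young", "cold"): [5.0, 6.0, 7.0],
        ("aged", "TN"): [1.0, 2.0, 3.0],
        ("aged", "cold"): [2.0, 3.0, 4.0],
    }
    for effect, (ss, df, f_stat) in two_way_anova(example).items():
        print(effect, ss, df, f_stat)
```

      A large interaction F statistic relative to the error term would indicate that the effect of one factor differs between levels of the other, which is the question a set of separate t-tests within each age group cannot address.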

      (7) In Figure 5F, please include Adipoq expression between clusters and please add a reference to why Nnat is considered a canonical white adipocyte marker.

      We added Adipoq to the violin plot in Figure 5F, showing differential expression across adipocyte clusters. We included a line in the results section to highlight this observation:

      “Interestingly, Adiponectin (Adipoq) was differentially expressed across adipocyte clusters, with higher levels in Npr3-high and DNL-high cells.”

      We removed “canonical” and added references for Nnat and Lep as white marker genes.

      (8) After 14 days of cold exposure, it looks like the DNL high population divides into two populations, did the authors explore if there was any differences between these clusters?

      We also noticed this apparent division and explored this question. However, upon increasing the resolution for clustering and splitting the DNL high population, there were no obvious differentially expressed genes that defined the two subclusters. Thus, we opted to keep them together.

      (9) As cold treatment transforms a subset of cells, can the authors perform a data-driven analysis to visualize the directions in their single nuclei data sets by using monocle pseudotime and/or velocity analyses?

      This is a good question. We spent a long time trying to address this question using several trajectory and pseudotime analysis methods, including Velocity (scVelo), Slingshot and Dynoverse. Unfortunately, we were unable to obtain concordant results using at least two different methods and felt that the analyses were unreliable.

      Reviewer #2:

      This manuscript focused on why aging leads to decreased beiging of white adipose tissue. The authors used an inducible lineage tracing system and provided in vivo evidence that de novo beige adipogenesis from Pdgfra+ adipocyte progenitor cells is blocked during early aging in subcutaneous fat. Single-cell RNA sequencing of adipocyte progenitor cells and in vitro assays showed that these cells have similar beige adipogenic capacities in vitro. Single-nucleus RNA sequencing of mature adipocytes indicated that aged mice have more Npr3 high-expressing adipocytes in the subcutaneous fat.

      Meanwhile, adipocytes from aged mice have significantly lower expression of genes involved in de novo lipogenesis, which may contribute to the declined beige adipogenesis.

      The mechanism that leads to age-related impairment of white adipose tissue beiging is not very clear. The finding that Pdgfra+ adipocyte progenitor cells contribute to beige adipogenesis is novel and interesting. It is more intriguing that the aging process represses Pdgfra+ adipocyte progenitor cells from differentiating into beige adipocytes during cold stimulation. The finding that mature adipocytes with high de novo lipogenesis activity may support beige adipogenesis is also novel and worth pursuing further. The study was carried out with a nice experimental design, and the authors provided sufficient data to support the major conclusions. I only have a few comments that could potentially improve the manuscript.

      (1) It is interesting that after three days of cold exposure, aged mice also have far fewer beige adipocytes. Is de novo adipogenesis involved at this early stage? Or do former beige adipocytes that acquired white morphology undergo better "reactivation" in young mice? It would be nice if the authors could discuss the possibilities.

      This is a good question. We did not evaluate beige adipogenesis at the 3d timepoint. However, a previous study demonstrates that 3d of cold exposure is sufficient to promote de novo beige adipogenesis (Wang et al., Nat Med. 2013, PMID: 23995282). We observed that beige adipogenesis from Pdgfra+ cells is a relatively minor contributor to beige adipocyte development, even after long-term cold exposure in young mice. Based on these data, we presume that beige adipocyte activation (or re-activation) is the dominant mechanism for beige adipocyte development.

      To clarify this point, we have included the following lines in the manuscript:

      “Previous studies in mice using an adipocyte fate tracking system show that a high proportion of beige adipocytes arise via the de novo differentiation of ASPCs as early as 3 days of cold (Wang et al., 2013).”

      “Based on these findings, we presume that mature (dormant beige) adipocytes serve as the major source of beige adipocytes in our cold-exposure paradigm. However, long-term cold exposure also recruits smooth muscle cells to differentiate into beige adipocytes; a process that we did not investigate here (Berry et al., 2016; Long et al., 2014; McDonald et al., 2015; Shamsi et al., 2021).”

      (2) Is the absolute number of Pdgfra+ cells decreased in aged mice? It would be nice to include quantifications of the percentage of tomato+ beige adipocytes in total tomato+ cells to reflect the adipogenic rate.

      We presented FACS quantification of tdTomato+/Pdgfra+ cells in Fig. 2B. We added a graph showing the percentage of Pdgfra+ cells of total live, lin- cells in adipose tissue; this showed no difference between young and aged mice. We did not perform FACS quantification of tdTomato+ beige adipocytes due to the technical challenges with sorting adipocytes. Quantification of total tdTomato+ cells was also unreliable and inconsistent due to the widespread labeling of fibroblasts, blood vessels, along with traced adipocytes. Thus, we did not include this analysis.

      (3) Line 112, the sentence seems to be not finished.

      This has been corrected.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to Reviewers’ Public Comments

      We are grateful for the reviewers’ comments. We have modified the manuscript accordingly and detail our responses to their major comments below.

      (1) Reviewer 2 was concerned that transformation of continuous functional data into categorical form could reduce precision in estimating the genetic architecture.

      We agree that transforming continuous data into categories may reduce resolution, but it also improves accuracy when the continuous data are affected by measurement noise. In our dataset, many genotypes are at the lower bound of measurement, and the variation in measured fluorescence among these genotypes is largely or entirely caused by measurement noise. By transforming to categorical data, we dramatically reduced the effect of this noise on the estimation of genetic effects. We modified the results and discussion sections to address this point.
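
      The noise-suppression argument can be illustrated with a minimal sketch. The fluorescence values and the class thresholds below are invented for illustration and are not taken from the dataset; the function name `activation_class` is likewise hypothetical.

```python
# Hypothetical thresholds separating activation classes (assumed values,
# not those used in the study).
WEAK_THRESHOLD = 2.0
STRONG_THRESHOLD = 5.0


def activation_class(fluorescence):
    """Map a continuous fluorescence value to an ordered category."""
    if fluorescence >= STRONG_THRESHOLD:
        return "strong"
    if fluorescence >= WEAK_THRESHOLD:
        return "weak"
    return "null"


# Invented measurements for genotypes at the lower bound of detection:
# the spread among them is assumed to reflect noise, not genetic effects.
floor_measurements = [0.08, 0.31, 0.12, 0.27, 0.19]
floor_classes = [activation_class(x) for x in floor_measurements]
```

      In this toy case the continuous values differ from one another, but every one of them falls in the same category, so the noise among them cannot be mistaken for a genetic effect.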

      (2) Reviewer 2 asked about generalizability of our findings.

      Because our paper is the first use of reference-free analysis of a 20-state combinatorial dataset, generalizability is at this point unknown. However, a recent manuscript from our group confirms the generality of the simplicity of genetic architecture: using reference-free methods to analyze 20 published combinatorial deep mutational scans, several of which involve 20-state libraries, we found that main and pairwise effects account for virtually all of the genetic variance across a wide variety of protein families and types of biochemical functions (Park Y, Metzger BPH, Thornton JW. 2023. The simplicity of protein sequence-function relationships. BioRxiv, 2023.09.02.556057). Concerning the facilitating effect of epistasis on the evolution of new functions, we speculate that this result is likely to be general: we have no reason to think that the underlying cause of this observation – epistasis brings genotypes with different functions closer in sequence space to each other and expands the total number of functional sequences – arises from some peculiarity of the mechanisms of steroid receptor DBD folding or DNA binding. However, we acknowledge that our data involve sequence variation at those sites in the protein that directly mediate specific protein-DNA contact; it is plausible that sites far from the “active site” may have weaker epistatic interactions and therefore have weaker effects on navigability of the landscape. We have addressed these issues in the discussion.

      (3) Reviewer 3 asked “in which situation would the authors expect that pairwise epistasis does not play a crucial role for mutational steps, trajectories, or space connectedness, if it is dominant in the genotype-phenotype landscape?”

      The question addressed in our paper is not whether epistasis shapes steps, trajectories or connectedness in sequence space but how it does so and what its particular effects are on the evolution of new functions. The dominant view in the field has been that the primary role of epistasis is to block evolutionary paths. We show, however, that in multi-state sequence space, epistasis facilitates rather than impedes the evolution of new functions. It does this by increasing the number of functional genotypes and bringing genotypes with different functions closer together in sequence space. This finding was possible because of the difference in approach between our paper and prior work: most prior work considered only direct paths in a binary sequence space between two particular starting points – and typically only considering optimization of a single function – whereas we studied the evolution of new functions in a multi-state amino acid space, under empirically relevant epistasis informed by complete combinatorial experiments. The result is a clear demonstration that the net effect of real-world levels of epistasis on navigability of the multidimensional sequence landscape is to make the evolution of new functions easier, not harder.

      (4) Reviewer 3 asked for “an explanation of how much new biological results this paper delivers as compared with the paper in which the data were originally published.”

      Starr 2017 did not use their data to characterize the underlying genetic architecture of function by estimating main and epistatic effects of amino acid states and combinations; it also did not evaluate the importance of epistasis in generating functional variants, determining the transcription factor’s specificity, or shaping evolutionary navigability on the landscape.

      (5) Reviewer 3 requested an explanation of how the results would have been (potentially) different if a reference-based approach were used, and how reference-based analysis compares with other reference-free approaches to estimating epistasis.

      This topic has been covered in detail in a recent manuscript from our group (Park et al. Biorxiv 2023.09.02.556057). Briefly, reference-free approaches provide the most efficient explanation of an entire genotype-phenotype map, explaining the maximum amount of genetic variance and reducing sensitivity to experimental noise and missing genotypes compared to reference-based approaches. Reference-based approaches tend to infer much more epistasis, especially higher-order epistasis, because measurement error and local idiosyncrasy near the wild-type sequence propagate into spurious high-order terms. Reference-based analyses are appropriate for characterizing only the immediate sequence neighborhood of a particular “wild-type” protein of interest. Reference-free approaches are therefore best suited to understanding genotype-phenotype landscapes as a whole. We have clarified these issues in the revised discussion.

      (6) Reviewer 3 suggested that the comparison between the full and main-effects-only model should involve a re-estimation of main effects in the latter case.

      This is indeed what we did in our analysis. We have clarified the description in the results and methods sections to make this clear.

      (7) Reviewer 3 asked about the applicability of the approach to data beyond those analyzed in the present study and requirements to use it.

      Our approach could be used for any combinatorial DMS dataset in which the phenotypic data are categorical (or can be converted to categorical form). Complete sampling is not required: a virtue of reference-free analysis is that by averaging the estimated effects of states and combinations over all variants that contain them, reference-free analysis is highly robust to missing data (except at the highest possible order of epistasis, where only a single variant represents a high-order effect) as long as variant sampling is unbiased with respect to phenotype. All the required code is publicly available at the GitHub link provided in this manuscript. We have also described a general form of reference-free analysis for continuous data and applied it to 20 protein datasets in a recent publication (Park et al. Biorxiv 2023.09.02.556057).

      (8) Reviewer 3 suggested that the text could be shortened and made less dense.

      We agree and have done a careful edit to streamline the narrative.

      Response to Reviewers’ Non-Public Recommendations

      (1) Reviewer 1 noted that specific epistatic effects might in some cases produce global nonlinearities in the genotype-phenotype relationship. They then asked how our results might change if we did not impose a nonlinear transformation as part of the genotype-phenotype model. The reviewer’s underlying concern was that the non-specific transformation might capture high-order specific epistatic effects and thus reduce their apparent importance.

      Because our data are categorical, we required a model that characterizes the effect of particular amino acid states and combinations on the probability that a variant is in a null, weak, or strong activation class. A logistic model is the classic approach to this kind of analysis. The model structure assumes that amino acid states and combinations have additive effects on the log-odds of being in one functional class versus the lower functional class(es); the only nonlinear transformation is that which arises mathematically when log-odds are transformed into probability through the logistic link function. Thinking through the reviewer’s comment, we have concluded that our model does not make any explicit transformation to account for nonlinearity in the relationship between the effects of specific sequence states/combinations and the measured phenotype (activation class). If additional global nonlinearities are present in the genotype-phenotype relationship – such as could be imposed by limited dynamic range in the production of the fluorescence phenotype or the assay used to measure it – it is possible that the sigmoid shape of the logistic link function may also accommodate these nonlinearities. We have noted this point in the revised manuscript.
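
      The structure described above can be sketched as follows. This is a minimal illustration that assumes a cumulative-logit (proportional-odds) form, which is one standard way to write an ordered logistic model; the exact parameterization used in the manuscript is not given in this response, and the names `genetic_score`, `class_probabilities`, `weak_cutoff`, and `strong_cutoff` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic link: converts log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def genetic_score(genotype, intercept, main_effects, pairwise_effects):
    """Additive score on the log-odds scale.

    main_effects maps (site, state) to an effect; pairwise_effects maps
    (site_i, state_i, site_j, state_j) with site_i < site_j to an effect.
    Terms that are absent from the dictionaries count as zero.
    """
    score = intercept
    for site, state in enumerate(genotype):
        score += main_effects.get((site, state), 0.0)
    for i in range(len(genotype)):
        for j in range(i + 1, len(genotype)):
            key = (i, genotype[i], j, genotype[j])
            score += pairwise_effects.get(key, 0.0)
    return score


def class_probabilities(score, weak_cutoff, strong_cutoff):
    """Probabilities of the null, weak, and strong classes.

    Assumes weak_cutoff <= strong_cutoff (hypothetical cutoff parameters).
    The only nonlinearity is the logistic link itself.
    """
    if weak_cutoff > strong_cutoff:
        raise ValueError("weak_cutoff must not exceed strong_cutoff")
    p_at_least_weak = sigmoid(score - weak_cutoff)
    p_strong = sigmoid(score - strong_cutoff)
    return {
        "null": 1.0 - p_at_least_weak,
        "weak": p_at_least_weak - p_strong,
        "strong": p_strong,
    }
```

      In this sketch all sequence-specific terms enter linearly in `genetic_score`; no separate global transformation is fitted.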

      (2) Reviewer 1 observed that our model seems to prefer sets of several pairwise interactions among states across sites rather than fewer high-order interactions among those same states.

      This finding arises because the pattern of phenotypic variation across genotypes in our dataset is consistent with that which would be produced by pairwise interactions rather than by high-order interactions. In a reference-free framework, these patterns are distinct from each other: a group of second-order terms cannot fit the patterns produced by high-order epistasis, and high-order terms cannot fit the pattern produced by pairwise interactions. Similarly, main-effect terms cannot fit the pattern of phenotypes produced by a pairwise interaction, and a pairwise epistatic term cannot fit the pattern produced by main effects of states at two sites. For example, third-order terms are required when the genotypes possessing a particular triplet of states deviate from that expected given all the main and second-order effects of those states; this deviation cannot be explained by any combination of first- and second-order effects.

      We explain this point in detail in our recent manuscript (Park Y, Metzger BPH, Thornton JW. 2023. The simplicity of protein sequence-function relationships. BioRxiv, 2023.09.02.556057) and we summarize it here. Consider the simple example of two sites with two possible states (genotypes 00, 01, 10, and 11). If there are no main effects and no pairwise effects, this architecture will generate the same phenotype for all four variants – the global average (or zero-order effect). If there are pairwise effects but no main effects, this architecture will generate a set of phenotypes on which the average phenotype of genotypes with a 0 at the first site (00 and 01) equals the global average – as does the average of those with 0 at the second site (00 and 10). The epistatic effect causes the individual genotypes to deviate from the global average. This pattern can be fit only by a pairwise epistatic term, not by first-order terms. Conversely, if there are main effects but no pairwise effects, then the average phenotype of genotypes 00 and 01 will deviate from the global average (by an amount equal to the first-order effect), as will the average of 00 and 10: the phenotype of each genotype will be equal to the global average plus the sum of the relevant first-order effects for the states it contains. This pattern cannot be fit by second-order model terms. The same logic extends to higher orders: a cluster of second-order terms cannot explain variation generated by third-order epistasis, because third-order variation is by definition the deviation from the best second-order model.
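
      The two-site example can be worked through numerically. The sketch below is illustrative only: it handles exactly two sites on a complete set of genotypes, the phenotype values are invented, and the function name `reference_free_effects` is hypothetical rather than taken from the published code.

```python
def reference_free_effects(phenotypes):
    """Decompose a complete two-site landscape into reference-free terms.

    phenotypes maps two-character genotype strings (e.g. "01") to numbers.
    Returns (global_mean, first_order, pairwise), where first_order is keyed
    by (site, state) and pairwise is keyed by genotype.
    """
    genotypes = list(phenotypes)
    global_mean = sum(phenotypes.values()) / len(genotypes)

    # First-order effect: mean phenotype of all genotypes carrying the state,
    # expressed as a deviation from the global mean.
    first_order = {}
    for site in (0, 1):
        for state in sorted({g[site] for g in genotypes}):
            values = [phenotypes[g] for g in genotypes if g[site] == state]
            first_order[(site, state)] = sum(values) / len(values) - global_mean

    # Pairwise effect: what remains after the global mean and both
    # first-order effects have been subtracted.
    pairwise = {
        g: phenotypes[g]
        - global_mean
        - first_order[(0, g[0])]
        - first_order[(1, g[1])]
        for g in genotypes
    }
    return global_mean, first_order, pairwise


# Invented landscape with pairwise effects but no main effects.
pure_epistasis = {"00": 1.0, "01": -1.0, "10": -1.0, "11": 1.0}

# Invented landscape with main effects but no pairwise effects
# (global mean 3; site 1 effects -1/+1; site 2 effects -0.5/+0.5).
pure_additive = {"00": 1.5, "01": 2.5, "10": 3.5, "11": 4.5}

epi_mean, epi_first, epi_pair = reference_free_effects(pure_epistasis)
add_mean, add_first, add_pair = reference_free_effects(pure_additive)
```

      For `pure_epistasis` every first-order effect is zero and all of the variation falls in the pairwise terms; for `pure_additive` the reverse holds, which is the sense in which the two kinds of term cannot substitute for one another.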

      (3) Reviewer 1 suggested several places in the text where citations to prior work would be appropriate.

      We appreciate these suggestions and have modified the manuscript to refer to most of these works.

      (4) Reviewer 1 pointed to the paper of Gong et al eLife 2013 and asked whether it is known how robust the proteins in our study are to changes in conformation/stability compared to other proteins, and whether this might impact the likelihood of observing higher-order epistasis in this system.

      The DBDs that we study here are very stable, and previous work shows that mutations affect DNA specificity primarily by modifying the DBD’s affinity rather than its stability (McKeown et al., Cell 2014). Additionally, Gong et al.’s findings pertain to a globally nonlinear relationship between stability and function, which arises from the Boltzmann relationship between the energy of folding and occupancy of the folded state. Because our data are categorical – based on rank-order of measured phenotype rather than fluorescence as a continuous phenotype – the kind of global nonlinearity observed in Gong’s study is not expected to produce spurious estimates of epistasis in our work. We have modified the discussion to address this point.

      (5) Reviewer 1 asked a) why the epistatic models produce landscapes on which variants have fewer neighbors on average than main-effects only models and b) why the average distance from all ERE-specific nodes to all SRE-specific nodes is greater with epistasis (but the average distance from ERE to nearest SRE is lower with epistasis).

      In the main effects-only landscape, the functional genotypes are relatively similar to each other, because each must contain several of the states that contribute the most to a positive genetic score. Moreover, ERE-specific nodes are similar to each other, and SRE-specific nodes are similar to each other, because each must contain one or more of a relatively small number of specificity-determining states. When epistasis is added to the genetic architecture, two things happen: 1) more genotypes become functional because there are more combinations that can exceed the threshold score to produce a functional activator and 2) these additional functional variants are more different from each other – in general, and within the classes of ERE- or SRE-specific variants – because there are now more diverse combinations of states that can yield either phenotype. As a result, a broader span of sequence space is occupied, but ERE- and SRE-specific variants are more interspersed with each other. This means that the average distance between all pairs of nodes is greater, and this applies to all ERE-SRE pairs, as well. However, the interspersing means that the closest single SRE to any particular ERE is closer than it was without epistasis. We have added this explanation to the main text.
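
      The distinction between the two distance measures can be made concrete with a toy calculation. The four-letter sequences below are invented solely to show that the average distance over all ERE-SRE pairs can rise while the average distance from each ERE to its nearest SRE falls; they are not genotypes from the dataset, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def mean_all_pairs(ere_nodes, sre_nodes):
    """Average distance over every ERE-SRE pair."""
    distances = [hamming(e, s) for e in ere_nodes for s in sre_nodes]
    return sum(distances) / len(distances)


def mean_nearest(ere_nodes, sre_nodes):
    """Average distance from each ERE node to its closest SRE node."""
    nearest = [min(hamming(e, s) for s in sre_nodes) for e in ere_nodes]
    return sum(nearest) / len(nearest)


# Toy "main-effects-only" landscape: each class forms one tight cluster.
ere_main = ["AAAA", "AAAC"]
sre_main = ["AABB", "ACBB"]

# Toy "with epistasis" landscape: more functional variants, including a
# distant region where an ERE and an SRE variant sit next to each other.
ere_epi = ere_main + ["DDDD"]
sre_epi = sre_main + ["DDDE"]

all_pairs_main = mean_all_pairs(ere_main, sre_main)
all_pairs_epi = mean_all_pairs(ere_epi, sre_epi)
nearest_main = mean_nearest(ere_main, sre_main)
nearest_epi = mean_nearest(ere_epi, sre_epi)
```

      In this toy case the all-pairs average rises from 2.5 to 3.0 when the interspersed variants are added, while the nearest-neighbor average falls from 2.0 to about 1.67.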

      (6) Reviewer 2 asked us to explain why average path length increases with pairwise epistasis as the strength of selection for specificity increases.

      This behavior occurs because of the existence of a local peak in the pairwise model. Genotypes on this peak had few connections to other genotypes, all of which were less SRE-specific. Thus, with strong selection, i.e. high population size, the simulations became stuck on the local peak, cycling among the genotypes many times before leaving, resulting in a large increase in the mean step number. As shown in the rest of the figure, when the longest set of paths is removed, there are still differences in the average number of steps with and without epistasis. This issue is described in the methods section.

      (7) Reviewers made several suggestions for clarity in the text and figures.

      We have modified the paper to address all of these comments.

      (8) Reviewer 3 stated that the code should be available.

      The code is available at https://github.com/JoeThorntonLab/DBD.GeneticArchitecture.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors were trying to understand the relationship between the development of large trunks and longirostrine mandibles in bunodont proboscideans of the Miocene, and how it reflects the variation in diet patterns.

      Strengths:

      The study is very well supported, written, and illustrated, with plenty of supplementary material. The findings are highly significant for the understanding of the diversification of bunodont proboscideans in Asia during the Miocene, as well as explaining the cranial/jaw disparity of fossil lineages. This work elucidates the diversification of paleobiological aspects of fossil proboscideans and their evolutionary response to open environments in the Neogene using several methods. The authors included all Asian bunodont proboscideans with long mandibles and I suggest that they should use the expression "bunodont proboscideans" instead of gomphotheres.

      Weaknesses:

      I believe that the only weakness is the lack of discussion comparing their results with the development of gigantism and long limbs in proboscideans from the same epoch.

      Thank you for your comprehensive review and positive feedback on our study regarding the co-evolution of feeding organs in bunodont proboscideans during the Miocene. We appreciate your suggestion and have decided to use the term "bunodont elephantiforms" instead of "gomphotheres" (for clarity, we use "elephantiforms" to exclude some early proboscideans, such as Moeritherium, etc.), and we will make this change in our revised manuscript. We also acknowledge the weakness you mentioned regarding the lack of discussion comparing our results with the development of gigantism and long limbs in proboscideans from the same epoch. We agree with the reviewer’s suggestion, and we are aware that gigantism and long limbs are potential factors in trunk development. Gigantism resulted in the loss of flexibility in elephantiforms, and long limbs made it more challenging for them to reach the ground. A long trunk serves as compensation for these limitations. However, limb bones were rarely found in our material, especially those preserved in association with the skull.

      Reviewer #2 (Public Review):

      This study focuses on the eco-morphology, the feeding behaviors, and the co-evolution of feeding organs of longirostrine gomphotheres (Amebelodontidae, Choerolophodontidae, and Gomphotheriidae), which are characterised by their distinctive mandible and mandibular tusk morphologies. They also show different evolutionary stages of food acquisition organs, which may have co-evolved with the extremely elongated mandibular symphysis and tusks. Although these three longirostrine gomphothere families were widely distributed in Northern China in the Early-Middle Miocene, the relative abundances and the distribution of these groups differed through time as a result of changes in climate and ecosystems.

      These three groups have different feeding behaviors indicated by different mandibular symphysis and tusk morphologies. Additionally, they have different evolutionary stages of trunks, which are reflected by the narial region morphology. To reconstruct the feeding behavior and the relation between the mandible and the trunk of early elephantiforms, the authors examined the crania and mandibles of these three groups from the Early and Middle Miocene of northern China from three different museums and performed several analyses.

      The analyses made in the study are:

      (1) Finite Element (FE) analysis: They conducted two kinds of tests: the distal forces test and the twig-cutting test. With the distal forces test, advantageous and disadvantageous mechanical performances under distal vertical and horizontal external forces are established for each group. With the twig-cutting test, a cylindrical twig model of orthotropic elastoplasticity was placed in three orientations at the distal end of the mandibular tusk to calculate the sum of the equivalent plastic strain (SEPS). It is indicated that all three groups have different mandible specializations for cutting plants.

      (2) Phylogenetic reconstruction: These groups have different narial region morphology and, in connection with this, different stages of trunk evolution. The phylogenetic tree shows the degree of specialization of the narial morphology. The evolutionary level of the narial region is correlated with that of the character combination related to horizontal cutting. In the trilophodont longirostrine gomphotheres, co-evolution between the narial region and horizontal cutting behaviour is strongly suggested.

      (3) Enamel isotopes analysis: The results of stable isotope analysis indicate an open environment with a diverse range of habitats and that the niches of these groups overlapped without obvious differentiation.

      The analysis shows that different eco-adaptations have led to the diverse mandibular morphology and that open-land grazing has driven the development of trunk-specific functions and loss of the long mandible. This conclusion is supported by evidence from palaeoecological reconstruction, the reconstruction of feeding behaviors, and the detailed examination of mandibular and narial region morphology carried out in the study.

      All of the analyses are explained in detail in the supplementary files. The 3D models and movies in the supplementary files are detailed and understandable and explain the conclusion. The conclusions of the study are well supported by data.

      We appreciate your detailed and insightful review of our study. Your summary accurately captures the essence of our research, and we are pleased to note that multiple research methods were used to demonstrate our conclusions. Your recognition of the evidence-based conclusions from paleoecological, feeding behavior reconstruction, and morphological analyses reinforces the validity of our findings. Once again, we appreciate your time and thoughtful reviews.

      Reviewer #1 (Recommendations For The Authors):

      Thank you very much for the invitation to review this amazing manuscript. It is very well written and supported, and I have only minor suggestions to improve the text:

      (1) Some references are not in chronological sequence in the text, and this should be reviewed.

      We greatly appreciate the positive comments of the reviewer. We have revised the references in the manuscript as the reviewer suggested.

      (2) I suggest the use of the expression "bunodont proboscideans" instead of Gomphotheres because there is no agreement on whether Amebelodontidae and Choerolophodontidae are within Gomphotheriidae, as well as some brevirostrine bunodont proboscideans from South America. So I think it is ok to use "Gomphotheriidae", but not gomphotheres to refer to all bunodont proboscideans included in the study.

      The reviewer is correct. Using “gomphotheres” to refer to these three groups is inappropriate. We have replaced “gomphotheres” with "bunodont elephantiforms" throughout the entire manuscript. Here, we use “elephantiforms”, not “proboscideans”, to avoid confusion with some early proboscidean members such as Moeritherium, etc.

      (3) I was expecting some discussion on the development of large trunks related to the gigantism in these bunodont proboscideans, regarding the huge skulls and the columnar limbs.

      We appreciate this suggestion, and we are aware that gigantism is a potential factor for trunk development. It is difficult to compare the three groups (Amebelodontidae, Choerolophodontidae, and Gomphotheriidae) in terms of their weight and limb bone length, because in our material, limb bones were rarely found, especially those associated with cranial material. Nevertheless, at this stage, all elephantiforms had significantly enlarged cranial sizes and limb bone lengths compared to early members like Phiomia. Gigantism caused the loss of flexibility in elephantiforms, and even the long limbs made it more difficult for an elephantiform to reach the ground. A long trunk compensates for this evolutionary change. Exploring these aspects further is a part of our future work.

      (4) The reference to Alejandro et al should be replaced by Kramarz et al (and the correct surname of the authors). The name and surname of this reference need to be corrected. The correct names are Kramarz, A., Garrido, A., Bond, M. 2019. Please correct this in the text too.

      We thank the reviewer for catching this error. This reference has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      I believe your paper will lead to other studies on other Proboscidean groups on the evolution of the mandible and trunk. There are some corrections in the text:

      • In line 199 in the text in pdf, "Tassy, 1994" should be "Tassy, 1996".

      • In line 241, "studied" should be "studies"

      • In line 313, "," after the word "tool" should be "."

      We appreciate the reviewer for pointing these errors out and have revised these based on the suggestions.

      • In the References, you write "et al." in some references. You should write the names of all of the authors.

      • In the References: "Lister AM. 2013" and "Shoshani&Tassy" are not referenced in the text.

      • In the References: "Tassy P. Gaps, parsimony, and early Miocene elephantoids (Mammalia), with a re-evaluation of Gomphotherium annectens (Matsumoto, 1925). Zool. J. Linn." should be "Tassy P. 1994. Gaps, parsimony, and early Miocene elephantoids (Mammalia), with a re-evaluation of Gomphotherium annectens (Matsumoto, 1925). Zool. J. Linn. 112, 1-2, 101-117" and replaced before "Tassy P. 1996".

      We appreciate the reviewer’s suggestions and have revised these references.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1

      The authors provided experimental data in response to my comments/suggestions in the revision. Overall, most points were appropriate and satisfactory, but some issues remain.

      (1) It is not fully addressed how atypical survivors are generated independently of Rad52-mediated homologous recombination.

      The newly provided data indicate that the formation of atypical telomeres is independent of the Rad52 homologous recombination pathway.

      "The atypical telomeres clones exhibit non-uniform telomere pattern", but the TG-hybridized signals after XhoI digestion are clear and uniform.

      "Atypical telomere" clones may carry circular chromosomes embedded with short TG repeats, rather than linear chromosomes. In other words, atypical telomeres may differ from telomeres, the ends of chromosomes. Is atypical telomere formation dependent on NHEJ? Given that "two chromosomes underwent intra-chromosomal fusions" (Line 248), are atypical telomere clones detected frequently in SY13 cells containing two chromosomes?

      We thank the reviewer for these questions. Frankly, we have not been able to determine the chromosome structures in these so-called "atypical survivors". As we mentioned in the manuscript, there could be mixed telomere structures, e.g. TG tract amplification, intra-chromosome telomere fusion and inter-chromosome telomere fusion. Worse still, these 'atypical survivors' may not have maintained a stable genome, and their karyotype may have undergone stochastic changes during passages. To avoid misunderstanding, we have changed the term "atypical" to "uncharacterized" in the revised manuscript.

      We have previously shown that deletion of YKU70 does not affect MMEJ-mediated intra-chromosome fusion in single-chromosome SY14 cdc13Δ cells (Wu et al., 2020). In SY12 cells, double knockout of TLC1 and YKU70 resulted in synthetic lethality, and we were unable to continue our investigation. The synthetic lethality of the TLC1 and YKU70 double deletion was shown in Figure 7B of reviewed preprint version 1; this result was not included in reviewed preprint version 2, in accordance with the reviewer's instructions.

      "Atypical" survivors could be detected in SY13 cells (Figure 1D), but the frequency of their formation in the SY13 strain appeared to be lower than in SY12. As one can imagine, SY13 contains two chromosomes and its survivors should have a higher frequency of intra-chromosome fusions.

      (2) From their data, it is possible that X and Y elements influence homologous recombination, type 1 and type 2 (type X), at telomeres. In particular, the presence of X and Y elements appears to be important for promoting type 1 recombination. In other words, although not essential, subtelomeres have some function in maintaining telomeres. I suggest that the authors include author response image 4 in the text. They could revise their conclusion and the paper title accordingly.

      In accordance with this suggestion, we have included author response image 4 in the revised manuscript as Figure 2E, Figure 5D, Figure 6C and Figure 6E. Accordingly, we have changed the title to “Elimination of subtelomeric repeat sequences exerts little effect on telomere essential functions in Saccharomyces cerevisiae”.

      (3) Minor points: The newly added data indicate that X survivors are generated in a type 2-dependent manner. The authors could discuss how Y elements were eroded while retaining X elements (line 225, Figure 2A).

      We thank the reviewer for this suggestion. We have discussed this point in the revised manuscript (p. 13, lines 244-245). When telomeres are deprotected, chromosome end resection takes place. Since SY12 has only one Y’-element, it is difficult to find homologous sequences with which to repair the Y’-element in XVI-L. Once the X-element in XVI-L is exposed by further resection, homologous sequences for repair are easier to find. Thus, in Type X survivors the Y’-element is eroded while the X-element is retained.

      Reviewer #2

      I would like to congratulate the authors for their work and the efforts they put into improving the manuscript. The major criticism I had previously, i.e. testing the genetic requirements for the survivor subtypes, has been met. Below are a few minor comments that don't necessarily require a response.

      (1) I think the Author response image 6 could have been included in the manuscript. I understand that the authors don't want to overinterpret survivor subtype frequencies, but this figure would have suggested some implication of Rad51 in the emergence of survivors even in the absence of Y' elements. At this stage, however, it is up to the authors, and leaving this figure out is also fine in my opinion.

      According to the suggestion, the author response image 6 has been presented as Figure 6—figure supplement 7.

      (2) Chromosome circularization seems to rely on microhomologies. Previously, the authors proposed that SY14 circularization depended on SSA (Wu et al. 2020), but here, since circularization appears to be Rad52-independent, it is likely to be based on MMEJ rather than SSA (although there are contradictory results on Rad52's role in MMEJ in the literature).

      Yes, we mentioned it in the revised manuscript.

      (3) p. 28 lines 511-513: "The erosion sites and fusion sequences differed from those observed in SY12 tlc1Δ-C1 cells (Figure 2D), suggesting the stochastic nature of chromosomal circularization": I don't think they are necessarily stochastic, because the sequences beyond the telomeres are now modified, the available microhomologies have changed as well.

      We agree with this point. Different chromosomes tend to have hotspots for chromosome fusion. For example, in Figure 6C and 6F the resection sites in Chr1 and Chr2 were the same in SY12XYΔ+Y tlc1Δ-C1 and SY12XYΔ tlc1Δ-C1. We therefore speculate that there are hotspots for chromosome fusion, but that the site a cell uses in any single fusion event is chosen stochastically.

      (4) Typos and other errors:

      • p. 3 line 52: "subtelomerice" and "varies" are misspelled.

      • p. 5 line 78: "processes" should be "process".

      • Supp files are mislabelled (the numbers do not correspond to file name).

      • Supp file 2: how come SY12 has only one Y' element and SY13 has two?

      • p. 10 line 175: "emerging" should be "emergence".

      • p.15 line 276: "counter-selected" should be "being counter-selected" or "counterselection".

      • p. 29 line 523: "the formation of them" should be "their formation".

      • p. 37 line 653: "could have been an ideal tool": the sentence is grammatically incorrect. Writing "AND could have been an ideal tool" is enough to make it structurally correct.

      Thank you for pointing out these errors. We have corrected them in the revised manuscript. Regarding the question “how come SY12 has only one Y' element and SY13 has two?”, we are not sure at this moment. We speculate that one of the Y’ elements might have been lost during genetic engineering of the chromosomes with the CRISPR–Cas9 system.

      Reviewer #3

      The authors included statistical analyses of the qPCR data (Fig 4B) as requested, but did not comment on the striking difference in expression of MPH3 and HSP32 in the SY12 strain compared to BY4742. An improvement of the manuscript is the inclusion of rad52 tlc1 strains in their analyses, demonstrating that the "atypical and circular survivors" arose independently of homologous recombination. In addition, by analyzing rad51 and rad50 mutant strain they could demonstrate that the "type X" survivors had similar molecular requirements to type II survivors. Overall, the revised submission improves the article.

      We thank the reviewer for the comments and suggestions. The SY12 strain (with three chromosomes) exhibited lower expression levels of both MPH3 and HSP32 than the parental strain BY4742 (with 16 chromosomes). We speculate that, with the reduced chromosome number, the silencing proteins are no longer titrated by the other telomeres, which have been deleted. We have added these comments to the revised manuscript.

      Wu, Z.J., Liu, J.C., Man, X., Gu, X., Li, T.Y., Cai, C., He, M.H., Shao, Y., Lu, N., Xue, X., et al. (2020). Cdc13 is predominant over Stn1 and Ten1 in preventing chromosome end fusions. Elife 9.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This valuable study describes a new role of epithelial intercellular adhesion molecule 1 (ICAM-1) protein in controlling bile duct size. The effect is mediated via EBP-50 and subapical actomyosin to regulate size of bile canaliculi. These solid findings have theoretical and practical implications in hepatology and human disorders of bile ducts.

      Public Reviews:

      In this study, Cacho-Navas et al. describe the role of ICAM-1 expressed on the apical membrane of bile canaliculi and its function to control the bile canaliculi (BCs) homeostasis. This is a previously unrecognized function of this protein in hepatocytes. The same authors have previously shown that basolateral ICAM-1 plays a role in controlling lymphocyte adhesion to hepatocytes during inflammation and that this interaction is responsible for the loss of polarity of hepatocytes during disease states.

      This new study shows that ICAM-1 is mainly localized in the apical domain of the BC and, in association with EBP-50, communicates with the subapical acto-myosin ring to regulate the size and morphology of the BC. They used the well-known immortal cell line of liver cells (HepG2), in which they deleted the ICAM-1 gene by CRISPR-Cas9 editing, and hepatic organoids derived from WT and ICAM-1-KO mice, alternating KO as well as rescue experiments. They show that in the absence of apical ICAM-1, the BC become dilated.

      The data sufficiently support the conclusions of the study.

      Recommendations for the authors:

      We would like to thank the editor and reviewer for recognizing the manuscript's value and the solid nature of the data. We are also thankful to them for acknowledging that the data support the conclusions. Below, we address their comments and questions in a point-by-point rebuttal:

      We have a few suggestions to improve the manuscript:

      (1) HepG2 cells form canaliculi-like structures but are not the ideal system to study the apical basal polarity. On the other hand, hepatic organoids can assume a hepatocyte-like phenotype, when cultured under specific conditions but are not functionally comparable to hepatocytes organized in a 3D structure with a hollow lumen that does not recapitulate the BC physiological structure. Therefore, primary hepatocyte in collagen sandwich would be the best model to study the polarization of BCs and could be isolated from WT and ICAM-1-KO mice, that are available. Some of the major findings should be confirmed in this system.

      We adopted the culture of hepatic organoids as an experimental strategy because of the difficulties in culturing primary hepatocytes that we experienced in previous analyses (Reglero-Real, Cell Rep, 2014). The generation of organoids or mature hepatocytes from various sources of stem cells is a commonly employed strategy in hepatocyte cell biology (Meyer et al. EMBO Rep, 2023), due to the difficulties in maintaining mature hepatic epithelial cell cultures for longer than a few hours.

      The hepatic organoids we have used in the manuscript are increasingly accepted as advanced cellular models in a broad range of fields (Belenguer, Nat Commun, 2022; de Crignis, eLife, 2021; Huch, Cell, 2015). Although they show some morphological differences from real hepatocytes, we conducted a thorough characterization of their organization, identifying canalicular-like structures with functional (CFDA) and molecular (HA-4) markers, which we believe adds value to the manuscript. In addition, the organoid technology has allowed us to import the bipotent precursors and obtain a permanent source of hepatic cells without the need to import and use the ICAM-1_KO mice, in line with the current guidelines to reduce animal experimentation.

      Taking this into account, and to further validate the data obtained with our cellular systems, we quantified the canalicular diameter in livers from WT and ICAM-1_KO mice (New Figure 8B), which validates our data on human cell lines and organoids. We acknowledge that the data obtained from hepatic tissues cannot rule out a contribution of immune cell adhesion to changes in hepatocyte architecture. However, these experiments, together with the aforementioned organoids and human cell lines, strongly suggest a role for hepatic ICAM-1 in regulating canalicular size.

      (2) Overexpression of proteins was used in the study. While this approach is an easier means to visualize, without the use of specific antibodies, it is known to alter the distribution of the protein compared to the endogenous one.

      Most of our characterization has been done with antibodies or other fluorescent tools against endogenous proteins localized at BCs: CD59, F-actin, EBP50, MHC, MLC…. In addition, we have included MDR1-GFP and GFP-Rab11, the latter to analyze the subapical compartment (SAC) surrounding BCs. As requested by the reviewer, we now include in a new Supplementary Figure 1C the confocal analyses of endogenous canalicular markers, radixin and MRP2, as well as a new Supplementary Figure 1D containing the staining of an endogenous marker of the SAC, plasmolipin/PLLP (Fraticelli et al, Nat Cell Biol, 2015; Cacho-Navas, Cell Mol Life Sci, 2022), which is consistent with the previous analyses performed with GFP-Rab11.

      (3) In the absence of ICAM-1, BCs change shape and dimension but still show the presence of microvilli. What happens to the distribution of polarized transporters like Mrp2, or the transport of bile acids (CFDA clearance) in vivo in the KO animal?

      Thank you for this comment. We have analyzed this transporter in murine livers and human hepatic cells. MRP2 distribution does not change significantly and remains concentrated in BCs in ICAM-1_KO livers (New Figure 8C). Likewise, ICAM-1 gene editing does not affect MRP2 localization in the polarized human hepatic epithelial cell line in vitro (Supplementary Figure 1C). We cannot rule out changes for this transporter in other murine liver cell types in vivo, such as sinusoidal endothelial cells, which we believe should be addressed in a separate piece of work.

      (4) Does the lack of ICAM-1 affect the cell viability, proliferation or cell size?

      ICAM-1_KO cells proliferate slightly more slowly than their WT counterparts, with no detected changes in cell size and death. We present these data in Supplementary Figure 1, A and B.

      (5) Are the findings recapitulated in the livers of ICAM-1 KO animals?

      ICAM-1 KO animals present enlarged BCs, which is consistent with the main findings of the manuscript (Figure 8B).

      The text needs to be more concise. Some of the concepts, in particular those already published, should be condensed. There is a large amount of experiments that are difficult to connect logically. Possibly, cartoons summarizing the approach of the figure could help the reader.

      The text of the Results and Discussion sections has been shortened by almost 100 words, even though additional panels and experiments are now described and discussed. New cartoons have been added in Figure 5G and Figure 8F, in addition to those previously included in Figure 1 and Supplementary Figure 6, the latter containing a graphical description of the main conclusions.

      Also, more detailed information about statistical analysis (what post-test was used?), concentration of cytokines, and description of the mouse model should be included in the methods.

      Cytokine concentrations have been included in the legend of Figure 3 and in the Cell and Culture section of Methods. A brief description of the ICAM-1_KO mouse and the corresponding reference for further information is also provided in the Organoid Culture section of Methods. A statistical analysis section describing the post-test used is also included at the end of Methods. The references of the anti-plasmolipin, anti-radixin and anti-MRP2 antibodies, as well as the new fixation methods used for immunofluorescence, are also included in the corresponding Antibody List and in the Confocal Microscopy section of Methods, respectively.

      Figure 3D. Sample names should be added as in the rest of the figures.

      The arrangement of sample names in Figure 3D has been revised and is now similar to that of Figure 3A.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Song, Shi, and Lin use an existing deep learning-based sequence model to derive a score for each haplotype within a genomic region, and then perform association tests between these scores and phenotypes of interest. The authors then perform some downstream analyses (fine-mapping, various enrichment analyses, and building polygenic scores) to ensure that these associations are meaningful. The authors find that their approach allows them to find additional associations, the associations have biologically interpretable enrichments in terms of tissues and pathways, and can slightly improve polygenic scores when combined with standard SNP-based PRS.

      Strengths:

      • I found the central idea of the paper to be conceptually straightforward and an appealing way to use the power of sequence models in an association testing framework.

      • The findings are largely biologically interpretable, and it seems like this could be a promising approach to boost power for some downstream applications.

      Weaknesses:

      • The methods used to generate polygenic scores were difficult to follow. In particular, a fully connected neural network with linear activations predicting a single output should be equivalent to linear regression (all intermediate layers of the network can be collapsed using matrix-multiplication, so the output is just the inner product of the input with some vector). Using the last hidden layer of such a network for downstream tasks should also be equivalent to projecting the input down to a lower dimensional space with some essentially randomly chosen projection. As such, I am surprised that the neural network approach performs so well, and it would be nice if the authors could compare it to other linear approaches (e.g., LASSO or ridge regression for prediction; PCA or an auto-encoder for converting the input to a lower dimensional representation).

      Response: We thank the reviewer for the recognition of our work and the valuable suggestion. As the reviewer points out, our polygenic prediction procedure is equivalent to a linear transformation, and in this revision we found that it was indeed unnecessary to use a neural network framework in place of a linear model. Both our results and previous work indicate that linear models fit polygenic traits better than non-linear ones, which was also the reason we chose linear activations for the neural network in the original manuscript.
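
      The equivalence the reviewer describes can be illustrated with a minimal sketch. The layer sizes and random weights below are hypothetical and are not taken from the manuscript; the point is only that a stack of bias-free layers with linear activation produces the same output as a single coefficient vector obtained by multiplying the weight matrices together.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def collapse_linear_layers(weights):
    """Collapse a chain of bias-free linear layers into one matrix.

    weights[k] has shape (inputs of layer k) x (outputs of layer k).
    """
    combined = weights[0]
    for w in weights[1:]:
        combined = matmul(combined, w)
    return combined

def forward(x, weights):
    """Pass one input vector through every layer with linear (identity) activation."""
    out = [x]
    for w in weights:
        out = matmul(out, w)
    return out[0]

rng = random.Random(0)
# Hypothetical architecture: 6 input features -> 4 -> 3 -> 1 output score.
sizes = [6, 4, 3, 1]
weights = [[[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]
           for n_in, n_out in zip(sizes, sizes[1:])]
x = [rng.uniform(-1, 1) for _ in range(sizes[0])]

deep_output = forward(x, weights)[0]
# The collapsed 6 x 1 matrix is an ordinary linear-regression coefficient vector.
beta = [row[0] for row in collapse_linear_layers(weights)]
linear_output = sum(xi * bi for xi, bi in zip(x, beta))
```

      The two outputs agree up to floating-point rounding, which is why a sparse linear model such as LASSO is a natural replacement for the network.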

      In this revision, we followed the reviewer’s suggestion and applied a more straightforward linear framework for polygenic prediction. We first calculated a weighted sum of HFS for each block (1,361 independent blocks in total); then, in each target ancestry, we used LASSO regression to integrate these block scores with the SNP PRS into one final score. We also conducted a comparative analysis in the British European test set and found that LASSO, ridge and elastic net gave similar results, with LASSO performing slightly better. By applying this straightforward framework together with the sliding window strategy, we moderately improved the prediction performance.

      Line 349: “Using height as a representative trait, we first estimated the proportion of variance captured by top loci, and found that HFS of loci with PIP>0.4 (n=5,101) captured roughly 80% of variance explained by all genome-wide loci (n=1,200,024, corresponding to the sliding-window strategy; Figure 5A). We then calculated HFS+LDAK in non-British European (NBE), South Asian (SAS), East Asian (EAS) and African (AFR) populations in UK Biobank, and observed 17.5%, 16.1%, 17.2% and 39.8% improvement over LDAK alone (p=3.21×10^-16, 0.0001, 0.002 and 0.001, respectively; Figure 5C).”

      Author response image 1.

      • A very interesting point of the paper was the low R^2 between the HFS scores in adjacent windows, but the explanation of this was unclear to me. Since the HFS scores are just deterministic functions of the SNPs, it feels like if the SNPs are in LD then the HFS scores should be and vice versa. It would be nice to compare the LD between adjacent windows to the average LD of pairs of SNPs from the two windows to see if this is driven by the fact that SNPs are being separated into windows, or if sei is somehow upweighting the importance of SNPs that are less linked to other SNPs (e.g., rare variants).

      Response: We thank the reviewer for the suggestion on understanding the LD mechanism. In this revision, we used chromosome 1 as an example and calculated the pairwise LD among all SNPs within two adjacent loci. As shown in Figure S1 (below), although HFS-based LD is still significantly lower than median SNP-based LD (paired Wilcoxon test p=1.76e-5), we found that median SNP LD between loci was still lower than what is typically observed between adjacent SNPs in GWAS (histogram of x axis; median =0.06). We reasoned that dividing SNPs into blocks is one of the reasons that HFS shows less LD than standard GWAS, but not the whole story.

      Author response image 2.

      We agree with the reviewer that rare variants could also play an important role. In fact, the sei authors also found that rare variants tended to have larger sei-predicted effects. We conducted an approximate analysis in which we removed all rare variants and repeated the HFS calculation. Here, HFS LD rose substantially, to a median of 0.14, indicating that including rare variants is vital for the low LD.

      Author response image 3.

      Line 123: “Further evaluation indicated that this low LD was driven by two factors: integration of rare variant impacts and segmentation. Firstly, excluding rare variants from HFS raised the LD to median=0.14 (Method; Figure S2C). Secondly, median LD of SNPs from adjacent loci was 0.06, which was significantly higher than HFS LD (paired Wilcoxon p=1.76×10^-5) but significantly lower than HFS LD without rare variants (paired Wilcoxon p<2.2×10^-16).”
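
      As an illustration of the quantity discussed here, the following sketch computes the squared Pearson correlation (r^2) between the scores of consecutive loci and takes its median. The function names and the toy score vectors are hypothetical, and this is not the authors' pipeline; each locus is simply represented as a list of per-haplotype scores.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length score vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation information
    return cov * cov / (var_x * var_y)

def median_adjacent_r2(scores_by_locus):
    """Median r^2 over consecutive pairs of loci."""
    values = sorted(r_squared(a, b)
                    for a, b in zip(scores_by_locus, scores_by_locus[1:]))
    mid = len(values) // 2
    return values[mid] if len(values) % 2 else (values[mid - 1] + values[mid]) / 2

# Toy data: three adjacent loci scored on four haplotypes.
locus_a = [1.0, 2.0, 3.0, 4.0]
locus_b = [2.0, 4.0, 6.0, 8.0]    # perfectly correlated with locus_a
locus_c = [1.0, -1.0, -1.0, 1.0]  # uncorrelated with locus_b
toy_median = median_adjacent_r2([locus_a, locus_b, locus_c])
```

      The same function applied to genotype dosages of SNP pairs drawn from the two windows would give the SNP-level comparison the reviewer asked for.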

      • There were also a number of robustness checks that would have been good to include in the paper. For instance, do the findings change if the windows are shifted? Do the findings change if the sequence is reverse-complemented?

      Response: Following the reviewer’s suggestion, we conducted a sliding window analysis in which all loci were shifted by 2,048 bp, thereby doubling the total number of loci. In the fine-mapping analysis, more than 90% of the causal loci were reproduced in the sliding window analysis, either by themselves or by an overlapping locus:

      Line 207: “29.4% of causal loci (PIP>0.95) in the original analysis were still causal in sliding window analysis. 31.1% and 29.3% of causal loci whose 5’ and 3’ overlapping locus had PIP>0.95 in sliding window analysis, respectively, while themselves were no longer causal.”

      In polygenic prediction analysis, sliding window strategy significantly improved prediction accuracy, as we discussed in question 1.

      As for the reverse complement, the sei input layer is designed to encode both strands symmetrically, so that the output is the same for either strand. We also ran sei on the reverse complement (generated by seqkit seq -r -p) to verify that the original sequence and its reverse complement give the same output.
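
      A check of this kind can be sketched in a few lines. The helper names below are hypothetical, and the GC-fraction scorer is only a toy stand-in for a strand-symmetric model, not sei itself; reverse_complement performs the same transformation as the seqkit command above.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A<->T, C<->G, N kept)."""
    return seq.translate(_COMPLEMENT)[::-1]

def gc_fraction(seq):
    """Toy strand-symmetric score: fraction of G or C bases."""
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)

def is_strand_invariant(score_fn, seq, tol=1e-9):
    """True if score_fn gives the same value for seq and its reverse complement."""
    return abs(score_fn(seq) - score_fn(reverse_complement(seq))) <= tol

example = "ACGTTG"
example_rc = reverse_complement(example)
```

      Passing a real model's scoring function as score_fn would reproduce the verification described above for any haplotype sequence.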

      Response: Following the reviewer’s suggestion, we added a new discussion paragraph on the performance of sequence models on interindividual variation. In brief, we suggest that, although the lack of cross-individual training sets is a drawback and future improvement is necessary, chromatin changes can be predicted better than gene expression. This is because the latter task requires information on long-range interactions, which vary among genes and are difficult to capture when the reference genome is used as the training set. We made a schematic to clarify this:

      Author response image 4.

      We also noticed a few recent studies that directly validated sei predictions by experiment and showed significant accuracy, such as https://doi.org/10.1016/j.neuron.2022.12.026. Taken together, while we agree that sequence models need to be improved by adding more cross-individual training samples, the current state-of-the-art model sei can still provide unique value to our study.

      Line 423: “The challenge of using sequence-based deep learning (DL) models in HFS applications is further compounded by their difficulty in predicting variations between individuals. Recent studies (Huang et al., 2023; Sasse et al., 2023) indicate that DL models, trained on the reference human genome, demonstrate limited accuracy in predicting gene expression levels across different individuals. This limitation is likely due to the models' inability to account for long-range regulatory patterns, which are crucial for understanding the impact of variants on gene expression and vary across genes. In contrast, our study leveraged sequence-determined functional genomic profiles in association studies, which mitigates this issue to an extent. For instance, although sei cannot identify the specific gene regulated by a given input sequence, it can predict changes in the sequence's functional activity. Future improvements in DL models' ability to predict interindividual differences could be achieved by incorporating cross-individual data in the training process. An example of such data is the EN-TEx (Rozowsky et al., 2023) dataset, which aligns functional genomic peaks with the specific individuals and haplotypes they correspond to.”

      Reviewer #2 (Public Review):

      Summary:

      In this work, Song et al. propose a locus-based framework for performing GWAS and related downstream analyses including finemapping and polygenic risk score (PRS) estimation. GWAS are not sufficiently powered to detect phenotype associations with low-frequency variants. To overcome this limitation, the manuscript proposes a method to aggregate variant impacts on chromatin and transcription across a 4096 base pair (bp) locus in the form of a haplotype function score (HFS). At each locus, an association is computed between the HFS and trait. Computing associations at the level of imputed functional genomic scores should enable the integration of information across variants spanning the allele frequency spectrum and bolster the power of GWAS.

      The HFS for each locus is derived from a sequence-based predictive model, Sei. Sei predicts 21,907 chromatin and TF binding tracks, which can be projected onto 40 pre-defined sequence classes (representing promoters, enhancers, etc.). For each 4096 bp haplotype in their UKB cohort, the proposed method uses the Sei sequence class scores to derive the haplotype function score (HFS). The authors apply their method to 14 polygenic traits, identifying ~16,500 HFS-trait associations. They finemap these trait-associated loci with SuSie, as well as perform target gene/pathway discovery and PRS estimation.

      Strengths:

      Sequence-based deep learning predictors of chromatin status and TF binding have become increasingly accurate over the past few years. Imputing aggregated variant impact using Sei, and then performing an HFS-trait association is, therefore, an interesting approach to bolster power in GWAS discovery. The manuscript demonstrates that associations can be identified at the level of an aggregated functional score. The finemapping and pathway identification analyses suggest that HFS-based associations identify relevant causal pathways and genes from an association study. Identifying associations at the level of functional genomics increases the portability of PRSs across populations. Imputing functional genomic predictions using a sequence-based deep learning model does not suffer from the limitation of TWAS where gene expression is imputed from a limited-size reference panel such as GTEx.

      However, there are several major limitations that need to be addressed.

      Major concerns/weaknesses:

      (1) There is limited characterization of the locus-level associations to SNP-level associations. How does the set of HFS-based associations differ from SNP-level associations?

      Response: We thank the reviewer for the recognition of our manuscript and the valuable suggestion. Following the reviewer’s suggestion, in this revision we added a paragraph comparing the basic characteristics of the HFS-based and SNP-based association studies. These comparisons suggest that HFS has no advantage in testing marginal associations, but performs better in detecting causal associations.

      Line 144: “When comparing HFS association with the standard SNP-based GWAS on the same data, we found that 98% of significant HFS loci also harbored a significant SNP. There were a few cases (n=0~5) where significant HFS loci did not harbor even a marginal SNP association (GWAS p>0.01), which was due to the lack of common SNPs in these loci. The HFS association p value was higher than the GWAS p value in 95% of significant loci, suggesting that HFS did not improve power to detect marginal effects. The genomic control inflation factor (λGC) for the HFS association test varied between 0.99 for asthma and 1.50 for height, closely resembling the SNP GWAS (Pearson Correlation Coefficient [PCC]=0.91, paired t-test p=0.16; Method and Figure S3). We concluded that HFS-based association tests had adequate power and did not introduce additional p-value inflation.”

      (2) A clear advantage of performing HFS-trait associations is that the HFS score is imputed by considering variants across the allele frequency spectrum. However, no evidence is provided demonstrating that rare variants contribute to associations derived by the model. Similarly, do the authors find evidence that allelic heterogeneity is leveraged by the HFS-based association model? It would be useful to do simulations here to characterize the model behavior in the presence of trait-associated rare variants.

      Response: Following the reviewer’s suggestion, we conducted a sensitivity analysis in which we removed all rare (MAF<0.01) variants and repeated the HFS analysis (HFScommon) on chromosome 1. In the linear association analysis, we found that 10.6% of HFS signals (p<5×10^-8) were missed by HFScommon. In fine-mapping, 55.3% of HFS causal signals (PIP>0.95) were missed by HFScommon. We concluded that rare variants play an important role in the performance of HFS, especially in its fine-mapping advantage.

      Line 175: “We also found that rare variants played an important role in the good fine-mapping performance of HFS: when variants with MAF<0.01 were removed, 55.3% of the causal signals would be missed in HFS+SUSIE analysis.”

      We then attempted a simulation analysis in which rare variants were causal for the phenotype and the association statistics matched those of the real GWAS of height. However, this simulation did not seem to reflect the real scenario: no matter how we changed the association between rare variants and the phenotype, the HFS association p-value could hardly reach the significance level of the SNP association. We propose that this is because the simulation cannot properly reflect how variants affect functional genomics: when a rare variant is randomly selected as the causal variant, there is a high probability that it has no impact on functional genomics, so its HFS would be close to zero. When such a variant is set as causal (which is unlikely in a real scenario), HFS does not capture the association. We reasoned that it might be difficult to evaluate HFS by simulation, since the nonlinear relationships between SNPs and HFS, as well as among SNPs, are difficult to simulate properly.

      Author response image 5.

      (3) Sei predicts chromatin status / ChIP-seq peaks in the center of a 4kb region. It would therefore be more relevant to predict HFS using overlapping sequence windows that tile the genome as opposed to using non-overlapping windows for computing HFS scores. Specifically, in line 482, the authors state that "the HFS score represents overall activity of the entire sequence, not only the few bp at the center", but this would not hold given that Sei is predicting activity at the center for any sequence.

      Response: We thank the reviewer for the suggestion on the sliding window design. In this revision, we shifted all loci by 2,048 bp to double the number of loci and repeated the fine-mapping and polygenic prediction analyses. For fine-mapping, we found that the results were generally robust to the sliding window procedure, and the majority of the causal associations were retained:

      Line 207: “29.4% of causal loci (PIP>0.95) in the original analysis were still causal in sliding window analysis. 31.1% and 29.3% of causal loci whose 5’ and 3’ overlapping locus had PIP>0.95 in sliding window analysis, respectively, while themselves were no longer causal.”

      In polygenic prediction, the sliding window analysis provided significantly better performance than the previous analysis on non-overlapping loci.

      However, since this revision includes several updates to the polygenic prediction procedure, it is difficult to quantify how much of the improvement is due to the sliding window design. Thus, we directly show the new results in Figure 5 and do not compare them with the original results.

      We also modified the previously imprecise statement to:

      Line 490: “…it integrated information of the entire sequence, not only the few bp at the center.”
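
      The tiling scheme described in this response can be sketched as follows, assuming 4,096 bp windows and a 2,048 bp shift as stated above. The function names and the toy region are hypothetical; coordinates are half-open intervals.

```python
def tile_windows(region_start, region_end, width=4096, offset=0):
    """Return half-open (start, end) windows of fixed width inside [region_start, region_end)."""
    windows = []
    start = region_start + offset
    while start + width <= region_end:
        windows.append((start, start + width))
        start += width
    return windows

def overlapping_windows(window, candidates):
    """Return the candidate windows that share at least one base with `window`."""
    return [c for c in candidates if c[0] < window[1] and window[0] < c[1]]

# Toy region of ten 4,096 bp loci.
original = tile_windows(0, 40960)
# Shifting by half a window yields a second, interleaved set of loci.
shifted = tile_windows(0, 40960, offset=2048)
all_loci = sorted(original + shifted)
# Each interior locus overlaps one shifted locus on its 5' side and one on its 3' side.
neighbours = overlapping_windows(original[1], shifted)
```

      In this layout every interior base is covered by two loci, which is why a causal signal can move from an original locus to its 5’ or 3’ overlapping neighbour in the sliding window analysis.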

      (4) Is the HFS-based association going to miss coding variation and several regulatory variants such as splicing variants? There are also going to be cases where there's an association driven by a variant that is correlated with a Sei prediction in a neighboring window. These would represent false positives for the method, it would be useful to identify or characterize these cases.

      Response: As the reviewer notes, sei captures only functional genomic features and is therefore expected to perform poorly when the causal variants affect protein sequences. In this revision, we characterized this by focusing on causal exonic variants (SNP PIP>0.95):

      Line 322: “On the other hand, HFS performs worse than SNP-based fine-mapping in exonic regions. Taking height as an example, PolyFun detected 125 causal SNPs (PIP>0.95) in the exonic regions, but only 16% (20) of loci that harbored them also reached PIP>0.5 (11 reached PIP>0.95) in HFS+SUSIE analysis. Among the 105 loci that missed such signals (HFS PIP<0.5), 12 had a nearby locus (within 10kb) showing HFS PIP>0.95, which likely reflected false positives caused by LD. Thus, SNP-based analysis should be prioritized over HFS in coding regions.”

      Additional minor concerns:

      (1) It's not clear whether SuSie-based finemapping is appropriate at the locus level, when there is limited LD between neighboring HFS bins. How does the choice of the number of causal loci and the size of the segment being finemapped affect the results and is SuSie a good fit in this scenario?

      Response: Following the reviewer’s suggestion, we reran SUSIE with different predefined numbers of causal loci (from 2 to 10) and found that the identified causal loci were consistent.

      Author response image 6.

      Line 211: “Besides, HFS+SUSIE was also robust when the predefined number of causal loci (L=2 to 10) was changed, and the number of detected loci was unchanged.”

      As for the size of the segments, we divided the predefined segments (independent blocks detected by LDetect) into two halves and reran SUSIE, and found that three additional causal loci emerged in one half. This suggests that using segments that are too small might increase the false positive rate. However, since there is no LD between independent blocks (which is guaranteed by LDetect), it is not necessary to use even longer blocks.

      Author response image 7.

      Line 133: “Simulation analysis revealed that when a non-reference sequence class score was associated the trait, reference class score could still capture median 70% of HFS-trait association R².”

      (2) It is not clear how a single score is chosen from the 117 values predicted by Sei for each locus. SuSie is run assuming a single causal signal per locus, an assumption which may not hold at ~4kb resolution (several classes could be associated with the trait of interest). It's not clear whether SuSie, run in this parameter setting, is a good choice for variable selection here.

      Response: As we discuss below (question 3), in this revision we no longer apply SUSIE to select one sequence class score for each locus, because of the risk of overfitting; instead, we use the reference sequence class uniformly for all loci. As the reviewer suggested, we used simulation to evaluate how this procedure influences HFS performance, especially when multiple sequence classes at the same locus are causal for the phenotype. We found that the reference sequence class score captured a median of 69.1% of the phenotypic R² when the causal sequence class was not the reference, and a median of 59.2% of R² when there were 2 to 5 non-reference causal classes. We concluded that the loss incurred by skipping sequence class selection is mild, and that skipping it is necessary given the risk of overfitting.

      Author response image 8.

      (3) A single HFS score is being chosen from amongst multiple tracks at each locus independently. Does this require additional multiple-hypothesis correction?

      Response: We agree with the reviewer that choosing the sequence class for each locus amounts to multiple testing, and in additional experiments we indeed observed some evidence that this procedure overfits. Thus, in this revision, we no longer apply the per-locus feature selection procedure, but instead use the sequence class corresponding to the reference (hg38) sequence. Consequently, no additional multiple-testing correction is needed. We acknowledge that this simplification discards some information, but as mentioned above, the loss is moderate, and the simplification is necessary to ensure statistical robustness and reduce false positives. In fact, with this simplification we better controlled the inflation factor of the HFS GWAS and obtained better portability in polygenic prediction.

      (4) The results show that a larger number of loci are identified with HFS-based finemapping & that causal loci are enriched for causal SNPs. However, it is not clear how the number of causal loci should relate to the number of SNPs. It would be really nice to see examples of cases where a previously unresolved association is resolved when using HFS-based GWAS + finemapping.

      Response: In this revision, we did not observe a clear relation between causal loci number and causal gene number. The only trend is that SNP-based fine-mapping seemed to perform better in coding regions, in accordance with the fact that HFS captures functional genomic signals. We also added new interpretations to highlight some examples where HFS resolves previously unresolved association signals. For example,

      Line 287: “Specifically, in 1q32.1 region, HFS+SUSIE identified two loci with PIP>0.9 (Figure 4B). SNP-based association also found significant association in this region, but SNP fine-mapping(Weissbrod et al., 2020) could not resolve this signal and only found seven signals between PIP=0.1 to 0.5.”

      (5) Sequence-based deep learning model predictions can be miscalibrated for insertions and deletions (INDELs) as compared to SNPs. Scaling INDEL predictions would likely improve the downstream modeling.

      Response: Following the reviewer’s suggestion, we conducted a sensitivity analysis in which we removed all indels on chromosome 1 and repeated the HFS analysis. Removing indels increased the number of significant (p<5e-8) associations by 9%, but also slightly increased the inflation factor (paired Wilcoxon test, p=0.0001). In the fine-mapping analysis, removing indels caused a 4.7% decrease in the number of detected causal associations (PIP>0.95). We reasoned that potential miscalibration on indels does affect the statistical power of HFS, but the proper way to control this effect may not be straightforward and still awaits optimization. In this revision, we kept all indels in the analysis, since we consider the power of fine-mapping more important than the power of marginal association.

      Line 213: “Lastly, removing insertion and deletion would reveal 9% more significant association (p<5×10⁻⁸) but 4.7% less causal association (PIP>0.95), and slightly increased inflation factor (Wilcoxon p=0.0001, Figure S4).”

      Author response image 9.

      Reviewer #1 (Recommendations For The Authors):

      It was unclear to me why the sei output was rounded to two decimal places to "avoid influence of sei prediction noise". Wouldn't rounding introduce additional noise?

      Response: We thank the reviewer for pointing out our inadequate description. The rounding procedure is used to mask low values that likely do not reflect any real change. The idea is that, even if a variant does not bring about any functional change, sei would still output a very low HFS value that is close to, but not exactly, zero. Rounding sets such low values to zero, which removes this noise. We have added this rationale to the Methods section:

      Line 529: “This is due to the fact that even if a variant actually makes no impact on functional genomics, sei would still output a value that are close to but not equal to reference sequence class score. Rounding procedure would set such HFS to zero and remove the random value from sei.”
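
      The rounding rationale can be illustrated with a minimal sketch. It assumes the HFS is the difference between the score of a haplotype sequence and the score of the reference sequence, and that rounding is applied to that difference; the function name and the example values are hypothetical and are not taken from the manuscript or from sei.

```python
def masked_hfs(alt_score: float, ref_score: float, ndigits: int = 2) -> float:
    """Score difference relative to the reference, rounded so that
    near-zero prediction noise collapses to exactly zero.

    Hypothetical helper: the exact quantity the authors round is an assumption.
    """
    diff = round(alt_score - ref_score, ndigits)
    return diff + 0.0  # turn a possible -0.0 into 0.0


# A variant with no real functional effect: the tiny difference is masked.
print(masked_hfs(0.503, 0.5))
# A variant with a larger effect: the difference survives rounding.
print(masked_hfs(0.56, 0.5))
```

      With two decimal places, any difference smaller than 0.005 in absolute value is set to zero, so haplotypes that differ from the reference only by prediction noise are treated as identical to it.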

      Minor comments / typos:

      • There are many typos in the abstract.

      Response: We have corrected the typos and grammatical issues in the abstract in this revision.

      • I believe "Arachnoid acid-intelligence" should be "Arachidonic acid-intelligence".

      • Consistently there is no space between text and parenthetical citations. For example, "sei(Chen et al., 2022)" should be "sei (Chen et al., 2022)".

      • Line 110: "at least one non-reference haplotypes" --> "at least one non-reference haplotype".

      • Line 155: "data-based method" --> "data-based methods".

      • Lines 165-166: "functionally importance" --> "functional importance".

      Response: We have made these revisions accordingly.

      • Line 210: the sentence containing "this annotation on conditioned of a set of baseline annotations" is unclear.

      Response: We have revised this sentence as “…regressed the PIP against this annotation, with a set of baseline annotations included as covariates, similar to the LDSC framework.”

      • Line 213: "association" --> "associations".

      • Line 219: "association" --> "associations".

      • Line 251: "result" --> "results".

      • Line 269: "result" --> "results".

      • Line 289: "known to involved" --> "known to be involved".

      • Line 356: "LDAK along" --> "LDAK alone".

      • Line 362: "BOLT-LMM along" --> "BOLT-LMM alone".

      • Supplement: "Hihglighted" --> "Highlighted".

      Response: We have made these revisions accordingly.

      • Line 444: Were "British ancestry Caucasians" defined as individuals that self-identified as "white British"? If so, then they should be described as "self-identified "white British"".

      Response: Following the reviewer’s comment, we have changed the description to “self-identified British ancestry Caucasians”.

      Reviewer #2 (Recommendations For The Authors):

      (1) A 2022 cistrome-wide association study (CWAS) computed associations between genetically-predicted chromatin activity and phenotypes. Adding a reference to this paper would be helpful. https://pubmed.ncbi.nlm.nih.gov/36071171/

      Response: Following the reviewer’s suggestion, we discussed the similarity between CWAS and our study:

      Line 89: “In line with this notion, a recent similar strategy called cistrome-wide association study (CWAS) integrated variant-chromatin activity and variant-phenotype association to boost power of genetic study of cancer. (Baca et al., 2022).”

      (2) Line 487 states: "We applied sei to predict 21,906 functional genomic tracks for each sequence, without normalizing for histone mark." It's not clear what normalization is being referred to here.

      Response: We have revised the sentence to:

      Line 495: “We applied sei to predict 21,906 functional genomic tracks for each sequence, without normalizing for histone mark (divided each track score by the sum of histone mark score) as suggested by the sei author.”
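
      For readers unfamiliar with the step mentioned above, the normalization that was not applied can be sketched as follows. This is an illustration of the operation as described in the quoted sentence (each track score divided by the sum of the histone mark scores), not the sei implementation; the function name, the flat list of track scores, and the choice of which indices count as histone mark tracks are all assumptions.

```python
def histone_normalize(track_scores, histone_idx):
    """Divide every track score by the summed score of the histone mark tracks.

    Hypothetical helper: `histone_idx` lists the positions of histone mark
    tracks within `track_scores`; sei's own bookkeeping may differ.
    """
    total = sum(track_scores[i] for i in histone_idx)
    if total == 0:
        raise ValueError("sum of histone mark scores is zero")
    return [score / total for score in track_scores]


# Three toy tracks, the last two treated as histone marks.
print(histone_normalize([1.0, 1.0, 3.0], histone_idx=[1, 2]))
```

      In the analysis described in the response, the raw track predictions were used directly, that is, without this division.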

      (3) The figures are extremely low resolution, they need to be updated.

      Response: In this revision, we uploaded a separate PDF file for each figure to provide high-resolution graphics.

      (4) The results section was difficult to follow and would benefit from being written more clearly.

      Response: In this revision, we rearranged parts of the Results section to better clarify the main ideas. We moved the statistical results into parentheses and focused the main text on interpretation. For example,

      Line 123: “Further evaluation indicated that this low LD was led by two factors: integration of rare variant impacts and segmentation. Firstly, excluding rare variants from HFS caused the LD raised to median=0.14 (Method; Figure S2C). Secondly, median LD of SNPs from adjacent loci was 0.06, which was significantly higher than HFS LD (paired Wilcoxon p=1.76×10⁻⁵) but significantly lower than HFS LD without rare variants (paired Wilcoxon p<2.2×10⁻¹⁶).”

      (5) "Along" is used several times in the final results section (PRS estimation), this should be "alone".

      Response: We have replaced every misused “along” with “alone” in this revision.

      (6) Instead of using notation identifying genomic location, it might be clearer to provide gene names when illustrating examples of trait-associated promoters.

      Response: In this revision, we added the gene names of the corresponding promoters to the main text to better clarify the findings.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Comments

      Reviewer 1

      (1) Despite the well-established role of Netrin-1 and UNC5C axon guidance during embryonic commissural axons, it remains unclear which cell type(s) express Netrin-1 or UNC5C in the dopaminergic axons and their targets. For instance, the data in Figure 1F-G and Figure 2 are quite confusing. Does Netrin-1 or UNC5C express in all cell types or only dopamine-positive neurons in these two mouse models? It will also be important to provide quantitative assessments of UNC5C expression in dopaminergic axons at different ages.

      Netrin-1 is a secreted protein, and in this manuscript we did not examine which cell types express it. That question is not the focus of the study and we consider it irrelevant to the main issue we are addressing, namely where Netrin-1+ cells are present within the forebrain regions we examined. As per the reviewer’s request, we include below images showing Netrin-1 protein and Netrin-1 mRNA expression in the forebrain. Author response image 1 below is a high-magnification immunofluorescent image of a coronal forebrain section showing Netrin-1 protein expression.

      Author response image 1.

      This confocal microscope image shows immunofluorescent staining for Netrin-1 (green) localized around cell nuclei (stained by DAPI in blue). This image was taken from a coronal section of the lateral septum of an adult male mouse. Scale bar = 20 µm.

      In Author response images 2 and 3 below, we show low- and high-magnification images from an RNAscope experiment confirming that cells in the forebrain regions examined express Netrin-1 mRNA.

      Author response image 2.

      This confocal microscope image of a coronal brain section of the medial prefrontal cortex of an adult male mouse shows Netrin-1 mRNA expression (green) and cell nuclei (DAPI, blue). Brain regions are as follows: Cg1: Anterior cingulate cortex 1, DP: dorsopeduncular cortex, fmi: forceps minor of the corpus callosum, IL: Infralimbic Cortex, PrL: Prelimbic Cortex

      Author response image 3.

      A higher-resolution image from the same sample as in Author response image 2 shows Netrin-1 mRNA (green) and cell nuclei (DAPI; blue). DP = dorsopeduncular cortex.

      Regarding UNC5c, this receptor homologue is expressed by dopamine neurons in the rodent ventral tegmental area (Daubaras et al., 2014; Manitt et al., 2010; Phillips et al., 2022). This does not preclude UNC5c expression in other cell types. UNC5c receptors are ubiquitously expressed in the brain throughout development, performing many different developmental functions (Kim and Ackerman, 2011; Murcia-Belmonte et al., 2019; Srivatsa et al., 2014). In this study we are interested in UNC5c expression by dopamine neurons, and particularly by their axons projecting to the nucleus accumbens. We therefore used immunofluorescent staining in the nucleus accumbens, showing UNC5 expression in TH+ axons. This work adds to the study by Manitt et al., 2010, which examined UNC5 expression in the VTA. Manitt et al. used Western blotting to demonstrate that UNC5 expression in VTA dopamine neurons increases during adolescence, as can be seen in the following figure:

      References:

      Daubaras M, Bo GD, Flores C. 2014. Target-dependent expression of the netrin-1 receptor, UNC5C, in projection neurons of the ventral tegmental area. Neuroscience 260:36–46. doi:10.1016/j.neuroscience.2013.12.007

      Kim D, Ackerman SL. 2011. The UNC5C Netrin Receptor Regulates Dorsal Guidance of Mouse Hindbrain Axons. J Neurosci 31:2167–2179. doi:10.1523/jneurosci.5254-10.2011

      Manitt C, Labelle-Dumais C, Eng C, Grant A, Mimee A, Stroh T, Flores C. 2010. Peri-Pubertal Emergence of UNC-5 Homologue Expression by Dopamine Neurons in Rodents. PLoS ONE 5:e11463-14. doi:10.1371/journal.pone.0011463

      Murcia-Belmonte V, Coca Y, Vegar C, Negueruela S, Romero C de J, Valiño AJ, Sala S, DaSilva R, Kania A, Borrell V, Martinez LM, Erskine L, Herrera E. 2019. A Retino-retinal Projection Guided by Unc5c Emerged in Species with Retinal Waves. Current Biology 29:1149-1160.e4. doi:10.1016/j.cub.2019.02.052

      Phillips RA, Tuscher JJ, Black SL, Andraka E, Fitzgerald ND, Ianov L, Day JJ. 2022. An atlas of transcriptionally defined cell populations in the rat ventral tegmental area. Cell Reports 39:110616. doi:10.1016/j.celrep.2022.110616

      Srivatsa S, Parthasarathy S, Britanova O, Bormuth I, Donahoo A-L, Ackerman SL, Richards LJ, Tarabykin V. 2014. Unc5C and DCC act downstream of Ctip2 and Satb2 and contribute to corpus callosum formation. Nat Commun 5:3708. doi:10.1038/ncomms4708

      (2) Figure 1 used shRNA to knockdown Netrin-1 in the Septum and these mice were subjected to behavioral testing. These results, again, are not supported by any valid data that the knockdown approach actually worked in dopaminergic axons. It is also unclear whether knocking down Netrin-1 in the septum will re-route dopaminergic axons or lead to cell death in the dopaminergic neurons in the substantia nigra pars compacta?

      First, we want to clarify and emphasize that our knockdown approach was not designed to knock down Netrin-1 in dopamine neurons or their axons. Our goal was to knock down Netrin-1 expression in cells expressing this guidance cue gene in the dorsal peduncular cortex.

      We have previously established the efficacy of the shRNA Netrin-1 knockdown virus used in this experiment for reducing the expression of Netrin-1 (Cuesta et al., 2020). The shRNA reduces Netrin-1 levels in vitro and in vivo.

      We agree that our experiments do not address the fate of the dopamine axons that are misrouted away from the medial prefrontal cortex. This research is ongoing, and we have now added a note regarding this to our manuscript.

      Our current hypothesis, based on experiments being conducted as part of another line of research in the lab, is that these axons are rerouted to a different brain region, which they then ectopically innervate. In these experiments we are finding that male mice exposed to tetrahydrocannabinol in adolescence show reduced dopamine innervation in the medial prefrontal cortex in adulthood but increased dopamine input in the orbitofrontal cortex. In addition, these mice show increased action impulsivity in the Go/No-Go task in adulthood (Capolicchio et al., Society for Neuroscience 2023 Abstracts).

      References:

      Capolicchio T., Hernandez, G., Dube, E., Estrada, K., Giroux, M., Flores, C. (2023) Divergent outcomes of delta 9 - tetrahydrocannabinol in adolescence on dopamine and cognitive development in male and female mice. Society for Neuroscience, Washington, DC, United States [abstract].

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      (3) Another issue with Figure1J. It is unclear whether the viruses were injected into a WT mouse model or into a Cre-mouse model driven by a promoter specifically expresses in dorsal peduncular cortex? The authors should provide evidence that Netrin-1 mRNA and proteins are indeed significantly reduced. The authors should address the anatomic results of the area of virus diffusion to confirm the virus specifically infected the cells in dorsal peduncular cortex.

      All the virus knockdown experiments were conducted in wild-type mice; we have added this information to Figure 1k.

      The efficacy of the shRNA in knocking down Netrin-1 was demonstrated by Cuesta et al. (2020) both in vitro and in vivo, as we show in our response to the reviewer’s previous comment above.

      We also now provide anatomical images demonstrating the localization of the injection and area of virus diffusion in the mouse forebrain. In Author response image 4 below the area of virus diffusion is visible as green fluorescent signal.

      Author response image 4.

      Fluorescent microscopy image of a mouse forebrain demonstrating the localization of the injection of a virus to knock down Netrin-1. The location of the virus is in green, while cell nuclei are in blue (DAPI). Abbreviations: DP: dorsopeduncular cortex, IL: infralimbic cortex.

      References:

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      (4) The authors need to provide information regarding the efficiency and duration of knocking down. For instance, in Figure 1K, the mice were tested after 53 days post injection, can the virus activity in the brain last for such a long time?

      In our study we are interested in the role of Netrin-1 expression in the guidance of dopamine axons from the nucleus accumbens to the medial prefrontal cortex. The critical window for these axons leaving the nucleus accumbens and growing to the cortex is early adolescence (Reynolds et al., 2018b). This is why we injected the virus at the onset of adolescence, at postnatal day 21. As dopamine axons grow from the nucleus accumbens to the prefrontal cortex, they pass through the dorsal peduncular cortex. We disrupted Netrin-1 expression at this point along their route to determine whether it is the Netrin-1 present along their route that guides these axons to the prefrontal cortex. We hypothesized that the shRNA Netrin-1 virus would disrupt the growth of the dopamine axons, reducing the number of axons that reach the prefrontal cortex and therefore the number of axons that innervate this region in adulthood.

      We conducted our behavioural tests during adulthood, after the critical window during which dopamine axon growth occurs, so as to observe the enduring behavioural consequences of this misrouting. This experimental approach is designed for the shRNA Netrin-1 virus to be expressed in cells in the dorsopeduncular cortex when the dopamine axons are growing, during adolescence.

      References:

      Capolicchio T., Hernandez, G., Dube, E., Estrada, K., Giroux, M., Flores, C. (2023) Divergent outcomes of delta 9 - tetrahydrocannabinol in adolescence on dopamine and cognitive development in male and female mice. Society for Neuroscience, Washington, DC, United States [abstract].

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette M-P, Arvanitogiannis A, Flores C. 2018b. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      (5) In Figure 1N-Q, silencing Netrin-1 results in less DA axons targeting to infralimbic cortex, but why the Netrin-1 knocking down mice revealed the improved behavior?

      This is indeed an intriguing finding, and we have now added a mention of it to our manuscript. We have demonstrated that misrouting dopamine axons away from the medial prefrontal cortex during adolescence alters behaviour, but why this improves action impulsivity is currently unknown to us. One potential answer is that the dopamine axons are misrouted to a different brain region that is also involved in controlling impulsive behaviour, perhaps the dorsal striatum (Kim and Im, 2019) or the orbital prefrontal cortex (Jonker et al., 2015).

      We would also like to note that we are finding that other manipulations that appear to reroute dopamine axons to unintended targets can lead to reduced action impulsivity as measured using the Go/No-Go task. As we mentioned above, current experiments in the lab, which are part of a different line of research, are showing that male mice exposed to tetrahydrocannabinol in adolescence show reduced dopamine innervation in the medial prefrontal cortex in adulthood, but increased dopamine input in the orbitofrontal cortex. In addition, these mice show increased action impulsivity in the Go/No-Go task in adulthood (Capolicchio et al., Society for Neuroscience 2023 Abstracts).

      References

      Capolicchio T., Hernandez, G., Dube, E., Estrada, K., Giroux, M., Flores, C. (2023) Divergent outcomes of delta 9 - tetrahydrocannabinol in adolescence on dopamine and cognitive development in male and female mice. Society for Neuroscience, Washington, DC, United States [abstract].

      Jonker FA, Jonker C, Scheltens P, Scherder EJA. 2015. The role of the orbitofrontal cortex in cognition and behavior. Rev Neurosci 26:1–11. doi:10.1515/revneuro-2014-0043

      Kim B, Im H. 2019. The role of the dorsal striatum in choice impulsivity. Ann N York Acad Sci 1451:92–111. doi:10.1111/nyas.13961

      (6) What is the effect of knocking down UNC5C on dopamine axons guidance to the cortex?

      We have found that mice that are heterozygous for a nonsense Unc5c mutation, and as a result have reduced levels of UNC5c protein, show reduced amphetamine-induced locomotion and stereotypy (Auger et al., 2013). In the same manuscript we show that this effect only emerges during adolescence, in concert with the growth of dopamine axons to the prefrontal cortex. This is indirect but strong evidence that UNC5c receptors are necessary for correct adolescent dopamine axon development.

      References

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      (7) In Figures 2-4, the authors only showed the amount of DA axons and UNC5C in NAcc. However, it remains unclear whether these experiments also impact the projections of dopaminergic axons to other brain regions, critical for the behavioral phenotypes. What about other brain regions such as prefrontal cortex? Do the projection of DA axons and UNC5c level in cortex have similar pattern to those in NAcc?

      UNC5c receptors are expressed throughout development and are involved in many developmental processes (Kim and Ackerman, 2011; Murcia-Belmonte et al., 2019; Srivatsa et al., 2014). We cannot say whether the pattern we observe here is unique to the nucleus accumbens, but it is certainly not universal throughout the brain.

      The brain region we focus on in our manuscript, in addition to the nucleus accumbens, is the medial prefrontal cortex. Close and thorough examination of the prefrontal cortices of adult mice revealed practically no UNC5c expression by dopamine axons. However, we did observe very rare cases of dopamine axons expressing UNC5c. It is not clear whether these rare cases are present before or during adolescence.

      Below is a representative set of images of this observation, which is now also included as Supplementary Figure 4:

      Author response image 5.

      Expression of UNC5c protein in the medial prefrontal cortex of an adult male mouse. Low (A) and high (B) magnification images demonstrate that there is little UNC5c expression in dopamine axons in the medial prefrontal cortex. Here we identify dopamine axons by immunofluorescent staining for tyrosine hydroxylase (TH, see our response to comment #9 regarding the specificity of the TH antibody for dopamine axons in the prefrontal cortex). This figure is also included as Supplementary Figure 4 in the manuscript. Abbreviations: fmi: forceps minor of the corpus callosum, mPFC: medial prefrontal cortex.

      References:

      Kim D, Ackerman SL. 2011. The UNC5C Netrin Receptor Regulates Dorsal Guidance of Mouse Hindbrain Axons. J Neurosci 31:2167–2179. doi:10.1523/jneurosci.5254-10.2011

      Murcia-Belmonte V, Coca Y, Vegar C, Negueruela S, Romero C de J, Valiño AJ, Sala S, DaSilva R, Kania A, Borrell V, Martinez LM, Erskine L, Herrera E. 2019. A Retino-retinal Projection Guided by Unc5c Emerged in Species with Retinal Waves. Current Biology 29:1149-1160.e4. doi:10.1016/j.cub.2019.02.052

      Srivatsa S, Parthasarathy S, Britanova O, Bormuth I, Donahoo A-L, Ackerman SL, Richards LJ, Tarabykin V. 2014. Unc5C and DCC act downstream of Ctip2 and Satb2 and contribute to corpus callosum formation. Nat Commun 5:3708. doi:10.1038/ncomms4708

      (8) Can overexpression of UNC5c or Netrin-1 in male winter hamsters mimic the observations in summer hamsters? Or overexpression of UNC5c in female summer hamsters to mimic the winter hamster? This would be helpful to confirm the causal role of UNC5C in guiding DA axons during adolescence.

      This is an excellent question. We are very interested in both increasing and decreasing UNC5c expression in hamster dopamine axons to see if we can directly manipulate summer hamsters into winter hamsters and vice versa. We are currently exploring virus-based approaches to design these experiments and are excited for results in this area.

      (9) The entire study relied on using tyrosine hydroxylase (TH) as a marker for dopaminergic axons. However, the expression of TH (either by IHC or IF) can be influenced by other environmental factors, that could alter the expression of TH at the cellular level.

      This is an excellent point that we now carefully address in our methods by adding the following:

      In this study we pay great attention to the morphology and localization of the fibres from which we quantify varicosities to avoid counting any fibres stained with TH antibodies that are not dopamine fibres. The fibres that we examine and that are labelled by the TH antibody show features indistinguishable from the classic features of cortical dopamine axons in rodents (Berger et al., 1974; 1983; Van Eden et al., 1987; Manitt et al., 2011), namely they are thin fibres with irregularly-spaced varicosities, are densely packed in the nucleus accumbens, sparsely present only in the deep layers of the prefrontal cortex, and are not regularly oriented in relation to the pial surface. This is in contrast to rodent norepinephrine fibres, which are smooth or beaded in appearance, relatively thick with regularly spaced varicosities, increase in density towards the shallow cortical layers, and are in large part oriented either parallel or perpendicular to the pial surface (Berger et al., 1974; Levitt and Moore, 1979; Berger et al., 1983; Miner et al., 2003). Furthermore, previous studies in rodents have noted that only norepinephrine cell bodies are detectable using immunofluorescence for TH, not norepinephrine processes (Pickel et al., 1975; Verney et al., 1982; Miner et al., 2003), and we did not observe any norepinephrine-like fibres.

      Furthermore, we are not aware of any other processes in the forebrain that are known to be immunopositive for TH under any environmental conditions.

      To reduce confusion, we have replaced the abbreviation for dopamine – DA – with TH in the relevant panels in Figures 1, 2, 3, and 4 to clarify exactly what is represented in these images. As can be seen in these images, fluorescent green labelling is present only in axons, which is to be expected of dopamine labelling in these forebrain regions.

      References:

      Berger B, Tassin JP, Blanc G, Moyne MA, Thierry AM (1974) Histochemical confirmation for dopaminergic innervation of the rat cerebral cortex after destruction of the noradrenergic ascending pathways. Brain Res 81:332–337.

      Berger B, Verney C, Gay M, Vigny A (1983) Immunocytochemical Characterization of the Dopaminergic and Noradrenergic Innervation of the Rat Neocortex During Early Ontogeny. In: Proceedings of the 9th Meeting of the International Neurobiology Society, pp 263–267 Progress in Brain Research. Elsevier.

      Levitt P, Moore RY (1979) Development of the noradrenergic innervation of neocortex. Brain Res 162:243–259.

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C (2011) The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394.

      Miner LH, Schroeter S, Blakely RD, Sesack SR (2003) Ultrastructural localization of the norepinephrine transporter in superficial and deep layers of the rat prelimbic prefrontal cortex and its spatial relationship to probable dopamine terminals. J Comp Neurol 466:478–494.

      Pickel VM, Joh TH, Field PM, Becker CG, Reis DJ (1975) Cellular localization of tyrosine hydroxylase by immunohistochemistry. J Histochem Cytochem 23:1–12.

      Van Eden CG, Hoorneman EM, Buijs RM, Matthijssen MA, Geffard M, Uylings HBM (1987) Immunocytochemical localization of dopamine in the prefrontal cortex of the rat at the light and electron microscopical level. Neurosci 22:849–862.

      Verney C, Berger B, Adrien J, Vigny A, Gay M (1982) Development of the dopaminergic innervation of the rat cerebral cortex. A light microscopic immunocytochemical study using anti-tyrosine hydroxylase antibodies. Dev Brain Res 5:41–52.

      (10) Are Netrin-1/UNC5C the only signal guiding dopamine axon during adolescence? Are there other neuronal circuits involved in this process?

      Our intention for this study was to examine the role of Netrin-1 and its receptor UNC5C specifically, but we do not suggest that they are the only molecules to play a role. The process of guiding growing dopamine axons during adolescence is likely complex and we expect other guidance mechanisms to also be involved. From our previous work we know that the Netrin-1 receptor DCC is critical in this process (Hoops and Flores, 2017; Reynolds et al., 2023). Several other molecules have been identified in Netrin-1/DCC signaling processes that control corpus callosum development and there is every possibility that the same or similar molecules may be important in guiding dopamine axons (Schlienger et al., 2023).

      References:

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Reynolds LM, Hernandez G, MacGowan D, Popescu C, Nouel D, Cuesta S, Burke S, Savell KE, Zhao J, Restrepo-Lozano JM, Giroux M, Israel S, Orsini T, He S, Wodzinski M, Avramescu RG, Pokinko M, Epelbaum JG, Niu Z, Pantoja-Urbán AH, Trudeau L-É, Kolb B, Day JJ, Flores C. 2023. Amphetamine disrupts dopamine axon growth in adolescence by a sex-specific mechanism in mice. Nat Commun 14:4035. doi:10.1038/s41467-023-39665-1

      Schlienger S, Yam PT, Balekoglu N, Ducuing H, Michaud J-F, Makihara S, Kramer DK, Chen B, Fasano A, Berardelli A, Hamdan FF, Rouleau GA, Srour M, Charron F. 2023. Genetics of mirror movements identifies a multifunctional complex required for Netrin-1 guidance and lateralization of motor control. Sci Adv 9:eadd5501. doi:10.1126/sciadv.add5501

      (11) Finally, despite the authors' claim that the dopaminergic axon project is sensitive to the duration of daylight in the hamster, they never provided definitive evidence to support this hypothesis.

      By “definitive evidence” we think that the reviewer is requesting a single statistical model including measures from both the summer and winter groups. Such a model would provide a probability estimate of whether dopamine axon growth is sensitive to daylight duration. Therefore, we ran these models, one for male hamsters and one for female hamsters.

      In both sexes we find a significant effect of daylength on dopamine innervation, interacting with age. Male age by daylength interaction: F = 6.383, p = 0.00242. Female age by daylength interaction: F = 21.872, p = 1.97 × 10⁻⁹. The full statistical analysis is available as a supplement to this letter (Response_Letter_Stats_Details.docx).
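
      To make explicit what the age by daylength interaction term tests, here is a minimal sketch of the interaction F statistic in a two-way ANOVA. It assumes a balanced design (equal numbers of animals per cell), which may not match the actual dataset. The function name `interaction_f` and the example values are hypothetical and are not taken from the study or from the supplementary statistics file.

```python
from statistics import mean


def interaction_f(cells):
    """Interaction F statistic for a balanced two-way ANOVA.

    cells[i][j] holds the observations for level i of factor A (e.g. age)
    and level j of factor B (e.g. daylength). Every cell must contain the
    same number of observations, and at least two.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    balanced = all(len(row) == b and all(len(c) == n for c in row) for row in cells)
    if a < 2 or b < 2 or n < 2 or not balanced:
        raise ValueError("need a balanced design with >= 2 levels and >= 2 observations per cell")

    cell_mean = [[mean(c) for c in row] for row in cells]
    a_mean = [mean(row) for row in cell_mean]
    b_mean = [mean(cell_mean[i][j] for i in range(a)) for j in range(b)]
    grand = mean(a_mean)

    # Interaction sum of squares: what remains of each cell mean after
    # removing the two main effects.
    ss_ab = n * sum(
        (cell_mean[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    # Residual (within-cell) sum of squares.
    ss_within = sum(
        (x - cell_mean[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )
    df_ab = (a - 1) * (b - 1)
    df_within = a * b * (n - 1)
    return (ss_ab / df_ab) / (ss_within / df_within), df_ab, df_within


# Hypothetical values: rows = two ages, columns = two daylengths.
example_cells = [
    [[1.0, 3.0], [2.0, 4.0]],
    [[2.0, 4.0], [7.0, 9.0]],
]
f_value, df_ab, df_within = interaction_f(example_cells)
```

      Converting the F statistic to a p value requires the F distribution, for example `scipy.stats.f.sf(f_value, df_ab, df_within)` if SciPy is available.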

      Reviewer 3

      (1) Fig 1 A and B don't appear to be the same section level.

      The reviewer is correct that Fig 1B is anterior to Fig 1A. We have changed Figure 1A to match the section level of Figure 1B.

      (2) Fig 1C. It is not clear that these axons are crossing from the shell of the NAC.

      We have added a dashed line to Figure 1C to highlight the boundary of the nucleus accumbens, which hopefully emphasizes that there are fibres crossing the boundary. We also include here an enlarged image of this panel:

      Author response image 6.

      An enlarged image of Figure 1C in the manuscript. The nucleus accumbens (left of the dotted line) is densely packed with TH+ axons (in green). Some of these TH+ axons can be observed extending from the nucleus accumbens medially towards a region containing dorsally oriented TH+ fibres (white arrows).

      (3) Fig 1. Measuring width of the bundle is an odd way to measure DA axon numbers. First the width could be changing during adult for various reasons including change in brain size. Second, I wouldn't consider these axons in a traditional bundle. Third, could DA axon counts be provided, rather than these proxy measures.

      With regards to potential changes in brain size, we agree that this could have potentially explained the increased width of the dopamine axon pathway. That is why it was important for us to use stereology to measure the density of dopamine axons within the pathway. If the width increased but no new axons grew along the pathway, we would have seen a decrease in axon density from adolescence to adulthood. Instead, our results show that the density of axons remained constant.
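
      The logic of this argument can be illustrated with a short numerical sketch. All numbers are hypothetical and chosen only for illustration; they are not measurements from the study, and treating the pathway as a simple area is an assumption made for the example.

```python
def axon_density(axon_count: float, area: float) -> float:
    """Return the number of axons per unit area."""
    if area <= 0:
        raise ValueError("area must be positive")
    return axon_count / area


# Hypothetical values, for illustration only (not data from the study).
adolescent_count = 1000.0
adolescent_area = 2.0
adult_area = 3.0  # the region occupied by axons is 1.5x larger in adults

adolescent_density = axon_density(adolescent_count, adolescent_area)

# Scenario A: the region widens but no new axons grow, so density falls.
density_no_growth = axon_density(adolescent_count, adult_area)

# Scenario B: density is unchanged despite the larger area, which implies
# that the number of axons increased in proportion to the area.
implied_adult_count = adolescent_density * adult_area
```

      A constant density across a larger area, as in scenario B, is therefore only possible if more axons are present.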

      We agree with the reviewer that the dopamine axons do not form a traditional “bundle”. Therefore, throughout the manuscript we now avoid using the term bundle.

      Although we cannot count every single axon, an accurate estimate of this number can be obtained using stereology, an unbiased method for efficiently quantifying large, irregularly distributed objects. We used stereology to count TH+ axons in an unbiased subset of the total area occupied by these axons. Unbiased stereology is the gold-standard technique for estimating populations of anatomical objects, such as axons, that are so numerous that it would be impractical or impossible to measure every single one. Here and elsewhere we generally provide results as densities and areas of occupancy (Reynolds et al., 2022). To avoid confusion, we now clarify that we are measuring the width of the area that dopamine axons occupy (rather than the dopamine axon “bundle”).
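
      As a rough illustration of why counting in a systematically sampled subset gives an unbiased estimate, here is a generic fractionator-style sketch. It is not a description of the exact probe or sampling parameters used in the study; the function name and the frame counts are hypothetical.

```python
def fractionator_estimate(frame_counts, step, start):
    """Estimate a total count from a systematic sample of counting frames.

    frame_counts: object counts for every frame in the region (in practice
                  only the sampled frames would actually be counted)
    step:         sample every `step`-th frame (sampling fraction = 1/step)
    start:        index of the first sampled frame, 0 <= start < step;
                  in practice this is chosen at random
    """
    if step < 1 or not 0 <= start < step:
        raise ValueError("need step >= 1 and 0 <= start < step")
    sampled = frame_counts[start::step]
    # Scale the sampled count by the inverse of the sampling fraction.
    return sum(sampled) * step


# Hypothetical counts per frame, for illustration only.
counts = [3, 5, 4, 6, 2, 4, 5, 3, 4, 4, 6, 2]
step = 3

# One estimate per possible starting frame.
estimates = [fractionator_estimate(counts, step, s) for s in range(step)]
mean_estimate = sum(estimates) / step
```

      Any single estimate may fall above or below the true total, but the average over all possible random starts equals the true total, which is the sense in which the method is unbiased.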

      References:

      Reynolds LM, Pantoja-Urbán AH, MacGowan D, Manitt C, Nouel D, Flores C. 2022. Dopaminergic System Function and Dysfunction: Experimental Approaches. Neuromethods 31–63. doi:10.1007/978-1-0716-2799-0_2

      (4) TH in the cortex could also be of noradrenergic origin. This needs to be ruled out to score DA axons

      This is the same comment as Reviewer 1 #9. Please see our response below, which we have also added to our methods:

      In this study we pay great attention to the morphology and localization of the fibres from which we quantify varicosities to avoid counting any fibres stained with TH antibodies that are not dopamine fibres. The fibres that we examine and that are labelled by the TH antibody show features indistinguishable from the classic features of cortical dopamine axons in rodents (Berger et al., 1974; 1983; Van Eden et al., 1987; Manitt et al., 2011), namely they are thin fibres with irregularly-spaced varicosities, are densely packed in the nucleus accumbens, sparsely present only in the deep layers of the prefrontal cortex, and are not regularly oriented in relation to the pial surface. This is in contrast to rodent norepinephrine fibres, which are smooth or beaded in appearance, relatively thick with regularly spaced varicosities, increase in density towards the shallow cortical layers, and are in large part oriented either parallel or perpendicular to the pial surface (Berger et al., 1974; Levitt and Moore, 1979; Berger et al., 1983; Miner et al., 2003). Furthermore, previous studies in rodents have noted that only norepinephrine cell bodies are detectable using immunofluorescence for TH, not norepinephrine processes (Pickel et al., 1975; Verney et al., 1982; Miner et al., 2003), and we did not observe any norepinephrine-like fibres.

      References:

      Berger B, Tassin JP, Blanc G, Moyne MA, Thierry AM (1974) Histochemical confirmation for dopaminergic innervation of the rat cerebral cortex after destruction of the noradrenergic ascending pathways. Brain Res 81:332–337.

      Berger B, Verney C, Gay M, Vigny A (1983) Immunocytochemical Characterization of the Dopaminergic and Noradrenergic Innervation of the Rat Neocortex During Early Ontogeny. In: Proceedings of the 9th Meeting of the International Neurobiology Society, pp 263–267 Progress in Brain Research. Elsevier.

      Levitt P, Moore RY (1979) Development of the noradrenergic innervation of neocortex. Brain Res 162:243–259.

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C (2011) The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394.

      Miner LH, Schroeter S, Blakely RD, Sesack SR (2003) Ultrastructural localization of the norepinephrine transporter in superficial and deep layers of the rat prelimbic prefrontal cortex and its spatial relationship to probable dopamine terminals. J Comp Neurol 466:478–494.

      Pickel VM, Joh TH, Field PM, Becker CG, Reis DJ (1975) Cellular localization of tyrosine hydroxylase by immunohistochemistry. J Histochem Cytochem 23:1–12.

      Van Eden CG, Hoorneman EM, Buijs RM, Matthijssen MA, Geffard M, Uylings HBM (1987) Immunocytochemical localization of dopamine in the prefrontal cortex of the rat at the light and electron microscopical level. Neurosci 22:849–862.

      Verney C, Berger B, Adrien J, Vigny A, Gay M (1982) Development of the dopaminergic innervation of the rat cerebral cortex. A light microscopic immunocytochemical study using anti-tyrosine hydroxylase antibodies. Dev Brain Res 5:41–52.

      (5) Netrin staining should be provided with NeuN + DAPI; its not clear these are all cell bodies. An in situ of Netrin would help as well.

      A similar comment was raised by Reviewer 1 in point #1. Please see below the immunofluorescent and RNA scope images showing expression of Netrin-1 protein and mRNA in the forebrain.

      Author response image 7.

      This confocal microscope image shows immunofluorescent staining for Netrin-1 (green) localized around cell nuclei (stained by DAPI in blue). This image was taken from a coronal section of the lateral septum of an adult male mouse. Scale bar = 20 µm.

      Author response image 8.

      This confocal microscope image of a coronal brain section of the medial prefrontal cortex of an adult male mouse shows Netrin-1 mRNA expression (green) and cell nuclei (DAPI, blue). RNAscope was used to generate this image. Brain regions are as follows: Cg1: Anterior cingulate cortex 1, DP: dorsopeduncular cortex, IL: Infralimbic Cortex, PrL: Prelimbic Cortex, fmi: forceps minor of the corpus callosum

      Author response image 9.

      A higher resolution image from the same sample as in Figure 2 shows Netrin-1 mRNA (green) and cell nuclei (DAPI; blue). DP = dorsopeduncular cortex

      (6) The Netrin knockdown needs validation. How strong was the knockdown etc?

      This comment was also raised by Reviewer 1 #1.

      We have previously established the efficacy of the shRNA Netrin-1 knockdown virus used in this experiment for reducing the expression of Netrin-1 (Cuesta et al., 2020). The shRNA reduces Netrin-1 levels in vitro and in vivo.

      References:

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      (7) If the conclusion that knocking down Netrin in cortex decreases DA innervation of the IL, how can that be reconciled with Netrin-Unc repulsion.

      This is an intriguing question and one that we are in the planning stages of addressing with new experiments.

      Although we do not have a mechanistic answer for how a repulsive receptor helps guide these axons, we would like to note that previous indirect evidence from a study by our group also suggests that reducing UNC5c signaling in dopamine axons in adolescence increases dopamine innervation to the prefrontal cortex (Auger et al., 2013).

      References:

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      (8) The behavioral phenotype in Fig 1 is interesting, but its not clear if its related to DA axons/signaling. IN general, no evidence in this paper is provided for the role of DA in the adolescent behaviors described.

      We agree with the reviewer that the behaviours we describe in adult mice are complex and are likely to involve several neurotransmitter systems. However, there is ample evidence for the role of dopamine signaling in cognitive control behaviours (Bari and Robbins, 2013; Eagle et al., 2008; Ott et al., 2023) and our published work has shown that alterations in the growth of dopamine axons to the prefrontal cortex lead to changes in impulse control as measured via the Go/No-Go task in adulthood (Reynolds et al., 2023, 2018a; Vassilev et al., 2021).

      The other adolescent behaviour we examined was risk-like taking behaviour in male and female hamsters (Figures 4 and 5), as a means of characterizing maturation in this behaviour over time. We decided not to use the Go/No-Go task because, as far as we know, it has never been employed in Siberian hamsters and would be difficult to implement. Instead, we chose the light/dark box paradigm, which requires no training and is ideal for charting behavioural changes over short time periods. Indeed, risk-like taking behaviour in rodents and in humans changes from adolescence to adulthood, paralleling changes in prefrontal cortex development, including the gradual input of dopamine axons to this region.

      References:

      Bari A, Robbins TW. 2013. Inhibition and impulsivity: Behavioral and neural basis of response control. Progress in neurobiology 108:44–79. doi:10.1016/j.pneurobio.2013.06.005

      Eagle DM, Bari A, Robbins TW. 2008. The neuropsychopharmacology of action inhibition: cross-species translation of the stop-signal and go/no-go tasks. Psychopharmacology 199:439–456. doi:10.1007/s00213-008-1127-6

      Ott T, Stein AM, Nieder A. 2023. Dopamine receptor activation regulates reward expectancy signals during cognitive control in primate prefrontal neurons. Nat Commun 14:7537. doi:10.1038/s41467-023-43271-6

      Reynolds LM, Hernandez G, MacGowan D, Popescu C, Nouel D, Cuesta S, Burke S, Savell KE, Zhao J, Restrepo-Lozano JM, Giroux M, Israel S, Orsini T, He S, Wodzinski M, Avramescu RG, Pokinko M, Epelbaum JG, Niu Z, Pantoja-Urbán AH, Trudeau L-É, Kolb B, Day JJ, Flores C. 2023. Amphetamine disrupts dopamine axon growth in adolescence by a sex-specific mechanism in mice. Nat Commun 14:4035. doi:10.1038/s41467-023-39665-1

      Reynolds LM, Pokinko M, Torres-Berrío A, Cuesta S, Lambert LC, Pellitero EDC, Wodzinski M, Manitt C, Krimpenfort P, Kolb B, Flores C. 2018a. DCC Receptors Drive Prefrontal Cortex Maturation by Determining Dopamine Axon Targeting in Adolescence. Biological psychiatry 83:181–192. doi:10.1016/j.biopsych.2017.06.009

      Vassilev P, Pantoja-Urban AH, Giroux M, Nouel D, Hernandez G, Orsini T, Flores C. 2021. Unique effects of social defeat stress in adolescent male mice on the Netrin-1/DCC pathway, prefrontal cortex dopamine and cognition (Social stress in adolescent vs. adult male mice). Eneuro ENEURO.0045-21.2021. doi:10.1523/eneuro.0045-21.2021

      (9) Fig2 - boxes should be drawn on the NAc diagram to indicate sampled regions. Some quantification of Unc5c would be useful. Also, some validation of the Unc5c antibody would be nice.

      The images presented were taken medial to the anterior commissure and we have edited Figure 2 to show this. However, we did not notice any intra-accumbens variation, including between the core and the shell. Therefore, the images are representative of what was observed throughout the entire nucleus accumbens.

      To quantify UNC5c in the accumbens we conducted a Western blot experiment in male mice at different ages. A one-way ANOVA analyzing band intensity (relative to the 15-day-old average band intensity) as the response variable and age as the predictor variable showed a significant effect of age (F=5.615, p=0.01). Posthoc analysis revealed that 15-day-old mice have less UNC5c in the nucleus accumbens compared to 21- and 35-day-old mice.
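
      For readers unfamiliar with the test, the following is a minimal sketch of how a one-way ANOVA F statistic is computed from band intensities grouped by age. The intensity values are hypothetical and are not the Western blot data; the function name `one_way_anova_f` is ours.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of observations per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) < 1 for g in groups) or n_total <= k:
        raise ValueError("need >= 2 non-empty groups and more observations than groups")

    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Hypothetical relative band intensities (P15 average = 1), illustration only.
p15 = [0.9, 1.0, 1.1]
p21 = [1.4, 1.5, 1.6]
p35 = [1.5, 1.6, 1.7]
f_stat, df_b, df_w = one_way_anova_f([p15, p21, p35])
```

      A significant F statistic indicates only that at least one age group differs; identifying which groups differ requires post hoc comparisons, as reported above.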

      Author response image 10.

      The graph depicts the results of a Western blot experiment of UNC5c protein levels in the nucleus accumbens of male mice at postnatal days 15, 21 or 35 and reveals a significant increase in protein levels at the onset of adolescence.

      Our methods for this Western blot were as follows: Samples were prepared as previously described (Torres-Berrío et al., 2017). Briefly, mice were sacrificed by live decapitation and brains were flash frozen in heptane on dry ice for 10 seconds. Frozen brains were mounted in a cryomicrotome and two 500 µm sections were collected for the nucleus accumbens, corresponding to plates 14 and 18 of the Paxinos mouse brain atlas. Two tissue core samples were collected per section, one for each side of the brain, using a 15-gauge tissue corer (Fine surgical tools Cat. no. NC9128328) and ejected in a microtube on dry ice. The tissue samples were homogenized in 100 µl of standard radioimmunoprecipitation assay buffer using a handheld electric tissue homogenizer. The samples were clarified by centrifugation at 15,000 g for 30 minutes at 4 °C. Protein concentration was quantified using a bicinchoninic acid assay kit (Pierce BCA protein assay kit, Cat. no. PI23225), and samples were denatured with standard Laemmli buffer for 5 minutes at 70 °C. For each sample, 10 µg of protein was loaded and run by SDS-PAGE in a Mini-PROTEAN system (Bio-Rad) on an 8% acrylamide gel, stacking for 30 minutes at 60 V and resolving for 1.5 hours at 130 V. The proteins were transferred to a nitrocellulose membrane for 1 hour at 100 V in standard transfer buffer on ice. The membranes were blocked using 5% bovine serum albumin dissolved in tris-buffered saline with Tween 20 and probed with primary (UNC5c, Abcam Cat. no. ab302924) and HRP-conjugated secondary antibodies for 1 hour. α-Tubulin was probed and used as the loading control. The probed membranes were resolved using SuperSignal West Pico PLUS chemiluminescent substrate (ThermoFisher Cat. no. 34579) in a ChemiDoc MP Imaging system (Bio-Rad). Band intensity was quantified using the ChemiDoc software and all ages were normalized to the P15 age group average.
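
      The normalization step at the end of these methods can be sketched as follows. This is an illustration under stated assumptions: dividing each UNC5c band by its loading-control band before normalizing to the P15 average is a common convention that we assume here for the example, and the function name and intensity values are hypothetical.

```python
from statistics import mean


def normalise_bands(samples, reference_group="P15"):
    """Normalize band intensities to the mean of a reference age group.

    samples: list of (age_group, target_intensity, loading_control_intensity)
    Returns a list of (age_group, relative_intensity) in the same order.
    """
    # Step 1 (assumed convention): correct each lane for loading differences.
    ratios = [(group, target / control) for group, target, control in samples]

    # Step 2: express every lane relative to the reference group's average.
    reference = [ratio for group, ratio in ratios if group == reference_group]
    if not reference:
        raise ValueError(f"no samples found for reference group {reference_group!r}")
    reference_mean = mean(reference)
    return [(group, ratio / reference_mean) for group, ratio in ratios]


# Hypothetical intensities (arbitrary units), for illustration only.
lanes = [
    ("P15", 50.0, 100.0),
    ("P15", 150.0, 100.0),
    ("P21", 200.0, 100.0),
]
relative = normalise_bands(lanes)
```

      By construction, the reference group averages 1 after normalization, so values for the other ages read directly as fold change relative to P15.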

      Validation of the UNC5c antibody was performed in the lab of Dr. Liu, from whom it was kindly provided. Briefly, in the validation study the authors showed that the anti-UNC5C antibody can detect endogenous UNC5C expression and the level of UNC5C is dramatically reduced after UNC5C knockdown. The antibody can also detect the tagged-UNC5C protein in several cell lines, which was confirmed by a tag antibody (Purohit et al., 2012; Shao et al., 2017).

      References:

      Purohit AA, Li W, Qu C, Dwyer T, Shao Q, Guan K-L, Liu G. 2012. Down Syndrome Cell Adhesion Molecule (DSCAM) Associates with Uncoordinated-5C (UNC5C) in Netrin-1-mediated Growth Cone Collapse. The Journal of biological chemistry 287:27126–27138. doi:10.1074/jbc.m112.340174

      Shao Q, Yang T, Huang H, Alarmanazi F, Liu G. 2017. Uncoupling of UNC5C with Polymerized TUBB3 in Microtubules Mediates Netrin-1 Repulsion. J Neurosci 37:5620–5633. doi:10.1523/jneurosci.2617-16.2017

      (10) "In adolescence, dopamine neurons begin to express the repulsive Netrin-1 receptor UNC5C, and reduction in UNC5C expression appears to cause growth of mesolimbic dopamine axons to the prefrontal cortex".....This is confusing. Figure 2 shows a developmental increase in UNc5c not a decrease. So when is the "reduction in Unc5c expression" occurring?

      We apologize for the mistake in this sentence. We have corrected the relevant passage in our manuscript as follows:

      In adolescence, dopamine neurons begin to express the repulsive Netrin-1 receptor UNC5C, particularly when mesolimbic and mesocortical dopamine projections segregate in the nucleus accumbens (Manitt et al., 2010; Reynolds et al., 2018a). In contrast, dopamine axons in the prefrontal cortex do not express UNC5c except in very rare cases (Supplementary Figure 4). In adult male mice with Unc5c haploinsufficiency, there appears to be ectopic growth of mesolimbic dopamine axons to the prefrontal cortex (Auger et al., 2013). This miswiring is associated with alterations in prefrontal cortex-dependent behaviours (Auger et al., 2013).

      References:

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      Manitt C, Labelle-Dumais C, Eng C, Grant A, Mimee A, Stroh T, Flores C. 2010. Peri-Pubertal Emergence of UNC-5 Homologue Expression by Dopamine Neurons in Rodents. PLoS ONE 5:e11463-14. doi:10.1371/journal.pone.0011463

      Reynolds LM, Pokinko M, Torres-Berrío A, Cuesta S, Lambert LC, Pellitero EDC, Wodzinski M, Manitt C, Krimpenfort P, Kolb B, Flores C. 2018a. DCC Receptors Drive Prefrontal Cortex Maturation by Determining Dopamine Axon Targeting in Adolescence. Biological psychiatry 83:181–192. doi:10.1016/j.biopsych.2017.06.009

      (11) In Fig 3, a statistical comparison should be made between summer male and winter male, to justify the conclusions that the winter males have delayed DA innervation.

      This analysis was also suggested by Reviewer 1, #11. Here is our response:

      We analyzed the summer and winter data together in ANOVAs separately for males and females. In both sexes we find a significant effect of daylength on dopamine innervation, interacting with age. Male age by daylength interaction: F = 6.383, p = 0.00242. Female age by daylength interaction: F = 21.872, p = 1.97 × 10⁻⁹. The full statistical analysis is available as a supplement to this letter (Response_Letter_Stats_Details.docx).

      (12) Should axon length also be measured here (Fig 3)? It is not clear why the authors have switched to varicosity density. Also, a box should be drawn in the NAC cartoon to indicate the region that was sampled.

      It is untenable to quantify axon length in the prefrontal cortex as we cannot distinguish independent axons. Rather, they are “tangled”; they twist and turn in a multitude of directions as they make contact with various dendrites. Furthermore, they branch extensively. It would therefore be impossible to accurately quantify the number of axons. Using unbiased stereology to quantify varicosities is a valid, well-characterized and straightforward alternative (Reynolds et al., 2022).

      References:

      Reynolds LM, Pantoja-Urbán AH, MacGowan D, Manitt C, Nouel D, Flores C. 2022. Dopaminergic System Function and Dysfunction: Experimental Approaches. Neuromethods 31–63. doi:10.1007/978-1-0716-2799-0_2

      (13) In Fig 3, Unc5c should be quantified to bolster the interesting finding that Unc5c expression dynamics are different between summer and winter hamsters. Unc5c mRNA experiments would also be important to see if similar changes are observed at the transcript level.

      We agree that it would be very interesting to see how UNC5c mRNA and protein levels change over time in summer and winter hamsters, both in males, as the reviewer suggests here, and in females. We are working on conducting these experiments in hamsters as part of a broader expansion of our research in this area. These experiments will take considerable time, and at this point we feel that they are beyond the scope of this manuscript.

      (14) Fig 4. The peak in exploratory behavior in winter females is counterintuitive and needs to be better discussed. IN general, the light dark behavior seems quite variable.

      This is indeed a very interesting finding, which we have expanded upon in our manuscript as follows:

      When raised under a winter-mimicking daylength, hamsters of either sex show a protracted peak in risk taking. In males, it is delayed beyond 80 days old, but the delay is substantially less in females. This is a counterintuitive finding considering that dopamine development in winter females appears to be accelerated. Our interpretation of this finding is that the timing of the risk-taking peak in females may reflect a balance between different adolescent developmental processes. The fact that dopamine axon growth is accelerated does not imply that all adolescent maturational processes are accelerated. Some may be delayed, for example those that induce axon pruning in the cortex. The timing of the risk-taking peak in winter female hamsters may therefore reflect the amalgamation of developmental processes that are advanced with those that are delayed – producing a behavioural effect that is timed somewhere in the middle. Disentangling the effects of different developmental processes on behaviour will require further experiments in hamsters, including the direct manipulation of dopamine activity in the nucleus accumbens and prefrontal cortex.

      Full Reference List

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      Bari A, Robbins TW. 2013. Inhibition and impulsivity: Behavioral and neural basis of response control. Progress in neurobiology 108:44–79. doi:10.1016/j.pneurobio.2013.06.005

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      Daubaras M, Bo GD, Flores C. 2014. Target-dependent expression of the netrin-1 receptor, UNC5C, in projection neurons of the ventral tegmental area. Neuroscience 260:36–46. doi:10.1016/j.neuroscience.2013.12.007

      Eagle DM, Bari A, Robbins TW. 2008. The neuropsychopharmacology of action inhibition: cross-species translation of the stop-signal and go/no-go tasks. Psychopharmacology 199:439–456. doi:10.1007/s00213-008-1127-6

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Jonker FA, Jonker C, Scheltens P, Scherder EJA. 2015. The role of the orbitofrontal cortex in cognition and behavior. Rev Neurosci 26:1–11. doi:10.1515/revneuro-2014-0043

      Kim B, Im H. 2019. The role of the dorsal striatum in choice impulsivity. Ann N York Acad Sci 1451:92–111. doi:10.1111/nyas.13961

      Kim D, Ackerman SL. 2011. The UNC5C Netrin Receptor Regulates Dorsal Guidance of Mouse Hindbrain Axons. J Neurosci 31:2167–2179. doi:10.1523/jneurosci.5254-10.2011

      Manitt C, Labelle-Dumais C, Eng C, Grant A, Mimee A, Stroh T, Flores C. 2010. Peri-Pubertal Emergence of UNC-5 Homologue Expression by Dopamine Neurons in Rodents. PLoS ONE 5:e11463-14. doi:10.1371/journal.pone.0011463

      Murcia-Belmonte V, Coca Y, Vegar C, Negueruela S, Romero C de J, Valiño AJ, Sala S, DaSilva R, Kania A, Borrell V, Martinez LM, Erskine L, Herrera E. 2019. A Retino-retinal Projection Guided by Unc5c Emerged in Species with Retinal Waves. Current Biology 29:1149-1160.e4. doi:10.1016/j.cub.2019.02.052

      Ott T, Stein AM, Nieder A. 2023. Dopamine receptor activation regulates reward expectancy signals during cognitive control in primate prefrontal neurons. Nat Commun 14:7537. doi:10.1038/s41467-023-43271-6

      Phillips RA, Tuscher JJ, Black SL, Andraka E, Fitzgerald ND, Ianov L, Day JJ. 2022. An atlas of transcriptionally defined cell populations in the rat ventral tegmental area. Cell Reports 39:110616. doi:10.1016/j.celrep.2022.110616

      Purohit AA, Li W, Qu C, Dwyer T, Shao Q, Guan K-L, Liu G. 2012. Down Syndrome Cell Adhesion Molecule (DSCAM) Associates with Uncoordinated-5C (UNC5C) in Netrin-1-mediated Growth Cone Collapse. The Journal of biological chemistry 287:27126–27138. doi:10.1074/jbc.m112.340174

      Reynolds LM, Hernandez G, MacGowan D, Popescu C, Nouel D, Cuesta S, Burke S, Savell KE, Zhao J, Restrepo-Lozano JM, Giroux M, Israel S, Orsini T, He S, Wodzinski M, Avramescu RG, Pokinko M, Epelbaum JG, Niu Z, Pantoja-Urbán AH, Trudeau L-É, Kolb B, Day JJ, Flores C. 2023. Amphetamine disrupts dopamine axon growth in adolescence by a sex-specific mechanism in mice. Nat Commun 14:4035. doi:10.1038/s41467-023-39665-1

      Reynolds LM, Pantoja-Urbán AH, MacGowan D, Manitt C, Nouel D, Flores C. 2022. Dopaminergic System Function and Dysfunction: Experimental Approaches. Neuromethods 31–63. doi:10.1007/978-1-0716-2799-0_2

      Reynolds LM, Pokinko M, Torres-Berrío A, Cuesta S, Lambert LC, Pellitero EDC, Wodzinski M, Manitt C, Krimpenfort P, Kolb B, Flores C. 2018a. DCC Receptors Drive Prefrontal Cortex Maturation by Determining Dopamine Axon Targeting in Adolescence. Biological psychiatry 83:181–192. doi:10.1016/j.biopsych.2017.06.009

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette M-P, Arvanitogiannis A, Flores C. 2018b. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      Schlienger S, Yam PT, Balekoglu N, Ducuing H, Michaud J-F, Makihara S, Kramer DK, Chen B, Fasano A, Berardelli A, Hamdan FF, Rouleau GA, Srour M, Charron F. 2023. Genetics of mirror movements identifies a multifunctional complex required for Netrin-1 guidance and lateralization of motor control. Sci Adv 9:eadd5501. doi:10.1126/sciadv.add5501

      Shao Q, Yang T, Huang H, Alarmanazi F, Liu G. 2017. Uncoupling of UNC5C with Polymerized TUBB3 in Microtubules Mediates Netrin-1 Repulsion. J Neurosci 37:5620–5633. doi:10.1523/jneurosci.2617-16.2017

      Srivatsa S, Parthasarathy S, Britanova O, Bormuth I, Donahoo A-L, Ackerman SL, Richards LJ, Tarabykin V. 2014. Unc5C and DCC act downstream of Ctip2 and Satb2 and contribute to corpus callosum formation. Nat Commun 5:3708. doi:10.1038/ncomms4708

      Torres-Berrío A, Lopez JP, Bagot RC, Nouel D, Dal-Bo G, Cuesta S, Zhu L, Manitt C, Eng C, Cooper HM, Storch K-F, Turecki G, Nestler EJ, Flores C. 2017. DCC Confers Susceptibility to Depression-like Behaviors in Humans and Mice and Is Regulated by miR-218. Biological psychiatry 81:306–315. doi:10.1016/j.biopsych.2016.08.017

      Vassilev P, Pantoja-Urban AH, Giroux M, Nouel D, Hernandez G, Orsini T, Flores C. 2021. Unique effects of social defeat stress in adolescent male mice on the Netrin-1/DCC pathway, prefrontal cortex dopamine and cognition (Social stress in adolescent vs. adult male mice). Eneuro ENEURO.0045-21.2021. doi:10.1523/eneuro.0045-21.2021

      Private Comments

      Reviewer #1

      (12) The language should be improved. Some expression is confusing (line178-179). Also some spelling errors (eg. Figure 1M).

      We have removed the word “Already” to make the sentence in lines 178-179 clearer; however, we cannot find a spelling error in Figure 1M or its caption. We have further edited the manuscript for clarity and flow.

      Reviewer #2

      (1) The authors claim to have revealed how the 'timing of adolescence is programmed in the brain'. While their findings certainly shed light on molecular, circuit and behavioral processes that are unique to adolescence, their claim may be an overstatement. I suggest they refine this statement to discuss more specifically the processes they observed in the brain and animal behavior, rather than adolescence itself.

      We agree with the reviewer and have revised the manuscript to specify that we are referring to the timing of specific developmental processes that occur in the adolescent brain, not adolescence overall.

      (2) Along the same lines, the authors should also include a more substantiative discussion of how they selected their ages for investigation (for both mice and hamsters), For mice, their definition of adolescence (P21) is earlier than some (e.g. Spear L.P., Neurosci. and Beh. Reviews, 2000).

      There are certainly differences of opinion between researchers as to the precise definition of adolescence and the period it encompasses. Spear, 2000, provides one excellent discussion of the challenges related to identifying adolescence across species. This work gives specific ages only for rats, not mice (as we use here), and characterizes post-natal days 28-42 as being the conservative age range of “peak” adolescence (page 419, paragraph 1). Immediately thereafter the review states that the full adolescent period is longer than this, and it could encompass post-natal days 20-55 (page 419, paragraph 2).

      We have added the following statement to our methods:

      There is no universally accepted way to define the precise onset of adolescence. Therefore, there is no clear-cut boundary to define adolescent onset in rodents (Spear, 2000). Puberty can be more sharply defined, and puberty and adolescence overlap in time, but the terms are not interchangeable. Puberty is the onset of sexual maturation, while adolescence is a more diffuse period marked by the gradual transition from a juvenile state to independence. We, and others, suggest that adolescence in rodents spans from weaning (postnatal day 21) until adulthood, which we take to start on postnatal day 60 (Reynolds and Flores, 2021). We refer to “early adolescence” as the first two weeks postweaning (postnatal days 21-34). These ranges encompass discrete DA developmental periods (Kalsbeek et al., 1988; Manitt et al., 2011; Reynolds et al., 2018a), vulnerability to drug effects on DA circuitry (Hammerslag and Gulley, 2014; Reynolds et al., 2018a), and distinct behavioral characteristics (Adriani and Laviola, 2004; Makinodan et al., 2012; Schneider, 2013; Wheeler et al., 2013).
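
      The age ranges in this statement can be written as a simple lookup. The sketch below is illustrative only; the function name and the label used for ages before postnatal day 21 are assumptions made for the example and are not terms defined in the statement.

```python
def developmental_stage(postnatal_day: int) -> str:
    """Classify a rodent's age using the ranges given in the statement above.

    - postnatal days 21-34: early adolescence (first two weeks postweaning)
    - postnatal days 35-59: remainder of adolescence
    - postnatal day 60 onward: adulthood
    """
    if postnatal_day < 0:
        raise ValueError("postnatal day cannot be negative")
    if postnatal_day < 21:
        return "pre-weaning"  # label assumed for illustration
    if postnatal_day <= 34:
        return "early adolescence"
    if postnatal_day < 60:
        return "adolescence"
    return "adulthood"


# Ages used in the Western blot experiment described in this letter.
stages = {day: developmental_stage(day) for day in (15, 21, 35)}
```

      These boundaries are working definitions rather than sharp biological transitions, as the statement itself emphasizes.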

      References:

      Adriani W, Laviola G. 2004. Windows of vulnerability to psychopathology and therapeutic strategy in the adolescent rodent model. Behav Pharmacol 15:341–352. doi:10.1097/00008877-200409000-00005

      Hammerslag LR, Gulley JM. 2014. Age and sex differences in reward behavior in adolescent and adult rats. Dev Psychobiol 56:611–621. doi:10.1002/dev.21127

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Kalsbeek A, Voorn P, Buijs RM, Pool CW, Uylings HBM. 1988. Development of the Dopaminergic Innervation in the Prefrontal Cortex of the Rat. The Journal of Comparative Neurology 269:58–72. doi:10.1002/cne.902690105

      Makinodan M, Rosen KM, Ito S, Corfas G. 2012. A critical period for social experience-dependent oligodendrocyte maturation and myelination. Science 337:1357–1360. doi:10.1126/science.1220845

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C. 2011. The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394. doi:10.1523/jneurosci.0606-11.2011

      Reynolds LM, Flores C. 2021. Mesocorticolimbic Dopamine Pathways Across Adolescence: Diversity in Development. Front Neural Circuit 15:735625. doi:10.3389/fncir.2021.735625

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette MP, Arvanitogiannis A, Flores C. 2018. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      Schneider M. 2013. Adolescence as a vulnerable period to alter rodent behavior. Cell and tissue research 354:99–106. doi:10.1007/s00441-013-1581-2

      Spear LP. 2000. Neurobehavioral Changes in Adolescence. Current directions in psychological science 9:111–114. doi:10.1111/1467-8721.00072

      Wheeler AL, Lerch JP, Chakravarty MM, Friedel M, Sled JG, Fletcher PJ, Josselyn SA, Frankland PW. 2013. Adolescent Cocaine Exposure Causes Enduring Macroscale Changes in Mouse Brain Structure. J Neurosci 33:1797–1803. doi:10.1523/jneurosci.3830-12.2013

      (3) Figure 1 - the conclusions hinge on the Netrin-1 staining, as shown in panel G, but the cells are difficult to see. It would be helpful to provide clearer, more zoomed images so readers can better assess the staining. Since Netrin-1 expression reduces dramatically after P4 and they had to use antigen retrieval to see signal, it would be helpful to show some images from additional brain regions and ages to see if expression levels follow predicted patterns. For instance, based on the allen brain atlas, it seems that around P21, there should be high levels of Netrin-1 in the cerebellum, but low levels in the cortex. These would be nice controls to demonstrate the specificity and sensitivity of the antibody in older tissue.

      We do not study the cerebellum and have never stained this region; doing so now would require generating additional tissue, and we are not sure it would add enough to the information provided to be worthwhile. Note that we have stained the forebrain for Netrin-1 previously, providing broad staining of many brain regions (Manitt et al., 2011).

      References:

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C. 2011. The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394. doi:10.1523/jneurosci.0606-11.2011

      (4) Figure 3 - Because mice tend to avoid brightly-lit spaces, the light/dark box is more commonly used as a measure of anxiety-like behavior than purely exploratory behavior (including in the paper they cited). It is important to address this possibility in their discussion of their findings. To bolster their conclusions about the coincidence of circuit and behavioral changes in adolescent hamsters, it would be useful to add an additional measure of exploratory behaviors (e.g. hole board).

      Regarding the light/dark box test, this is an excellent point. We prefer the term “risk taking” to “anxiety-like” and now use the former term in our manuscript. Furthermore, our interest in the behaviour is purely to chart the development of adolescent behaviour across our treatment groups, not to study a particular emotional state. Regardless of the specific emotion or emotions governing the light/dark box behaviour, it is an ideal test for charting adolescent shifts in behaviour as it is well-characterized in this respect, as we discuss in our manuscript.

      (5) Supplementary Figures 4 and 5: The authors defined puberty onset using uterine and testes weights in hamsters. While the weights appear to be different for summer and winter hamsters, there was no statistical comparison. Please add statistical analyses to bolster claims about puberty start times. Also, as many studies use vaginal opening to define puberty onset, it would be helpful to discuss how these measurements typically align and cite relevant literature that described use of uterine weights. Also, Supplementary Figures 4 and 5 were mis-cited as Supp. Fig. 2 in the text (e.g. line 317 and others).

      These are great suggestions. We have added statistical analyses to Supplementary Figures 5 and 6 and provided Vaginal Opening data as Supplementary Figure 7. The statistical analyses confirm that all three characters are delayed in winter hamsters compared to summer hamsters.

      We have also added the following references to the manuscript:

      Darrow JM, Davis FC, Elliott JA, Stetson MH, Turek FW, Menaker M. 1980. Influence of Photoperiod on Reproductive Development in the Golden Hamster. Biol Reprod 22:443–450. doi:10.1095/biolreprod22.3.443

      Ebling FJP. 1994. Photoperiodic Differences during Development in the Dwarf Hamsters Phodopus sungorus and Phodopus campbelli. Gen Comp Endocrinol 95:475–482. doi:10.1006/gcen.1994.1147

      Timonin ME, Place NJ, Wanderi E, Wynne-Edwards KE. 2006. Phodopus campbelli detect reduced photoperiod during development but, unlike Phodopus sungorus, retain functional reproductive physiology. Reproduction 132:661–670. doi:10.1530/rep.1.00019

      (6) The font in many figure panels is small and hard to read (e.g. 1A,D,E,H,I,L...). Please increase the size for legibility.

      We have increased the font size of our figure text throughout the manuscript.

      Reviewer #3

      (15) Fig 1 C,D. Clarify the units of the y axis

      We have now fixed this.

      Full Reference List

      Adriani W, Laviola G. 2004. Windows of vulnerability to psychopathology and therapeutic strategy in the adolescent rodent model. Behav Pharmacol 15:341–352. doi:10.1097/00008877-200409000-00005

      Hammerslag LR, Gulley JM. 2014. Age and sex differences in reward behavior in adolescent and adult rats. Dev Psychobiol 56:611–621. doi:10.1002/dev.21127

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Kalsbeek A, Voorn P, Buijs RM, Pool CW, Uylings HBM. 1988. Development of the Dopaminergic Innervation in the Prefrontal Cortex of the Rat. The Journal of Comparative Neurology 269:58–72. doi:10.1002/cne.902690105

      Makinodan M, Rosen KM, Ito S, Corfas G. 2012. A critical period for social experience-dependent oligodendrocyte maturation and myelination. Science 337:1357–1360. doi:10.1126/science.1220845

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C. 2011. The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394. doi:10.1523/jneurosci.0606-11.2011

      Reynolds LM, Flores C. 2021. Mesocorticolimbic Dopamine Pathways Across Adolescence: Diversity in Development. Front Neural Circuit 15:735625. doi:10.3389/fncir.2021.735625

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette M-P, Arvanitogiannis A, Flores C. 2018. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      Schneider M. 2013. Adolescence as a vulnerable period to alter rodent behavior. Cell and tissue research 354:99–106. doi:10.1007/s00441-013-1581-2

      Spear LP. 2000. Neurobehavioral Changes in Adolescence. Current directions in psychological science 9:111–114. doi:10.1111/1467-8721.00072

      Wheeler AL, Lerch JP, Chakravarty MM, Friedel M, Sled JG, Fletcher PJ, Josselyn SA, Frankland PW. 2013. Adolescent Cocaine Exposure Causes Enduring Macroscale Changes in Mouse Brain Structure. J Neurosci 33:1797–1803. doi:10.1523/jneurosci.3830-12.2013

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this paper, the authors developed an image analysis pipeline to automatically identify individual neurons within a population of fluorescently tagged neurons. This application is optimized to deal with multi-cell analysis and builds on a previous software version, developed by the same team, to resolve individual neurons from whole-brain imaging stacks. Using advanced statistical approaches and several heuristics tailored for C. elegans anatomy, the method successfully identifies individual neurons with a fairly high accuracy. Thus, while specific to C. elegans, this method can become instrumental for a variety of research directions such as in-vivo single-cell gene expression analysis and calcium-based neural activity studies.

      Thank you.

      Reviewer #2 (Public Review):

      The authors succeed in generalizing the pre-alignment procedure for their cell identification method to allow it to work effectively on data with only small subsets of cells labeled. They convincingly show that their extension accurately identifies head angle, based on finding autofluorescent tissue and looking for a symmetric l/r axis. They demonstrate that the method works to allow the identification of a particular subset of neurons. Their approach should be a useful one for researchers wishing to identify subsets of head neurons in C. elegans, and the ideas might be useful elsewhere.

      The authors also assess the relative usefulness of several atlases for making identity predictions. They attempt to give some additional general insights on what makes a good atlas, but here insights seem less clear as available data does not allow for experiments that cleanly decouple: 1. the number of examples in the atlas, 2. the completeness of the atlas, and 3. the match in strain and imaging modality discussed. In the presented experiments the custom atlas, besides the strain and imaging modality mismatches discussed, is also the only complete atlas with more than one example. The neuroPAL atlas is an imperfect stand-in, since a significant fraction of cells could not be identified in these data sets, making it a 60/40 mix of Openworm and a hypothetical perfect neuroPAL comparison. This waters down general insights since it is unclear if the performance is driven by strain/imaging modality or these difficulties creating a complete neuroPal atlas. The experiments do usefully explore the volume of data needed. Though generalization remains to be shown, the insight is useful for future atlas building that for the specific (small) set of cells labeled in the experiments 5-10 examples is sufficient to build an accurate atlas.

      The reviewer brings up an interesting point. As the reviewer noted, given the imperfection of the datasets (ours and others’), it is possible that artifacts from incomplete atlases can interfere with the assessment of the performances of different atlases. To address this, as the reviewer suggested, we have searched the literature and found two sets of data that give specific coordinates of identified neurons (both using NeuroPAL). We compared the performance of the atlases derived from these datasets to the strain-specific atlases, and the original conclusion stands. Details are now included in the revised manuscript (Figure 3- figure supplement 2).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I appreciate the new mosaic analysis (Fig. 3 -figure suppl 2). Please fix the y-axis tick label that I believe should be 0.8 (instead of 0.9).

      We thank the reviewer for spotting the typo. We have fixed the error.

      Reviewer #2 (Recommendations For The Authors):

      Though I'm not familiar with the exact quality of GT labels in available neuroPAL data, I know increasing volumes of published data are available. Comparison with a complete neuroPAL atlas, and a similar assessment on atlas size as made with the custom atlas, would to my mind qualitatively increase the general insights on atlas construction.

      We thank the reviewer for the insightful suggestion. We have newly constructed several other NeuroPAL atlases by incorporating neuron positional data from two other published datasets: [Yemini E. et al. NeuroPAL: A Multicolor Atlas for Whole-Brain Neuronal Identification in C. elegans. Cell. 2021 Jan 7;184(1):272-288.e11] and [Skuhersky, M. et al. Toward a more accurate 3D atlas of C. elegans neurons. BMC Bioinformatics 23, 195 (2022)].

      Interestingly, we found that the two new atlases (NP-Yemini and NP-Skuhersky) have significantly different values of PA, LR, DV, and angle relationships for certain cells compared to the OpenWorm and glr-1 atlases. For example, in both the NP atlases, SMDD is labeled as being anterior to AIB, which is the opposite of the SMDD-AIB relationship in the glr-1 atlas.

      Because this relationship (and other similar cases) was missing in our original NeuroPAL atlas (NP-Chaudhary), the addition of these two NeuroPAL datasets to our NeuroPAL atlas dramatically changed the atlas. As a result, incorporating the published data sets into the NeuroPAL atlas (NP-all) actually decreased the average prediction accuracy to 44%, while the average accuracy of the original NeuroPAL atlas (NP-Chaudhary) was 57%. The atlas based on the Yemini et al. data alone (NP-Yemini) had 43% accuracy, and the atlas based on the Skuhersky et al. data alone (NP-Skuhersky) had 38% accuracy.

      For the rest of our analysis, we focused on comparing the NeuroPAL atlas that resulted in the highest accuracy against other atlases in figure 3 (NP-Chaudhary). Therefore, we have added Figure 3- figure supplement 2 and the following sentence in the discussion. “Several other NeuroPAL atlases from different data sources were considered, and the atlas that resulted in the highest neuron ID correspondence was selected (Figure 3- figure supplement 2).”

      Author response image 1.

      Figure 3- figure supplement 2. Comparison of neuron ID correspondences resulting from additional atlases: atlases derived from NeuroPAL neuron positional data from multiple sources (Chaudhary et al., Yemini et al., and Skuhersky et al.) in red, compared to other atlases in Figure 3. Two-sample t-tests were performed for statistical analysis. The asterisk symbol denotes a significance level of p<0.05, and n.s. denotes no significance. OW: atlas driven by data from OpenWorm project, NP-source: NeuroPAL atlas driven by data from the source. NP-Chaudhary atlas corresponds to NeuroPAL atlas in Figure 3.

      80% agreement among manual identifications seems low to me for a relatively small, (mostly) known set of cells, which seems to cast into doubt ground truth identities based on a best 2 out of 3 vote. The authors mention 3% of cell identities had total disagreement and were excluded; what were the fractions unanimous and 2/3? Are there any further insights about what limited human performance in the context of this particular identification task?

      We looked closely into the manual annotation data. The fractions of cells with unanimous, two-thirds, and no agreement are approximately 74%, 20%, and 6%, respectively. We made the corresponding change in the manuscript from 3% to 6%. Indeed, we identified certain patterns in labels that were more likely to be disagreed upon. First, cells in close proximity to each other, such as AVE and RMD, were often switched from annotator to annotator. Second, cells in the posterior part of the cluster, such as RIM, AVD, AVB, were more variable in positions, so their identities were not clear at times. Third, annotators were more likely to disagree on cells whose expressions are rare and low, and these include AIB, AVJ, and M1. These observations agree with our results in figure 4c.
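
      The agreement categories described above can be illustrated with a minimal sketch of a best-2-out-of-3 vote. This is not the authors' annotation code; the function names and the toy labels below are hypothetical and only show how unanimous, two-thirds, and no-agreement fractions could be tallied when each cell has exactly three annotator labels.

```python
from collections import Counter


def consensus_label(labels):
    """Return (consensus, category) for three annotator labels of one cell.

    category is 'unanimous', 'two_thirds', or 'none'; consensus is None
    when all three annotators disagree (such cells would be excluded).
    """
    if len(labels) != 3:
        raise ValueError("expected exactly three annotator labels")
    label, votes = Counter(labels).most_common(1)[0]
    if votes == 3:
        return label, "unanimous"
    if votes == 2:
        return label, "two_thirds"
    return None, "none"


def agreement_fractions(annotations):
    """Fraction of cells falling into each agreement category."""
    tally = Counter(consensus_label(a)[1] for a in annotations)
    total = len(annotations)
    return {k: tally.get(k, 0) / total
            for k in ("unanimous", "two_thirds", "none")}


# Hypothetical toy annotations (one tuple of three labels per cell).
toy = [
    ("AVE", "AVE", "AVE"),
    ("AVE", "RMD", "AVE"),
    ("RIM", "AVD", "AVB"),
    ("AIB", "AIB", "AIB"),
]
fractions = agreement_fractions(toy)
```

      In this toy example two of four cells are unanimous, one has two-thirds agreement, and one has no agreement and would be dropped from the ground truth.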

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      In this manuscript, Yao et al. explored the transcriptomic characteristics of neural stem cells (NSCs) in the human hippocampus and their changes under different conditions using single-nucleus RNA sequencing (snRNA-seq). They generated single-nucleus transcriptomic profiles of human hippocampal cells from neonatal, adult, and aging individuals, as well as from stroke patients. They focused on the cell groups related to neurogenesis, such as neural stem cells and their progeny. They revealed genes enriched in different NSC states and performed trajectory analysis to trace the transitions among NSC states and towards astroglial and neuronal lineages in silico. They also examined how NSCs are affected by aging and injury using their datasets and found differences in NSC numbers and gene expression patterns across age groups and injury conditions. One major issue of the manuscript is questionable cell type identification. For example, more than 50% of the cells in the astroglial lineage clusters are NSCs, which is extremely high and inconsistent with classic histology studies.

      While the authors have made efforts to address previous critiques, major concerns have not been adequately addressed, including a very limited sample size and poor patient information. In addition, some analytical approaches are still questionable, and the authors acknowledged that there are some concerns they cannot address. Therefore, while the topic is interesting, some results are preliminary and some conclusions are not fully supported by the data presented.

      We thank the reviewer for reevaluating our revised manuscript. We respect the reviewer’s comments and discuss the technical and conceptual limitations of this work. Here we provide the response to Reviewer #1 (Public Review) on these below.

      Firstly, we appreciate the concerns raised by Reviewer 1 regarding the high proportion of NSCs within the astroglia lineage clusters. It is worth mentioning that distinguishing hippocampal qNSCs from astrocytes by transcription profiling poses a significant challenge in the field due to their high transcriptional similarity. From previous global UMAP analysis, AS1 (adult specific) can be separated from qNSCs, but AS2 (NSC-like astrocytes) cannot. Therefore, the data presented in Figure 2C to G aimed to further distinguish the qNSCs from AS2 by using gene set score analysis. Based on different scores, we categorized qNSC/AS lineages into qNSC1, qNSC2 and AS2. Figure 2C presented the UMAP plot of the qNSC/AS2 population from only the neonatal sample. We apologize for not clarifying this in the figure legend. We have now clarified this information in the figure legend of Figure 2C. More importantly, we have added UMAP plots and quantifications for other groups in Figure 2-Supplement 2A and B, including adult, aging, and injury samples. This supplementary figure provides more complete information on the cell type composition and dynamic variations during aging and injury. Although the ratio of NSCs in the astroglia lineage clusters remains higher compared to classic histology studies, the trends indicate a reduction in qNSCs and an increase in astrocytes during aging and injury, which supports the view that cell type identification by gene set score analysis is effective, although still not optimal. Combined methods to accurately distinguish between qNSCs and astrocytes are required in the future, and we also discuss this in the corresponding text.

      Secondly, we cannot adequately address the major concern regarding sample size raised by the reviewer due to the scarcity of stroke and neonatal human brain samples. We have collected additional details about the donors. Please refer to Figure 1-source data 1 for the updated information. Other information regarding the lifestyle parameters of these donors has not been sufficiently recorded by the hospital. Therefore, we cannot improve the patient information further.

      Thirdly, regarding the questionable subpopulations of granule cells (GCs) that derive from neuroblasts in Figure 4A-4D, which are inconsistent with previous single-cell transcriptomic studies, we tried various strategies to confirm the identity of the two subpopulations of granule cells (GCs) derived from neuroblasts but didn’t get a clear answer. As a result, we can only provide an objective description of the differences in gene expression and developmental trajectory and speculate that these differences may be related to their degree of maturity but are not aligned on the same trajectory.

      In the end, we have discussed the technical and conceptual limitations of this work and added a brief discussion about these limitations in the last paragraph of the main text. We hope the readers can interpret our data critically and objectively.

      Reviewer #2 (Public Review):

      In this manuscript, Yao et al. present a series of experiments aiming at generating a cellular atlas of the human hippocampus across aging, and how it may be affected by injury, in particular, stroke. Although the aim of the study is interesting and relevant for a larger audience, due to the ongoing controversy around the existence of adult hippocampal neurogenesis in humans, a number of technical weaknesses result in poor support for many of the conclusions made from the results of these experiments.

      In particular, a recent meta-analysis of five previous studies applying similar techniques to human samples has identified different aspects of sample size as main determinants of the statistical power needed to make significant conclusions. Some of these aspects are the number of nuclei sequenced and subject stratification. These two aspects are of concern in Yao's study. First, the number of sequenced nuclei is lower than the calculated numbers of nuclei required for detecting rare cell types. However, Yao et al. report succeeding in detecting rare populations, including several types of neural stem cells in different proliferation states, which have been demonstrated to be extremely scarce by previous studies. It would be very interesting to read how the authors interpret these differences. Secondly, the number of donors included in some of the groups is extremely low (n=1) and the miscellaneous information provided about the donors is practically nonexistent. As individual factors such as chronic conditions, medication, lifestyle parameters, etc. are considered determinant for the variability of adult hippocampal neurogenesis levels across individuals, this represents a serious limitation of the current study. Overall, several technical weaknesses severely limit the relevance of this study and the ability of the authors to achieve their experimental aims.

      After a first review round, the manuscript is still lacking a clear discussion of its several technical limitations, which would help the audience to grasp the relevance of the findings. In particular, detailed information about individual patients' health status and relevant lifestyle parameters that may have affected it is lacking. The authors make the point themselves that "the discrepancies among studies might be caused by health state differences across hippocampi, which subsequently lead to different degrees of hippocampal neurogenesis." So, even in the authors' own interpretation, this is a serious limitation of the manuscript that, although out of the authors' control, impacts the quality of their findings.

      Reviewer #2 (Recommendations For The Authors):

      Please see public review. I do understand the authors' point about incomplete patient data collection and low patient numbers and how the former is out of their control. Nevertheless, these are crucial parameters that impact negatively on the quality and relevance of several of their bold claims in the manuscript, especially given the low number of patients included. The current version still lacks a clear and honest discussion, for the readership, of the several technical and conceptual limitations of the authors' work (as in some cases they are presented to the reviewers in the rebuttal letter), so that readers can critically evaluate the relevance of the authors' findings in a bigger perspective.

      We thank the reviewer for reevaluating our revised manuscript. We respect the reviewer’s comments and discuss the technical and conceptual limitations of this work. Here we provide the response to Reviewer #2 (Public Review) on these below.

      We understand the reviewer’s concern and have also noticed that according to the computational modeling conducted by Tosoni et al. (Neuron, 2023), at least 21 neuroblast cells (NBs) can be identified out of 30,000 granule cells (GCs) from a total of 180,000 dentate gyrus (DG) cells. In our dataset, we sequenced 24,671 GC nuclei and 92,966 total DG cell nuclei, which also includes neonatal samples. The number of nuclei we sequenced is 4.5 times higher than that of Wang et al. (Cell Research, 2022), who also detected NBs. Therefore, it is possible that we are able to detect NBs. Importantly, we have implemented strict quality control measures to support the reliability of our sequencing data. These measures include: 1. Immediate collection of tissue samples postmortem (3-4 hrs) to ensure the quality of isolated nuclei. 2. Only nuclei expressing more than 200 genes but fewer than 5000-8600 genes (depending on the peak of enrichment genes) were considered. On average, around 3000 genes were detected per cell. 3. The average proportion of mitochondrial genes in each sample was approximately 1.8%, with no sample exceeding 5%. We have shown that the number of cells captured from individual samples and the average number of genes detected per cell are sufficient, indicating overall good sequencing quality (Figure 1-supplement 1A, B and F, and Figure 1-source data 1). Additionally, we have further confirmed the presence of these cell types with low abundance by integrating immunofluorescence staining (Figure 4E, 5D and 6B), cell type-specific gene expression (Figure 1C and D), overall transcriptomic characteristics (Figure 1-supplement 1E), and developmental potential (Figure 4A-D, Figure 6E and F). We hope that this evidence, taken together, explains why we can identify the rare neurogenic populations.
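
      The gene-count and mitochondrial criteria listed above can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the `Nucleus` record, the function names, and the toy values are hypothetical, and the upper gene bound is passed per sample because the response states it ranged from 5000 to 8600 depending on the sample.

```python
from dataclasses import dataclass


@dataclass
class Nucleus:
    n_genes: int          # number of genes detected in this nucleus
    mito_fraction: float  # fraction of counts from mitochondrial genes


def filter_nuclei(nuclei, min_genes=200, max_genes=5000):
    """Keep nuclei with more than min_genes and fewer than max_genes genes.

    max_genes is an assumption to be set per sample (5000-8600 in the text).
    """
    return [n for n in nuclei if min_genes < n.n_genes < max_genes]


def mean_mito_fraction(nuclei):
    """Sample-level average mitochondrial fraction."""
    if not nuclei:
        raise ValueError("no nuclei left after filtering")
    return sum(n.mito_fraction for n in nuclei) / len(nuclei)


# Hypothetical toy sample: two nuclei fall outside the gene-count bounds.
sample = [
    Nucleus(n_genes=150, mito_fraction=0.010),   # too few genes
    Nucleus(n_genes=3000, mito_fraction=0.020),
    Nucleus(n_genes=2800, mito_fraction=0.016),
    Nucleus(n_genes=9000, mito_fraction=0.030),  # too many genes
]
kept = filter_nuclei(sample, max_genes=5000)
sample_ok = mean_mito_fraction(kept) <= 0.05     # sample-level 5% ceiling
```

      Note that, as described in the response, the 5% mitochondrial figure is a per-sample summary rather than a per-nucleus cutoff, which is why it is checked after filtering.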

      Regarding the limited sample size and poor patient information, we cannot adequately address these two major concerns. Due to the scarcity of stroke or neonatal human samples, it was not feasible to collect a larger sample size within the expected timeframe. We have collected additional details about the donors. Please refer to Figure 1-source data 1 for the updated information. Other information regarding the lifestyle parameters of these donors has not been sufficiently recorded by the hospital. Therefore, we cannot improve the patient information further.

      As per the reviewer’s recommendation, in the latest version, we have discussed the technical and conceptual limitations of this work and added a brief discussion about these limitations in the last paragraph of the main text. We hope the readers can interpret our data critically and objectively.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers and the editors for their careful reading of our manuscript and for the detailed and constructive feedback on our work. Please find attached the revised version of the manuscript. We performed an extensive revision of the manuscript to address the issues raised by the referees. We provide new analyses (regarding the response consistency and the neural complexity), additional supplementary figures, and edits to figures and text. Based on the reviewers’ comments, we introduced several major changes to the manuscript.

      Most notably, we

      • added a limitation statement to emphasize the speculative nature of our interpretation of the timing of word processing/associative binding

      • emphasized the limitations of the control condition

      • added analyses on the interaction between memory retrieval after 12h versus 36h

      • clarified our definition of episodic memory

      • added detailed analyses of the “Feeling of having heard” responses and the confidence ratings

      We hope that the revised manuscript addresses the reviewers' comments to their satisfaction. We believe that the revised manuscript has been significantly improved owing to the feedback provided. Below you can find a point-by-point response to each reviewer comment in blue. We look forward to the publication of the revision in eLife.

      Reviewer #1 (Public Review):

      The authors show that concurrently presenting foreign words and their translations during sleep leads to the ability to semantically categorize the foreign words above chance. Specifically, this procedure was successful when stimuli were delivered during slow oscillation troughs as opposed to peaks, which has been the focus of many recent investigations into the learning & memory functions of sleep. Finally, further analyses showed that larger and more prototypical slow oscillation troughs led to better categorization performance, which offers hints to others on how to improve or predict the efficacy of this intervention. The strength here is the novel behavioral finding and supporting physiological analyses, whereas the biggest weakness is the interpretation of the peak vs. trough effect.

      R1.1. Major importance:

      I believe the authors could attempt to address this question: What do the authors believe is the largest implication of this study? How far can this technique be pushed, and how can it practically augment real-world learning?

      We revised the discussion to put more emphasis on possible practical applications of this study (lines 645-656).

      In our opinion, the strength of this paper is its contribution to the basic understanding of information processing during deep sleep, rather than its insights on how to augment real-world learning. Given the currently limited data on learning during sleep, we believe it would be premature to make strong claims about potential practical applications of sleep-learning. In addition, as pointed out in the discussion section, we do not know what adverse effects sleep-learning has on other sleep-related mechanisms such as memory consolidation.

      R1.2. Lines 155-7: How do the authors argue that the words fit well within the half-waves when the sounds lasted 540 ms and didn't necessarily start right at the beginning of each half-wave? This is a major point that should be discussed, as part of the down-state sound continues into the up-state. Looking at Figure 3A, it is clear that stimulus presented in the slow oscillation trough ends at a time that is solidly into the upstate, and would not neurolinguists argue that a lot of sound processing occurs after the end of the sound? It's not a problem for their findings, which is about when is the best time to start such a stimulus, but it's a problem for the interpretation. Additionally, the authors could include some discussion on whether possibly presenting shorter sounds would help to resolve the ambiguities here.

      The word pairs’ presentations lasted on average ~540 ms. Importantly, the word pairs’ onset was timed to occur 100 ms before the maximal amplitude of the targeted peaks/troughs.

      Therefore, most of a word’s sound pattern appeared during the negative-going half-wave (about 350 ms of 540 ms). Importantly, Brodbeck and colleagues (2022) have shown that phonemes are continuously analyzed and interpreted with delays of about 50-200 ms, peaking at a delay of 100 ms. These results suggest that word processing started just after the negative maximum of a trough and finished during the next peak. Our interpretation (e.g. line 520+) suggests that low-level auditory processing reaches the auditory cortex before the positive-going half-wave. During the positive-going half-wave, the higher-level semantic networks appear to extract the presented word's meaning and associate the two simultaneously presented words. We clarified the time course regarding slow-wave phases and sound presentation in the manuscript (lines 158-164). Moreover, we added the limitation that we cannot know for sure when and in which slow-wave phase words were processed (lines 645-656). Future studies might want to look at shorter stimuli to narrow down the timing of the word processing steps in relation to the sleep slow waves.
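
      The timing argument above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the onset, duration and processing delay come from the response, whereas the width of the negative-going half-wave (an idealised 1 Hz slow oscillation, symmetric around the trough maximum) is an assumption introduced here, and all names are hypothetical.

```python
# Values taken from the response above.
ONSET_MS = -100.0      # word onset relative to the trough maximum (0 ms)
DURATION_MS = 540.0    # average word-pair duration
PEAK_DELAY_MS = 100.0  # delay at which phoneme processing peaks (Brodbeck et al., 2022)

# ASSUMPTION (not from the response): an idealised 1 Hz slow oscillation whose
# negative-going half-wave spans 250 ms on either side of the trough maximum.
NEGATIVE_HALF_WAVE_MS = (-250.0, 250.0)


def overlap_ms(a_start, a_end, b_start, b_end):
    """Length of the overlap between two intervals in ms (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def presentation_summary(onset=ONSET_MS, duration=DURATION_MS,
                         half_wave=NEGATIVE_HALF_WAVE_MS):
    """Summarise where a word pair falls relative to the targeted trough."""
    offset = onset + duration
    inside = overlap_ms(onset, offset, half_wave[0], half_wave[1])
    return {
        "offset_ms": offset,
        "ms_in_negative_half_wave": inside,
        "ms_after_half_wave": duration - inside,
        # Time at which processing of the final phoneme would peak.
        "last_phoneme_peak_processing_ms": offset + PEAK_DELAY_MS,
    }
```

      Under these assumptions, 350 ms of the 540 ms presentation fall within the negative-going half-wave, and the sound ends 440 ms after the trough maximum, which is in line with the figures quoted in the response.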

      R1.3. Medium importance:

      Throughout the paper, another concern relates to the term 'closed-loop'. It appears this term has been largely misused in the literature, and I believe the more appropriate term here is 'real-time' (Bergmann, 2018, Frontiers in Psychology; Antony et al., 2022, Journal of Sleep Research). For instance, if there were some sort of algorithm that assessed whether each individual word was successfully processed by the brain during sleep and then the delivery of words was subsequently changed, that could be more accurately labelled as 'closed-loop'.

      We acknowledge that the meaning of “closed-loop” in its narrowest sense is not fulfilled here. We believe that “slow oscillation phase-targeted, brain-state-dependent stimulation” is the most appropriate term to describe the applied procedure (BSDBS, Bergmann, 2018). We changed the wording in the manuscript to brain-state-dependent stimulation algorithm. Nevertheless, we would like to point out that the algorithm we developed and used (TOPOSO) is very similar to the algorithms often termed closed-loop algorithms in memory and sleep research (e.g. Esfahani et al., 2023; Garcia-Molina et al., 2018; Ngo et al., 2013; for a comparison of TOPOSO to these techniques see Wunderlin et al., 2022, and for more information about TOPOSO see Ruch et al., 2022).

      R1.4. Figure 5 and corresponding analyses: Note that the two conditions end up with different sounds with likely different auditory complexities. That is, one word vs. two words simultaneously likely differ on some low-level acoustic characteristics, which could explain the physiological differences. Either the authors should address this via auditory analyses or it should be added as a limitation.

      This is correct; the two conditions differ in auditory complexity. Accordingly, we added this issue as another limitation of the study (lines 651-653). We had chosen a single-word control condition to ensure that no associative learning (between pseudowords) could take place in the control condition, because this was the critical learning process in the experimental condition. We would like to point out that we observed significant differences in brain responses to the presentation of word pairs (experimental condition) vs single pseudowords (control condition) in the Trough condition, but not the Peak condition. If low-level acoustic characteristics indeed explained the EEG differences between the two conditions, then one would expect these differences to occur in both the trough and the peak condition, because earlier studies showed that low-level acoustic processing proceeds in both phases of slow waves (Andrillon et al., 2016; Batterink et al., 2016; Daltrozzo et al., 2012).

      R1.5. Line 562-7 (and elsewhere in the paper): "episodic" learning is referenced here and many times throughout the paper. But episodic learning is not what was enhanced here. Please be mindful of this wording, as it can be confusing otherwise.

      The reported unconscious learning of novel verbal associations during sleep may not match textbook definitions of episodic memory. However, the traditional definitions of episodic memory have long been criticised (e.g., Dew & Cabeza, 2011; Hannula et al., 2023; Henke, 2010; Reder et al., 2009; Shohamy & Turk-Browne, 2013).

      We stand by our claim that sleep-learning was of episodic nature. Here we use a computational definition of episodic memory (Cohen & Eichenbaum, 1993; Henke, 2010; O’Reilly et al., 2014; O’Reilly & Rudy, 2000) and not the traditional definition of episodic memory that ties episodic memory to wakefulness and conscious awareness (Gabrieli, 1998; Moscovitch, 2008; Schacter, 1998; Squire & Dede, 2015; Tulving, 2002). We revised the manuscript to clarify that and how our definition differs from traditional definitions. Please see reviewer comment R3.1 for a more extensive answer.

      Reviewer #2 (Public Review):

      In this project, Schmidig, Ruch and Henke examined whether word pairs that were presented during slow-wave sleep would leave a detectable memory trace 12 and 36 hours later. Such an effect was found, as participants showed a bias to categorize pseudowords according to a familiar word that they were paired with during slow-wave sleep. This behavior was not accompanied by any sign of conscious understanding of why the judgment was made, and so demonstrates that long-term memory can be formed even without conscious access to the presented content. Unconscious learning occurred when pairs were presented during troughs but not during peaks of slow-wave oscillations. Differences in brain responses to the two types of presentation schemes, and between word pairs that were later correctly- vs. incorrectly-judged, suggest a potential mechanism for how such deep-sleep learning can occur.

      The results are very interesting, and they are based on solid methods and analyses. Results largely support the authors' conclusions, but I felt that there were a few points in which conclusions were not entirely convincing:

      R2.1. As a control for the critical stimuli in this study, authors used a single pseudoword simultaneously played to both ears. This control condition (CC) differs from the experimental condition (EC) in a few dimensions, among them: amount of information provided, binaural coherence and word familiarity. These differences make it hard to conclude that the higher theta and spindle power observed for EC over CC trials indicate associative binding, as claimed in the paper. Alternative explanations can be made, for instance, that they reflect word recognition, as only EC contains familiar words.

      We agree. In the revised version of the manuscript, we emphasise this as a limitation of our study (lines 653-656). Moreover, we understand that the differences between stimuli of the control and the experimental condition need not reflect only the associative binding of two words. We made our interpretation of the findings more cautious.

      Interestingly, EC vs CC exhibits differences following trough- but not peak-targeting (see R1.4). If indeed all the EC vs CC differences were unrelated to associative binding, we would expect the same EC vs CC differences when peaks were targeted. Hence, the selective EC vs CC differences in the trough condition suggest that the brain is more responsive to sound, information, word familiarity and word semantics during troughs, where we found successful learning, compared to peaks, where no learning occurred. Trough-targeted word pairs (EC) versus foreign words (CC) enhanced the theta power at 500 ms following word onset, and this theta enhancement correlated significantly with interindividual retrieval performance, indicating that theta probably promoted associative learning during sleep. This correlation was not significant for spindle power.

      R2.2. The entire set of EC pairs were tested both following 12 hours and following 36 hours. Exposure to the pairs during test #1 can be expected to have an effect over memory one day later, during test #2, and so differences between the tests could be at least partially driven by the additional activation and rehearsal of the material during test #1. Therefore, it is hard to draw conclusions regarding automatic memory reorganization between 12 and 36 hours after unconscious learning. Specifically, a claim is made regarding a third wave of plasticity, but we cannot be certain that the improvement found in the 36 hour test would have happened without test #1.

      We understand that the retrieval test at 12h may have had an impact on performance on the retrieval test at 36h. Practicing retrieval of newly formed memories is known to facilitate future retrieval of the same memories (e.g. Karpicke & Roediger, 2008). Hence, practicing the retrieval of sleep-formed memories during the retrieval test at 12h may have boosted performance at 36h.

      However, recent literature suggests that retrieval practice is only beneficial when corrective feedback is provided (Belardi et al., 2021; Metcalfe, 2017). In our study, we only presented the sleep-played pseudowords at test and participants received no feedback regarding the accuracy of their responses. Thus, a proper conscious re-encoding could not take place. Nevertheless, the retrieval at 12h may have altered performance at 36h in other ways. For example, it could have tagged the reactivated sleep-formed memories for enhanced consolidation during the next night (Rabinovich Orlandi et al., 2020; Wilhelm et al., 2011).

      We included a paragraph on the potential carry-over effects from retrieval at 12h on retrieval at 36h in the discussion section (line 489-496; line 657-659). Furthermore, we removed the arguments about the “third wave of plasticity”.

      R2.3. Authors claim that perceptual and conceptual processing during sleep led to increased neural complexity in troughs. However, neural complexity was not found to differ between EC and CC, nor between remembered and forgotten pairs. It is therefore not clear to me why the increased complexity that was found in troughs should be attributed to perceptual and conceptual word processing, as CC contains meaningless vowels. Moreover, from the evidence presented in this work at least, I am not sure there is room to infer causation - that the increase in HFD is driven by the stimuli - as there is no control analysis looking at HFD during troughs that did not contain stimulation.

      With the analysis of the HFD we would like to provide an additional perspective to the oscillation-based analysis. We checked whether the boundary condition of Peak and Trough targeting changes the overall complexity or information content in the EEG. Our goal was to assess the change in neural complexity (relative to a pre-stimulus baseline) following the successful vs unsuccessful encoding of word pairs during sleep.

      We acknowledge that a causal interpretation about HFD is not warranted, and we revised the manuscript accordingly. It was unexpected that we could not find the same results in the contrast of EC vs CC or correct vs incorrect word pairs. We suggest that our signal-to-noise ratio might have been too weak.

      One could argue that the phase targeting alone (without stimulation) induces peak/trough differences in complexity. We cannot completely rule out this concern. But we tried to use the EEG that was not influenced by the ongoing slow wave: the EEG 2000-500 ms before the stimulus onset and 500-2000 ms after the stimulus onset. Therefore, we excluded the 1 s of the targeted slow wave, hoping that most of the phase-inherent complexity should have faded out (see Figure 2). We could not further extend the time window of analysis due to the minimal stimulus onset interval of 2 s. Of course, we cannot exclude that the targeted Trough impacted the following HFD. We clarified this in the manuscript (lines 384-425).
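
      For readers unfamiliar with the measure, the following is a minimal sketch of how such a complexity value can be computed for one EEG segment. It assumes that HFD denotes the Higuchi fractal dimension (the abbreviation is not expanded in this response); the function name, the default `kmax` and the pure-Python implementation are illustrative choices and not the authors' analysis code.

```python
import math


def higuchi_fd(signal, kmax=10):
    """Higuchi fractal dimension of a 1-D sequence (hypothetical helper).

    Returns the least-squares slope of log(curve length) against log(1/k)
    for k = 1..kmax. `kmax` is an illustrative default, not a value from
    the manuscript.
    """
    signal = list(signal)
    n = len(signal)
    if n <= kmax + 1:
        raise ValueError("signal is too short for the requested kmax")
    xs, ys = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):
            sub = signal[m::k]      # every k-th sample, starting at offset m
            steps = len(sub) - 1
            dist = sum(abs(b - a) for a, b in zip(sub, sub[1:]))
            # Higuchi's normalisation for the number of steps and the step size
            lengths.append(dist * (n - 1) / (steps * k) / k)
        mean_length = sum(lengths) / len(lengths)
        if mean_length <= 0:
            raise ValueError("constant signal: curve length is zero")
        xs.append(math.log(1.0 / k))
        ys.append(math.log(mean_length))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope_num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope_den = sum((x - mean_x) ** 2 for x in xs)
    return slope_num / slope_den
```

      A smooth signal such as a straight line yields a value close to 1, whereas white noise yields a value close to 2. In the design described above, one value would be computed on the window 2000-500 ms before stimulus onset and another on the window 500-2000 ms after onset; how the two are then combined into a change score is not specified here.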

      Furthermore, we did find a difference of neural complexity between the pre-stimulus baseline and the post-stimulus complexity in the Peak condition but not in the Trough condition (we now added this contrast to the manuscript, line 416-419). Hence, the change in neural complexity is a reaction to the interaction of the specific slow-wave phase with the processing of the word pairs. Even though these results cannot provide unambiguous, causal links, we think they can figure as an important start for other studies to decipher neural complexity during slow wave sleep.

      Reviewer #3 (Public Review):

      The study aims at creating novel episodic memories during slow wave sleep that can be transferred to the awake state. To do so, participants were simultaneously presented during sleep with both foreign words and their arbitrary translations in their language (one word in each ear), or, as a control condition, with only the foreign word alone, binaurally. Stimuli were presented either at the trough or the peak of the slow oscillation using a closed-loop stimulation algorithm. To test for the creation of a flexible association during sleep, participants were then presented at wake with the foreign words alone and had (1) to decide whether they had the feeling of having heard that word before, (2) to attribute this word to one out of three possible conceptual categories (to which translation words actually belong), and (3) to rate their confidence about their decision.

      R3.1. The paper is well written, the protocol ingenious and the methods are robust. However, the results do not really add conceptually to a prior publication of this group showing the possibility to associate in slow wave sleep pairs of words denoting large or small objects and non-words, and then asking participants during ensuing wakefulness to categorise these non-words into a "large" or "small" category. In both cases, the main finding is that this type of association can be formed during slow wave sleep if presented at the trough (versus the peak) of the slow oscillation. Crucially, whether these associations truly represent episodic memory formation during sleep, as claimed by the authors, is highly disputable as there is no control condition allowing one to exclude the alternative, simpler hypothesis that mere perceptual associations between two elements (foreign word and translation) have been created and stored during sleep (which is already in itself an interesting finding). In this latter case, it would be only during the awake state when the foreign word is presented that its presentation would implicitly recall the associated translation, which in turn would "ignite" the associative/semantic association process eventually leading to the observed categorisation bias (i.e., foreign words tending to be put in the same conceptual category as their associated translation). In the absence of a disconfirmation of this alternative and more economical hypothesis, and if we follow Occam's razor, the claim that there is episodic memory formation during sleep is speculative and unsupported, which is a serious limitation irrespective of the merits of the study. The title and interpretations should be toned down in this respect.

      Our study conceptually adds to and extends the findings by Züst et al. (a) by highlighting the precise time-window or brain state during which sleep-learning is possible (e.g. slow-wave trough targeting), (b) by demonstrating the feasibility of associative learning during night sleep, and (c) by uncovering the longevity of sleep-formed memories.

      We acknowledge that the reported unconscious learning of novel verbal associations during sleep may not match textbook definitions of episodic memory. However, the traditional definitions of episodic memory have long been criticised (e.g., Dew & Cabeza, 2011; Hannula et al., 2023; Henke, 2010; Reder et al., 2009; Shohamy & Turk-Browne, 2013). We stand by our claim that sleep-learning was of episodic nature. We use a computational definition of episodic memory (Cohen & Eichenbaum, 1993; Henke, 2010; O’Reilly et al., 2014; O’Reilly & Rudy, 2000), and not the traditional definition of episodic memory that ties episodic memory to wakefulness and conscious awareness (Gabrieli, 1998; Moscovitch, 2008; Schacter, 1998; Squire & Dede, 2015; Tulving, 2002). The core computational features of episodic memory are 1) rapid learning, 2) association formation, and 3) a compositional and flexible representation of the associations in long-term memory.

      Therefore, we revised the manuscript to emphasize how our definition differs from traditional definitions (line 64).

      For the current study, we designed a retrieval task that calls on the core computational features of episodic memory by assessing flexible retrieval of sleep-formed compositional word-word associations. Reviewer 3 suggests an alternative interpretation for the learning observed here: mere perceptual associations between foreign words and translation words are stored during sleep, and semantic associations are only inferred at retrieval testing during ensuing wakefulness. First, these processing steps would require rapid sound-sound associative encoding, long-term storage, and flexible sound retrieval, which would still require hippocampal processing and computations in the episodic memory system. Second, this mechanism seems highly laborious and inefficient. The sound pattern of a word at 12 hours after learning triggers the reactivation of an associated sound pattern of another word. This sound pattern then elicits the activation of the translation word’s semantics, leading to the selection of the correct superordinate semantic category at test.

      Overall, we believe that our pairwise-associative learning paradigm triggered a rapid conceptual-associative encoding process mediated by the hippocampus that provided for flexible representations of foreign and translation words in episodic memory. This study adds to the existing literature by examining specific boundary conditions of sleep-learning and demonstrates the longevity (at least 36 hours) of sleep-learned associations.

      Other remarks:

      R3.2. Lines 43-45: the assumption that the sleeping brain decides whether external events can be disregarded, requires awakening or should be stored for further consideration in the waking state is dubious, and the supporting references date from a time (the 1960s) during which hypnopedia was investigated in badly controlled sleep conditions (leaving open the doubt about the possibility that it occurred during micro awakenings).

      We revised the manuscript to add more recent and better-controlled studies that bolster the claim originating in the 1960s (lines 40-51). Recently, it has been shown that the sleeping brain preferentially processes relevant information, for example the information conveyed by unfamiliar voices (Ameen et al., 2022), emotional content (Holeckova et al., 2006; Moyne et al., 2022), and one's own name compared to others’ names (Blume et al., 2018).

      R3.3. 1st paragraph, lines 48-53: the authors should be more specific about what kind of new associations and at which level they can be stored during sleep according to recent reports, as a wide variety of associations (mostly elementary levels) are shown in the cited references. Limitations in information processing during sleep should also be acknowledged.

      In the lines to which R3 refers, we cite an article (Ruch & Henke, 2020) in which two of the three authors of the current manuscript elaborate in detail what kind of associations can be stored during sleep. We revised these lines to more clearly present the current understanding of the potential and the limitations of sleep-learning (line 40-51). Although information processing during sleep is generally reduced (Andrillon et al., 2016), a variety of different kinds of associations can be stored, ranging from tone-odour to word-word association (Arzi et al., 2012, 2014; Koroma et al., 2022; Züst et al., 2019).

      R3.4. The authors ran their main behavioural analyses on delayed retrieval at 36h rather than 12h with the argument that retrieval performance was numerically larger at 36h than at 12h but the difference was non-significant (lines 181-183), and that effects were essentially similar. Looking at Figure 2, is the trough effect really significant at 12h? In any case, the fact that it is (numerically) higher at 36h than at 12h might suggest that the association created at the first 12h retrieval (considering the alternative hypothesis proposed above) has been reinforced by subsequent sleep.

      The Trough effect at 12h is not significant, as stated on line 185 (“Planned contrasts against chance level revealed that retrieval performance significantly exceeded chance at 36 hours only (P36hours = 0.036, P12hours = 0.094).”). It seems that our wording was not clear. Therefore, we refined the description of the behavioural analysis in the manuscript (lines 188-193).

      In brief, we report an omnibus ANOVA with a significant main effect of targeting type (Trough vs Peak, main effect Peak versus Trough: F(1,28) = 5.237, p = 0.030, d = 0.865). Because Trough-targeting led to significantly better memory retention than Peak-targeting, we computed a second ANOVA, solely including participants with trough-targeted word-pair encoding. The memory retention in the Trough condition is above chance (MTrough = 39.11%, SD = 10.76; FIntercept (1,14) = 5.660, p = 0.032) and does not significantly differ between the 12h and 36h retrieval (FEncoding-Test Delay (1,14) = 1.308, p = 0.272). However, the retrieval performance at 36h numerically exceeds the performance at 12h and the direct comparison against chance reveals that the 36h but not the 12h retrieval was significant (P36hours = 0.036, P12hours = 0.094). Hence, we found no evidence for above-chance performance at the 12h retrieval and focused on the retrieval after 36h in the EEG analysis.

      We agree with the reviewer that the subsequent sleep seems to have improved consolidation and subsequent retrieval. We assume that the reviewer suggests that participants merely formed perceptual associations during sleep and encoded episodic-like associations during testing at 12h (as pointed out in R3.1). However, we believe that it is unlikely that the awake encoding of semantic associations during the 12h retrieval led to improved performance after 36h. We changed the discussion regarding the interaction between retrieval at 12h and 36h (lines 505-512; see also R2.2).

      R3.5. In the discussion section, lines 419-427, the argument is somewhat circular in claiming episodic memory mechanisms based on functional neuroanatomical elements that are not tested here, and the supporting studies conducted during sleep were in a different setting (e.g. TMR).

      Indeed, the TMR and animal studies are a different setting compared to the present study. We re-wrote this part and only focused on the findings of Züst and colleagues (2019), who examined hippocampal activity during the awake retrieval of sleep-formed memories (lines 472-482). Additionally, we would like to emphasise that our main reasoning is that the task requirements called upon the episodic memory system.

      R3.6. Supplementary Material: in the EEG data, the differentiation between correct and incorrect ulterior classifications when presented at the peak of the slow oscillation is only significant in association with 36h delayed retrieval but not at 12h. How do the authors explain this lack of effect at 12 hours?

      We assume that the reviewer refers to the TROUGH condition (word pairs targeted at a slow-wave trough) and not, as written, to the peak condition. We argue that the retention performance at 12h is not significantly above chance (M12hours = 37.4%, P12hours = 0.094).

      Hence, the distinction between “correctly” and “incorrectly” categorised word pairs was not informative for the EEG analysis during sleep. Whatever the reason why the 12h retrieval was not significantly above chance, the less successful memory recall, and thus a less balanced trial count, makes recall accuracy at 12h a worse delineator for separating EEG trials than the recall performance after 36 hours.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor importance:

      Abstract: The opening framing is confusing here and in the introduction. Why frame the paper in the broadest terms about awakenings and threats from the environment when this is a paper about intersections between learning & memory and sleep? I do understand that there is an interesting point to be made about the counterintuitive behavioral findings with respect to sleep generally being perceived as a time when stimuli are blocked out, but this does not seem to me to be the broadest point or the way to start the paper. The authors should consider this but of course push back if they disagree.

      We understand the reviewer’s criticism but believe that this has more to do with personal preferences than with the scientific value or validity of our work. We believe that it is our duty as researchers to present our study in a broader context because this may help readers from various fields to understand why the work is relevant. To some readers, evidence for learning during sleep may seem trivial, to others, it may seem impossible or a weird but useless conundrum. By pointing out potential evolutionary benefits of the ability to acquire new information during sleep, we help the broad readership of eLife understand the relevance of this work.

      Lines 31-32: "Neural complexity" -> "neural measures of complexity" because it isn't clear what "neural complexity" means at this point in the abstract. Though, note my other point that I believe this analysis should be removed.

      To our understanding, “neural complexity” is a frequently used term in the field and yields more than 4000 entries on Google Scholar, whereas “neural measures of complexity” yields only 3 hits [September 2023]. In order to link our study with other studies on neural complexity, we would like to keep this terminology. As an example, two recent publications using “neural complexity” are Lee et al. (2020) and Frohlich et al. (2022).

      Lines 42-43: The line of work on 'sentinel' modes would be good to cite here (e.g., Blume et al., 2017, Brain & Language).

      We added the suggested citation to the manuscript (line 52).

      Lines 84-90: While I appreciate the authors' desire to dig deep and try to piece this all together, this is far too speculative in my opinion. Please see my other points on the same topic.

      In this paragraph, we point out why both peaks and troughs are worth exploring for their contributions to sensory processing and learning during sleep. Peaks and troughs jointly contribute to sleep-learning. Our speculations should inspire further work aimed at pinning down the benefits of peaks and troughs for sleep-learning. We clarified the purpose and speculative nature of our arguments in the revised version of the manuscript.

      Line 109: "outlasting" -> "lasting over" or "lasting >"

      We changed the wording accordingly.

      Line 111: I believe 'nonsense' is not the correct term here, and 'foreign' (again) would be preferred. Some may be offended to hear their foreign word regarded as 'nonsense'. However, please let me know if I have misunderstood.

      We would like to use the linguistic term “pseudoword” (aligned with reviewer 2’s comment) and we revised the manuscript accordingly.

      Figure 1A: "Enconding" -> "Encoding"

      Thank you for pointing this out.

      Lines 201-2: Were there interactions between confidence and correctness on the semantic categorization task? Were correct responses given with more confidence than incorrect ones? This would not necessarily be a problem for the authors' account, as there can of course be implicit influences on confidence (i.e., fluency).

      As is stated in the results section, confidence ratings did not differ significantly between correct and incorrect assignments (Trough condition: F(1,14) = 2.36, p = 0.15; Peak condition: F(1,14) = 0.48, p = 0.50).

      Line 236: "Nicknazar" -> "Niknazar"

      Thank you for pointing this out.

      Line 266: "profited" -> "benefited"

      We changed the wording accordingly.

      Lines 280-4: There seems some relevance here with Malerba et al. (2018) and her other papers to categorize slow oscillations.

      Diving into the details on how to best categorise slow oscillations is beyond the scope of this manuscript. Here, we build on work from the field of microstate analyses and use two measures to describe and quantify the targeted brain states: the topography of the electric field (i.e., the correlation of the electric field with an established template or “microstate”), and the field strength (global field power, GFP). While the topography of a quasi-stable electric field reflects activity in a specific neural network, the strength (GFP) of a field most likely mirrors the degree of activation (or inactivity) in the specific network. Here, we find that consistent targeting of a specific network state yielding a strong frontal negativity benefitted learning during sleep. For a more detailed explanation of the slow-wave phase targeting, see Ruch et al. (2022).
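
      The two descriptors named above can be illustrated with a short sketch. It uses the standard definitions from the microstate literature (GFP as the spatial standard deviation of the potentials across electrodes at one time point, and topographic similarity as the Pearson correlation between the momentary map and a template map). The function names and the plain-list representation of a scalp map are hypothetical, and this is not the TOPOSO implementation.

```python
import math


def global_field_power(topography):
    """Spatial standard deviation across electrodes at one time point."""
    n = len(topography)
    mean = sum(topography) / n
    return math.sqrt(sum((v - mean) ** 2 for v in topography) / n)


def template_correlation(topography, template):
    """Pearson correlation across electrodes between a map and a template."""
    n = len(topography)
    mean_map = sum(topography) / n
    mean_tpl = sum(template) / n
    cov = sum((a - mean_map) * (b - mean_tpl)
              for a, b in zip(topography, template))
    norm = math.sqrt(sum((a - mean_map) ** 2 for a in topography)
                     * sum((b - mean_tpl) ** 2 for b in template))
    # A flat map has no topography; report zero similarity in that case.
    return cov / norm if norm > 0 else 0.0
```

      Scaling a map by a constant changes its GFP but leaves its correlation with the template unchanged, which is why the two measures can separate how strongly a network state is expressed from which state it is.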

      Lines 343-6: Was it intentional to have 0.5 s (0.2-0.7 s) surrounding the analysis around 500 ms but only 0.4 s (0.8-1.2 s) surrounding the analysis around 1 s? Could the authors use the same size interval or justify having them be different?

      We apologise for the misleading phrasing and we clarified this in the revised manuscript. We applied the same procedure for the comparison of later correctly vs incorrectly classified pseudowords as we did for the comparison between EC and CC. Hence, we analysed the entire window from 0s to 2.5s with a cluster-based permutation approach. Contrary to the EC vs CC contrast, no cluster remained significant for the comparison of the subsequent memory effect. By mistake we reported the wrong time window. In the revised manuscript, the paragraph is corrected (lines 364-369).
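
      As an illustration of what a cluster-based permutation approach involves, the sketch below implements a simplified one-dimensional version over time points, using paired condition differences and random sign flips across participants. It is a generic textbook variant under stated assumptions and not the authors' pipeline, which may cluster over time and frequency and use different thresholds. All function names are hypothetical, and the default threshold of 2.145 is the two-sided critical t-value for 14 degrees of freedom at alpha = 0.05, chosen only as an example.

```python
import math
import random


def t_values(diffs):
    """One-sample t-value per time point for participant-by-time differences."""
    n = len(diffs)
    out = []
    for j in range(len(diffs[0])):
        col = [row[j] for row in diffs]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / (n - 1)
        out.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return out


def cluster_masses(tvals, threshold):
    """Return (start, stop, mass) for each run of same-signed |t| > threshold."""
    clusters, start = [], None
    for j, t in enumerate(tvals + [0.0]):  # trailing sentinel closes the last run
        inside = abs(t) > threshold
        if start is not None and (not inside or (t > 0) != (tvals[start] > 0)):
            clusters.append((start, j, sum(tvals[start:j])))
            start = None
        if inside and start is None:
            start = j
    return clusters


def cluster_permutation_test(diffs, threshold=2.145, n_perm=1000, seed=0):
    """Return (start, stop, mass, p) per observed cluster (stop is exclusive)."""
    rng = random.Random(seed)
    observed = cluster_masses(t_values(diffs), threshold)
    null_max = []
    for _ in range(n_perm):
        # Flip the sign of each participant's whole difference curve at random.
        flipped = []
        for row in diffs:
            sign = rng.choice((-1.0, 1.0))
            flipped.append([sign * v for v in row])
        masses = [abs(m) for _, _, m in cluster_masses(t_values(flipped), threshold)]
        null_max.append(max(masses, default=0.0))
    results = []
    for start, stop, mass in observed:
        exceed = sum(1 for m in null_max if m >= abs(mass))
        results.append((start, stop, mass, (1 + exceed) / (1 + n_perm)))
    return results
```

      Each row of `diffs` would hold one participant's difference between conditions (for instance, later correctly vs incorrectly classified pseudowords) across the analysis window. A result with no cluster at p < 0.05 corresponds to the outcome described above, in which no cluster remained significant.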

      Line 356-entire HFD section: it is unclear what's gained by this analysis, as it could simply be another reflection of the state of the brain at the time of word presentation. In my opinion, the authors should remove this analysis and section, as it does not add clarity to other aspects of the paper.

      (If the authors keep the section) Line 361-2 - "Moreover, high HFD values have been associated with cognitive processing (Lau et al., 2021; Parbat & Chakraborty, 2021)." This statement is vague. Could the authors elaborate?

      Please see our answer to Reviewer 2 (2.3) for a more detailed explanation. In brief, we would like to keep the analysis with the broad time window of -2 to -0.5 and from 0.5 to 2 s.

      Lines 403-4: How was it determined that these neural networks mediated both conscious/unconscious processes? Perhaps the authors meant to make a different point, but the way it reads to me is that there is evidence that some neural networks are conscious and others are not and both forms engage in similar functions.

      We revised the manuscript to be more precise and clear: “The conscious and unconscious rapid encoding and flexible retrieval of novel relational memories was found to recruit the same or similar networks including the hippocampus (Henke et al., 2003; Schneider et al., 2021). This suggests that conscious and unconscious relational memories are processed by the same memory system.” (p. 22, top).

      Lines 433-41: Performance didn't actually significantly increase from 12 to 36 hours, so this is all too speculative in my opinion.

      We removed the speculative claim that performance may have increased from the retrieval at 12 hours to the retrieval at 36 hours.

      Line 534: "assisted by enhanced" -> "coincident with". It's unclear whether theta reflects successful processing as having occurred or whether it directly affects or assists with it.

      We have adjusted the wording to be more cautious, as suggested (line 588).

      Line 572-4: Rothschild et al. (2016) is relevant here.

      Unfortunately, we do not see the relevance of this article within the context of our work.

      Line 577 paragraph: The authors may consider adding a note on the importance of ethical considerations surrounding this form of 'inception'.

      We extended this part by adding ethical considerations to the discussion section (Stickgold et al., 2021, line 657).

      Line 1366: It would be better if the authors could eventually make their data publicly available. This is obviously not required, but I encourage the authors to consider it if they have not considered it already.

      In my opinion, the discussion is too long. I really appreciate the authors trying to figure out the set of precise times in which each level of neural processing might occur and how this intersects with their slow oscillation phase results. However, I found a lot of this too speculative, especially given that the sounds may bleed into parts of other phases of the slow oscillation. I do not believe this is a problem unique to these authors, as many investigators attempting to target certain phases in the target memory reactivation literature have faced the same problem, but I do believe the authors get ahead of the data here. In particular, there seems to be one paragraph in the discussion that is multiple pages long (p. 22-24). This paragraph I believe has too much detail and should be broken up regardless, as it is difficult for the reader to follow.

      Considering the recent literature, we believe this interpretation best explains the data. As argued earlier, we believe that a speculative interpretation of the reported phenomena can provide substantial added value because it inspires future experimental work. We have improved the manuscript by clearly distinguishing between data and interpretation. We do declare the speculative nature of some offered interpretations. We hope that these speculations, which are testable hypotheses (!), will eventually be confirmed or refuted experimentally.

      Reviewer #2 (Recommendations For The Authors):

      I very much enjoyed the paper and think it describes important findings. I have a few suggestions for improvement, and minor comments that caught my eye during reading:

      (1) I was missing an analysis of CC ERP, and its comparison to EC ERP.

      We added this analysis to the manuscript (line 299-301). The comparison of CC ERP with EC ERP did not yield any significant cluster for either the peak (cluster-level Monte Carlo p=0.54) or the trough (cluster-level Monte Carlo p>0.37). We assume that the noise level was too high for the identification of differences between CC and EC ERP.

      (2) Regarding my public review comment #2, some light can be shed on between-test effects, I believe, using an item-based analysis - looking at correlations between items' classifications in test #1 and test #2. The assumption seems to be that items that were correct in test #1 remained correct in test #2 while other new correct classifications were added, owing to the additional consolidation happening between the two tests. But that is an empirical question that can be easily tested. If no consistency in item classification is found, on the other hand, or if only consistency in correct classification is found, that would be interesting in itself. This item-based analysis can help tease away real memory from random correct classification. For instance, the subset of items that are consistently classified correctly could be regarded as non-fluke at higher confidence and used as the focus of subsequent-memory analysis instead of the ones that were correct only in test #2.

      Thanks, we re-analysed the data accordingly. Participants were consistent at choosing a specific object category for an item at 12 hours and 36 hours (consistency rate = 47% same category, chance level is 1/3). Moreover, the consistency rate did not differ between the Trough and the Peak condition (MTrough = 47.2%, MPeak = 47.0%, P = 0.98). The better retrieval performance in the Trough compared to the Peak condition after 36 hours is due to the following: A) If participants were correct at 12h, they again chose the correct answer at 36h (Trough: 20%, Peak: 14%). B) Following an incorrect answer at 12h, participants switched to another object category at 36h (Trough: 72%, Peak: 67%). C) If participants switched the object category following an incorrect answer at 12h, they switched more often to the correct category at 36h in the Trough versus the Peak condition (Trough: 56%, Peak: 53%). Hence, the data support the reviewer’s assumption: items that were correct after 12 hours remained correct after 36 hours, while other new correct classifications were generated at 36h owing to the additional consolidation happening between the two tests. We added this finding to the manuscript (line 191-200, Figure S6):

      Author response image 1.
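
      The item-based consistency measures described above can be illustrated with a short sketch. The data layout (one category label per item and test), the function name and the choice of denominators are assumptions made for this example; the response does not state exactly how each percentage was normalised, and the labels below are invented.

```python
def consistency_summary(choice_12h, choice_36h, correct):
    """Summarise how per-item category choices change between two tests.

    All three arguments are equal-length lists of category labels,
    one entry per item (hypothetical data layout).
    """
    items = list(zip(choice_12h, choice_36h, correct))
    n = len(items)
    # Same category chosen at both tests, whether correct or not.
    same = sum(a == b for a, b, _ in items)
    # Correct at both tests (assumed denominator: all items).
    both_correct = sum(a == c and b == c for a, b, c in items)
    # Items answered incorrectly at 12h, and those where the choice changed.
    wrong_12h = [(a, b, c) for a, b, c in items if a != c]
    switched = [(a, b, c) for a, b, c in wrong_12h if b != a]
    switched_to_correct = sum(b == c for _, b, c in switched)
    nan = float("nan")
    return {
        "consistency_rate": same / n,
        "correct_at_both": both_correct / n,
        "switched_after_error": len(switched) / len(wrong_12h) if wrong_12h else nan,
        "switched_to_correct": switched_to_correct / len(switched) if switched else nan,
    }


# Invented example with three categories and six items.
summary = consistency_summary(
    ["animal", "tool", "place", "animal", "tool", "place"],
    ["animal", "tool", "animal", "place", "tool", "tool"],
    ["animal", "place", "animal", "place", "tool", "place"],
)
```

      With three response categories, a consistency rate of 1/3 would be expected if choices at the two tests were unrelated, which is the chance level quoted above.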

      As suggested, we re-analysed the ERP with respect to the subsequent memory effect. This time we computed four conditions according to the reviewer’s argument about consistently correctly classified pseudowords, presented in the figure below: ERP of trials that were correctly classified at 36h (blue), ERP of trials that were incorrectly classified at 36h (light blue), ERP of trials that were correctly classified twice (brown) and ERP of trials that were not correctly classified twice (orange, all trials that are not in brown). Please note that the two blue lines are reported in the manuscript and include all trials. The brown and the orange lines take the consistency into account and together also include all trials.

      Author response image 2.

      Excluding even more trials from the group of correct retrieval responses increases the noise level. Therefore, the difference between the twice-correct and the not-twice-correct trials is not significant (cluster-level Monte Carlo p > 0.27). Because the ERP of twice-correct trials seems very similar to the ERP of the trials correctly classified at 36h at frontal electrodes, we assume that our ERP effect is not driven by a few extreme subjects. Similarly, not-twice-correct trials (orange) have a stronger frontal trough than the trials incorrectly classified at 36h (light blue).

      (3) In a similar vein, a subject-based analysis would be highly interesting. First and foremost, readers would benefit from seeing the lines that connect individual dots across the two tests in figures 2B and 2C. It is reasonable to expect that only a subset of participants were successful learners in this experiment. Finding them and analyzing their results separately could be revealing.

      We added a Figure S1 to the supplementary material, providing the pairing between performance of the 12h and the 36h retrieval.

      It is an interesting idea to look at successful learners alone. We computed the ERP of the subsequent memory effect for those participants who had an above-chance retrieval accuracy at 36h. The result shows a similar effect as reported for all participants (frontal cluster ~0-0.3s). The p-value is only 0.08 because only 9 of 15 participants exhibited an above-chance retrieval performance at 36 hours.

      Author response image 3.

      ERP effect of correct (blue) vs incorrect (light blue) pseudoword category assignment of participants with a retrieval performance above chance at 36h (SD as shades):

      We prefer to not include this data in the manuscript, but are happy to provide it here.

      (4) I wondered why the authors informed subjects of the task in advance (that they will be presented associations when they slept)? I imagine this may boost learning as compared to completely naïve subjects. Whether this is the reason or not, I think an explanation of why this was done is warranted, and a statement whether authors believe the manipulation would work otherwise. Also, the reader is left wondering why subjects were informed only about test #1 and not about test #2 (and when were they told about test #2).

      Subjects were informed of all the tests upfront. We apologize for the inconsistency in the manuscript and revised the method part. The explanation of why participants were informed is twofold: a) Participants had to sleep with in-ear headphones. We wanted to explain to participants why these are necessary and why they should not remove them. b) We hoped that participants would unconsciously expect sounds to be played during sleep, would process these sounds efficiently and would remain deeply asleep (no arousals).

      (5) FoHH is a binary yes/no question, and so may not have been sensitive enough to demonstrate small differences in familiarity. For comparison, the Perceptual Awareness Scale (Ramsøy & Overgaard, 2004) that is typically used in studies of unconscious processing is of a 4-point scale, and this allows to capture more nuanced effects such as partial consciousness and larger response biases. Regardless, it would be informative to have the FoHH numbers obtained in this study, and not just their comparison between conditions. Also, was familiarity of EC and CC pseudowords compared? One may wonder whether hearing the pseudowords clearly vs. in one ear alongside a familiar word would make the word slightly more familiar.

      We apologize for having simplified this part too much in the manuscript. Indeed, the FoHH is comparable to the PAS. We used a 4-point scale, where participants rated their feeling of whether they have heard the pseudoword during previous sleep. In the revised manuscript, we report the complete results (line 203-223). The FoHH did not differ between any of the suggested contrasts. Thus, for both the peak and the trough condition, the FoHH did not differ between sleep-played vs new; correct EC trials vs new; correct vs incorrect EC trials; EC vs CC trials. To illustrate the results, a figure of the FoHH has been added to the supplement (Figure S4).

      (6) Similarly, it would be good to report the numbers of the confidence ratings in the paper as well.

      In the revised manuscript, we extended the description of the confidence rating results. We added the descriptive statistics (line 224-236) and included a corresponding figure in the supplement (Figure S5).

      Minor/aesthetic comments:

      We implemented all the following suggestions.

      (1) I suggest using "pseudoword" or "nonsense word" instead of "foreign word", because "foreign word" typically means a real word from a different language. It is quite confusing when starting to read the paper.

      After reconsidering, we think that pseudoword is the appropriate linguistic term and have revised the manuscript accordingly.

      (2) Lines 1000-1001: "The required sample size of N = 30 was determined based on a previous sleep-learning study". I was missing a description of what study you are referring to.

      (3) I am not sure I understood the claim nor the rationale made in lines 414-417. Is the claim that pairs did not form one integrated engram? How do we know that? And why would having one engram not enable extracting the meaning from a visual-auditory presentation of the cue? The sentence needs some rewording and/or unpacking.

      (4) Were categories counterbalanced (i.e., did each subjects' EC contain 9 animal words, 9 tool words and 9 place words)?

      (5) Asterisks indicating significant effects are missing from Figure 4 and S2.

      (6) Fig1 legend: "Participants were played with pairs" is ungrammatical.

      (7) Line 1093: no need for a comma.

      (8) Line 1336: missing opening parenthesis

      (9) Line 430: "observe" instead of "observed".

      (10) Line 466: two dots instead of one..

      Reviewer #3 (Recommendations For The Authors):

      Methods: 2 separate ANOVAs are performed (lines 160-185), but would not it make more sense to combine both in one ? If kept separated then a correction for multiple comparisons might be needed (p/2 = 0.025)

      We computed an omnibus ANOVA. In a next step, we examined the effect in the significant targeting condition by computing another ANOVA. For further explanations, see reviewer comment 3.4.

      References

      Ameen, M. S., Heib, D. P. J., Blume, C., & Schabus, M. (2022). The Brain Selectively Tunes to Unfamiliar Voices during Sleep. Journal of Neuroscience, 42(9), 1791–1803. https://doi.org/10.1523/JNEUROSCI.2524-20.2021

      Andrillon, T., Poulsen, A. T., Hansen, L. K., Léger, D., & Kouider, S. (2016). Neural Markers of Responsiveness to the Environment in Human Sleep. The Journal of Neuroscience, 36(24), Article 24. https://doi.org/10.1523/JNEUROSCI.0902-16.2016

      Arzi, A., Holtzman, Y., Samnon, P., Eshel, N., Harel, E., & Sobel, N. (2014). Olfactory Aversive Conditioning during Sleep Reduces Cigarette-Smoking Behavior. Journal of Neuroscience, 34(46), Article 46. https://doi.org/10.1523/JNEUROSCI.2291-14.2014

      Arzi, A., Shedlesky, L., Ben-Shaul, M., Nasser, K., Oksenberg, A., Hairston, I. S., & Sobel, N. (2012). Humans can learn new information during sleep. Nature Neuroscience, 15(10), Article 10. https://doi.org/10.1038/nn.3193

      Batterink, L. J., Creery, J. D., & Paller, K. A. (2016). Phase of Spontaneous Slow Oscillations during Sleep Influences Memory-Related Processing of Auditory Cues. Journal of Neuroscience, 36(4), 1401–1409. https://doi.org/10.1523/JNEUROSCI.3175-15.2016

      Belardi, A., Pedrett, S., Rothen, N., & Reber, T. P. (2021). Spacing, Feedback, and Testing Boost Vocabulary Learning in a Web Application. Frontiers in Psychology, 12. https://www.frontiersin.org/articles/10.3389/fpsyg.2021.757262

      Bergmann, T. O. (2018). Brain State-Dependent Brain Stimulation. Frontiers in Psychology, 9, 2108. https://doi.org/10.3389/fpsyg.2018.02108

      Blume, C., del Giudice, R., Wislowska, M., Heib, D. P. J., & Schabus, M. (2018). Standing sentinel during human sleep: Continued evaluation of environmental stimuli in the absence of consciousness. NeuroImage, 178, 638–648. https://doi.org/10.1016/j.neuroimage.2018.05.056

      Brodbeck, C., & Simon, J. Z. (2022). Cortical tracking of voice pitch in the presence of multiple speakers depends on selective attention. Frontiers in Neuroscience, 16. https://www.frontiersin.org/articles/10.3389/fnins.2022.828546

      Cohen, N. J., & Eichenbaum, H. (1993). Memory, Amnesia, and the Hippocampal System. A Bradford Book.

      Daltrozzo, J., Claude, L., Tillmann, B., Bastuji, H., & Perrin, F. (2012). Working memory is partially preserved during sleep. PloS One, 7(12), Article 12.

      Dew, I. T. Z., & Cabeza, R. (2011). The porous boundaries between explicit and implicit memory: Behavioral and neural evidence. Annals of the New York Academy of Sciences, 1224(1), 174–190. https://doi.org/10.1111/j.1749-6632.2010.05946.x

      Esfahani, M. J., Farboud, S., Ngo, H.-V. V., Schneider, J., Weber, F. D., Talamini, L. M., & Dresler, M. (2023). Closed-loop auditory stimulation of sleep slow oscillations: Basic principles and best practices. Neuroscience & Biobehavioral Reviews, 153, 105379. https://doi.org/10.1016/j.neubiorev.2023.105379

      Frohlich, J., Chiang, J. N., Mediano, P. A. M., Nespeca, M., Saravanapandian, V., Toker, D., Dell’Italia, J., Hipp, J. F., Jeste, S. S., Chu, C. J., Bird, L. M., & Monti, M. M. (2022). Neural complexity is a common denominator of human consciousness across diverse regimes of cortical dynamics. Communications Biology, 5(1), Article 1. https://doi.org/10.1038/s42003-022-04331-7

      Gabrieli, J. D. E. (1998). Cognitive neuroscience of human memory. Annual Review of Psychology, 87–115.

      Garcia-Molina, G., Tsoneva, T., Jasko, J., Steele, B., Aquino, A., Baher, K., Pastoor, S., Pfundtner, S., Ostrowski, L., Miller, B., Papas, N., Riedner, B., Tononi, G., & White, D. P. (2018). Closed-loop system to enhance slow-wave activity. Journal of Neural Engineering, 15(6), 066018. https://doi.org/10.1088/1741-2552/aae18f

      Hannula, D. E., Minor, G. N., & Slabbekoorn, D. (2023). Conscious awareness and memory systems in the brain. WIREs Cognitive Science, 14(5), e1648. https://doi.org/10.1002/wcs.1648

      Henke, K. (2010). A model for memory systems based on processing modes rather than consciousness. Nature Reviews Neuroscience, 11(7), Article 7. https://doi.org/10.1038/nrn2850

      Henke, K., Mondadori, C. R. A., Treyer, V., Nitsch, R. M., Buck, A., & Hock, C. (2003). Nonconscious formation and reactivation of semantic associations by way of the medial temporal lobe. Neuropsychologia, 41(8), Article 8. https://doi.org/10.1016/S0028-3932(03)00035-6

      Holeckova, I., Fischer, C., Giard, M.-H., Delpuech, C., & Morlet, D. (2006). Brain responses to a subject’s own name uttered by a familiar voice. Brain Research, 1082(1), 142–152. https://doi.org/10.1016/j.brainres.2006.01.089

      Karpicke, J. D., & Roediger, H. L. (2008). The Critical Importance of Retrieval for Learning. Science, 319(5865), 966–968. https://doi.org/10.1126/science.1152408

      Koroma, M., Elbaz, M., Léger, D., & Kouider, S. (2022). Learning New Vocabulary Implicitly During Sleep Transfers With Cross-Modal Generalization Into Wakefulness. Frontiers in Neuroscience, 16, 801666. https://doi.org/10.3389/fnins.2022.801666

      Lee, Y., Lee, J., Hwang, S. J., Yang, E., & Choi, S. (2020). Neural Complexity Measures. Advances in Neural Information Processing Systems, 33, 9713–9724. https://proceedings.neurips.cc/paper/2020/hash/6e17a5fd135fcaf4b49f2860c2474c7c-Abstract.html

      Metcalfe, J. (2017). Learning from Errors. Annual Review of Psychology, 68(1), 465–489. https://doi.org/10.1146/annurev-psych-010416-044022

      Moscovitch, M. (2008). The hippocampus as a “stupid,” domain-specific module: Implications for theories of recent and remote memory, and of imagination. Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Expérimentale, 62, 62–79. https://doi.org/10.1037/1196-1961.62.1.62

      Moyne, M., Legendre, G., Arnal, L., Kumar, S., Sterpenich, V., Seeck, M., Grandjean, D., Schwartz, S., Vuilleumier, P., & Domínguez-Borràs, J. (2022). Brain reactivity to emotion persists in NREM sleep and is associated with individual dream recall. Cerebral Cortex Communications, 3(1), tgac003. https://doi.org/10.1093/texcom/tgac003

      Ngo, H.-V. V., Martinetz, T., Born, J., & Mölle, M. (2013). Auditory Closed-Loop Stimulation of the Sleep Slow Oscillation Enhances Memory. Neuron, 78(3), Article 3. https://doi.org/10.1016/j.neuron.2013.03.006

      O’Reilly, R. C., Bhattacharyya, R., Howard, M. D., & Ketz, N. (2014). Complementary Learning Systems. Cognitive Science, 38(6), 1229–1248. https://doi.org/10.1111/j.1551-6709.2011.01214.x

      O’Reilly, R. C., & Rudy, J. W. (2000). Computational principles of learning in the neocortex and hippocampus. Hippocampus, 10(4), 389–397. https://doi.org/10.1002/1098-1063(2000)10:4<389::AID-HIPO5>3.0.CO;2-P

      Rabinovich Orlandi, I., Fullio, C. L., Schroeder, M. N., Giurfa, M., Ballarini, F., & Moncada, D. (2020). Behavioral tagging underlies memory reconsolidation. Proceedings of the National Academy of Sciences, 117(30), 18029–18036. https://doi.org/10.1073/pnas.2009517117

      Reder, L. M., Park, H., & Kieffaber, P. D. (2009). Memory systems do not divide on consciousness: Reinterpreting memory in terms of activation and binding. Psychological Bulletin, 135(1), Article 1. https://doi.org/10.1037/a0013974

      Ruch, S., & Henke, K. (2020). Learning During Sleep: A Dream Comes True? Trends in Cognitive Sciences, 24(3), 170–172. https://doi.org/10.1016/j.tics.2019.12.007

      Ruch, S., Schmidig, F. J., Knüsel, L., & Henke, K. (2022). Closed-loop modulation of local slow oscillations in human NREM sleep. NeuroImage, 264, 119682. https://doi.org/10.1016/j.neuroimage.2022.119682

      Schacter, D. L. (1998). Memory and Awareness. Science, 280(5360), 59–60. https://doi.org/10.1126/science.280.5360.59

      Schneider, E., Züst, M. A., Wuethrich, S., Schmidig, F., Klöppel, S., Wiest, R., Ruch, S., & Henke, K. (2021). Larger capacity for unconscious versus conscious episodic memory. Current Biology, 31(16), 3551-3563.e9. https://doi.org/10.1016/j.cub.2021.06.012

      Shohamy, D., & Turk-Browne, N. B. (2013). Mechanisms for widespread hippocampal involvement in cognition. Journal of Experimental Psychology: General, 142(4), 1159–1170. https://doi.org/10.1037/a0034461

      Squire, L. R., & Dede, A. J. O. (2015). Conscious and Unconscious Memory Systems. Cold Spring Harbor Perspectives in Biology, 7(3), a021667. https://doi.org/10.1101/cshperspect.a021667

      Stickgold, R., Zadra, A., & Haar, A. J. H. (2021). Advertising in Dreams is Coming: Now What? Dream Engineering. https://dxe.pubpub.org/pub/dreamadvertising/release/1

      Tulving, E. (2002). Episodic Memory: From Mind to Brain. Annual Review of Psychology, 53(1), 1–25. https://doi.org/10.1146/annurev.psych.53.100901.135114

      Wilhelm, I., Diekelmann, S., Molzow, I., Ayoub, A., Mölle, M., & Born, J. (2011). Sleep Selectively Enhances Memory Expected to Be of Future Relevance. Journal of Neuroscience, 31(5), 1563–1569. https://doi.org/10.1523/JNEUROSCI.3575-10.2011

      Wunderlin, M., Koenig, T., Zeller, C., Nissen, C., & Züst, M. A. (2022). Automatized online prediction of slow-wave peaks during non-rapid eye movement sleep in young and old individuals: Why we should not always rely on amplitude thresholds. Journal of Sleep Research, 31(6), e13584. https://doi.org/10.1111/jsr.13584

      Züst, M. A., Ruch, S., Wiest, R., & Henke, K. (2019). Implicit Vocabulary Learning during Sleep Is Bound to Slow-Wave Peaks. Current Biology, 29(4), 541-553.e7. https://doi.org/10.1016/j.cub.2018.12.038

    1. Author Response

      We thank both reviewers for the positive evaluation of our work and suggestions on how to improve it.

      We agree with Reviewer #1 that reporting uncertainties will both clarify and strengthen our arguments. Where applicable, uncertainties will be added in a revised version.

      Regarding Reviewer #2’s suggestion to include free energy calculations to estimate the free energies of hydrogen bond and hydrophobic interactions: current free energy methods are capable of giving accurate estimates of the relative binding free energies of similar ligands; however, accurate calculations of the absolute free energies of hydrogen bond and hydrophobic interactions are not yet feasible.

      Again, we thank the reviewers for their assessment and suggestions. We will update the manuscript as we have outlined above.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public review

      Reviewer 1

      Zhang et al. tackle the important topic of primate-specific structural features of the brain and the link with functional specialization. The authors explore and compare gyral peaks of the human and macaque cortex through non-invasive neuroimagery, using convincing techniques that have been previously validated elsewhere. They show that nearly 60% of the macaque peaks are shared with humans, and use a multi-modal parcellation scheme to describe the spatial distribution of shared and unique gyral peaks in both species.

      We thank the reviewer for his/her summary and affirmation of our work.

      The claim is made that shared peaks are mainly located in lower-order cortical areas whereas unique peaks are located in higher-order regions, however, no systematic comparison is made. The authors then show that shared peaks are more consistently found across individuals than unique peaks, and show a positive but small and non-significant correlation between cross-individual counts of the shared peaks of the human and the macaque i.e. the authors show a non-significant trend for shared peaks that are more consistently found across humans to be those that are also more found across macaques.

      Answer: We thank the reviewer for raising questions about our work. In order to provide a more systematic comparison for the conclusion that ‘shared peaks are mainly located in lower-order cortical areas whereas unique peaks are located in higher-order regions’, we have conducted two additional experiments. Following the reviewers’ suggestions, we conducted a statistical analysis of the ratio of shared and unique peaks within different brain networks (as depicted in Figure 2 (b)), and also presented the specific distribution quantities of the two types of peaks in both low- and high-order brain networks (as detailed in the corresponding Table 1). Through these three experiments, we have obtained a more systematic and comprehensive conclusion that ‘shared peaks are more distributed in lower-order networks, while unique peaks are more in higher-order networks’.

      In order to identify if unique and shared peaks could be identified based on the structural features of the cortical regions containing them, the authors compared them with t-tests. A correction for multiple comparisons should be applied and t-values reported. Graph-theoretical measures were applied to functional connectivity datasets (resting-state fMRI) and compared between unique and shared peak regions for each species separately. Again the absence of multiple comparison correction and t-values make the results hard to interpret. The same comment applies to the analysis reporting that shared peaks are surrounded by a larger number of brain regions than unique peaks. Finally, the potentially extremely interesting results about differential human gene expression of shared and unique peaks regions are not systematically reported e.g. the 28 genes identified are not listed and the selection procedure of 7 genes is not fully reported.

      Answer: We thank the reviewer for the suggestions about the statistical analysis in our manuscript. First, we applied False Discovery Rate (FDR) correction to all experiments involving multiple comparisons throughout the manuscript, and the corrected t-values are reported (Tables 2-5 and A5-A6). Additionally, in response to the reviewers’ guidance regarding the gene analysis section, we provide a list of the 28 genes (Table A7) selected by lasso, along with the t-values obtained from Welch’s t-test comparing expression between the two types of peaks. The functions of the seven genes with final p-values below 0.05 are reported in Table 6.
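
      As an illustration of what an FDR correction does to a family of tests, the sketch below computes adjusted p-values with the Benjamini-Hochberg procedure. The response does not say which FDR procedure was used, so Benjamini-Hochberg is an assumption here, as are the function name and the example p-values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented raw p-values from four comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
```

      A comparison is then reported as significant when its adjusted value falls below the chosen FDR level, for example 0.05.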

      The paper is well written and the methods used for data processing are very compelling i.e. the peak cluster extraction pipeline and cross-species registration. However, the analysis and especially the reporting of statistics, as they stand now, constitutes the main weakness of the paper. Some aspects of the statistical analysis need to be clarified.

      Reviewer 2

      The authors compared the cortical folding of human brains with folding in macaque monkey brains to reveal shared and unique locations of gyral peaks. The shared gyral peaks were located in cortical regions that are functionally similar and less changed in humans from those in macaques, while the locations of unique peaks in humans are in regions that have changed or expanded functions. These findings are important in that they suggest where human brains have changed more than macaque brains in their subsequent evolution from a common ancestor. The massive analysis of comparative results provides evidence of where humans and macaques are similar or different in cortical markers, as well as noting some of the variations within each of the two primates.

      Answer: We thank the reviewer for his/her summary and appreciation of our cross-species work.

      Strengths:

      The study includes massive detail.

      Weaknesses:

      The manuscript is too long and there is not enough focus on the main points.

      Answer: We thank the reviewer for pointing out the shortcomings in our manuscript. Because the manuscript was too long, we have retained only the core experiments and relevant analyses in the main text. Relatively minor conclusions have been moved to the Supplementary Information; for example, the original Table 1 (locations of all shared clusters) is now Table A1. Additionally, some non-essential expressions in the original manuscript have been removed.

      Our experiments primarily revealed the existence of partially shared cortical landmarks, known as gyral peaks, in both humans and macaques. We found that these shared and unique peaks are mainly distributed across low- and high-order brain networks. To emphasize this main point, we added two experiments on top of the existing ones to provide a more systematic explanation of this conclusion. We conducted a statistical analysis of the ratio of shared and unique peaks within different brain networks (as depicted in Figure 2 (b)), and also presented the specific distribution quantities of the two types of peaks in both low- and high-order brain networks (as detailed in the corresponding Table 1). By combining the results of these two experiments with the original manuscript’s statistical findings on the proportions of the two types of peaks in different brain networks, the conclusion that ‘shared and unique peaks are predominantly located in low-order and high-order brain networks’ becomes more prominent.

      A brief listing of previous views on why fissures form and what factors are important would be helpful.

      Answer: In response to this suggestion from the reviewer, we have incorporated some previous views on why fissures form and what factors are important into the ‘Introduction’ section.

      ‘Cortical folds are important features of primate brains. The primary driver of cortical folding is the differential growth between cortical and subcortical layers. During the gyrification process in the cortex, areas with high-density stiff axonal fiber bundles towards gyri. The folding patterns in the brain, formed through a series of complex processes, are found to play a crucial role in various cognitive and behavioral processes, including perception, action, and cognition (Fornito et al. 2004; Cachia et al. 2018; Yang et al. 2019; Whittle et al. 2009).’

      Reviewer 1 (Recommendations For The Authors):

      (1) Figure 3b shows a non-significant trend for shared peaks that are more consistently found across humans to be those that are also more found across macaques. In the discussion, lines 218-219, the fact that the correlation is not significant should be reported more clearly.

      Answers: We thank the reviewer for this question. We revised lines 218-219 (now lines 257-259) as follows: ‘2. Consistency: The inter-individual consistency of shared peaks within each species was greater than that of unique peaks. The consistency of shared peaks in the human and macaque brains exhibits a positive correlation (non-significant though).’

      (2) It is not fully clear how much shared peaks are mostly distributed in the higher-order cortex, especially in the macaque. It is reported in the results lines 132-133 that ‘In the macaque brain, shared peak cluster centers most distributed in the V2, DMN, and CON (Figure.2 (d)), while unique peak cluster centers most distributed in the DMN, Language (Lan), and Dorsal-attention (DAN)’ but not further discussed. Please develop this point in the discussion. Further, the results presented in Figures 2 and A1 are actually quite different and this shall be better described in the results. Given that shared and unique peaks can be found in the same region, this analysis would gain importance by applying a comparison test for the selection of regions where the most shared or unique peaks are found. The sentence lines 306-308 should be accordingly revised.

      It is hard to understand what the 0-3% corresponds to in Figures 2 and A1?

      Please also correct in both legends and in the text the labeling of panels and add in the legends a brief description of panel (c). In the legend of Figure 2, ‘shared peaks’ in the second sentence shall be replaced by ‘unique peaks’.

      Answers: We thank the reviewer for these questions and suggestions. Our responses to them are itemized as follows:

      A1: To clarify the distribution of shared and unique peaks across higher-order and lower-order networks, we divided the 12 brain networks in the Cole-Anticevic atlas into lower-order networks (visual 1 (V1), visual 2 (V2), auditory (Aud), somatomotor (SMN), posterior multimodal (PMN), ventral multimodal (VMN), and orbito-affective (OAN)) and higher-order networks (cingulo-opercular (CON), dorsal attention (DAN), language (Lan), frontoparietal (FPN), and default mode (DMN)), based on previous research (Golesorkhi et al. 2022; Ito, Hearne, and Cole 2020). Using this lower-/higher-order division, we report the number of shared and unique peaks in both species in Author response table 1. In both humans and macaques, shared peaks are distributed more in lower-order networks, while unique peaks are distributed more in higher-order networks. This observation is particularly pronounced in humans.

      Author response table 1.

      The number of shared and unique peaks in lower- and higher-order brain networks of the two species. Lower-order networks include visual 1 (V1), visual 2 (V2), auditory (Aud), somatomotor (SMN), posterior multimodal (PMN), ventral multimodal (VMN), and orbito-affective (OAN) networks; higher-order networks include cingulo-opercular (CON), dorsal attention (DAN), language (Lan), frontoparietal (FPN), and default-mode (DMN) networks.

      In the main text, Figure 2 (reproduced below as Author response figure 1) illustrates the proportions of shared and unique peaks across 12 brain networks in both species. In each pie chart, we have specifically highlighted the top three ranked brain networks. Although the pie charts generally support the above results, two brain networks deserve further discussion: the DMN and CON, two higher-order networks that rank high in shared peak count (second and third for macaque shared peaks; fourth and fifth for human shared peaks).

      The cingulo-opercular network (CON) is a brain network associated with action, goal, arousal, and pain. However, a recent study identified three areas of the primary motor cortex that exhibit strong functional connectivity with the CON, forming a novel network known as the somato-cognitive action network (SCAN) (Gordon et al. 2023). The SCAN integrates body control (motor and autonomic) and action planning, consistent with the findings that aspects of higher-level executive control might derive from movement coordination (Llinás 2002; Gordon et al. 2023). The CON may be shared across these two species in the form of the SCAN. This could explain in part the result in Author response figure 1 that many shared peaks fall in the CON.

      Author response image 1.

      Pie chart shows the count of shared and unique peaks across different brain networks for both human and macaque. Right panel shows the Cole-Anticevic (CA) networks (Ji et al. 2019) on human surface as a reference.

      The default-mode network (DMN) is an ensemble of brain regions that are active in passive tasks, including the anterior and posterior cingulate cortex, medial and lateral parietal cortex, and medial prefrontal cortex (Buckner, Andrews-Hanna, and Schacter 2008). Although the DMN is considered a higher-order brain network, numerous studies have provided evidence of its homologous presence in both humans and macaques. Many existing studies have confirmed the similarity between the DMN regions in humans and macaques from various perspectives, including cytoarchitectonic (Parvizi et al. 2006; Buckner, Andrews-Hanna, and Schacter 2008; Caminiti et al. 2010) and anatomical tracing (Vincent et al. 2007). These studies all support the notion that some elements of the DMN may be conserved across primate species (Mantini et al. 2011). In general, the higher occurrence of shared peaks within the DMN may be attributed to the partial sharing of the DMN between humans and macaques.

      These results have been added to Table 2 along with corresponding text and discussion section.

      A2: The difference between the results of Figure 2 and Figure A1 (now Figure A2) is whether the peak count is normalized by cortical area, which hugely varies across networks. For example, among the 12 brain networks, the three networks with the largest surface areas are the DMN, SMN and CON, and the three networks with the smallest area are OAN, PMN and VMN. The area difference between networks can be as large as 18-fold. Therefore, it is not difficult to find that, although the DMN ranks high in both shared and unique peak counts during statistical analysis (Figure 2 (a)), it is relatively small in Figure A2 after area normalization. In contrast, VMN ranks lower in peak count statistics but exhibits a substantial proportion after area normalization (For example, 38% of macaque shared peaks are distributed in the VMN region, but there are actually only four peaks). However, the two pie charts deliver the same message that there are more shared peaks in lower-order networks, while unique peaks are more in higher-order networks (except for macaques, where shared peaks are also distributed significantly in DMN and CON).

      Following the suggestion from the reviewer, we adopted a new approach to present the ratio between shared peak count and unique peak count for each network (see Author response figure 2), such that the networks where the most shared or unique peaks are found can be easily highlighted. To mitigate potential imbalances in proportions caused by differences in the absolute numbers of each category (shared or unique) of peak, the proportions of peaks within their respective categories were used in the calculations. In Author response figure 2, the pink and green color bins represent ratios of shared and unique peaks, respectively. The dark blue dashed line represents the 50% reference line. In general, from left to right in the figure, the ratio of shared peaks decreases gradually while the ratio of unique peaks increases, suggesting that shared peaks are more common (>0.5, above the dashed line) in lower-order networks (orange font), while unique peaks are generally more common in higher-order networks (blue font). Specifically, in human brains, the networks with a higher abundance of shared peaks are Aud, VMN, V1, SMN, and V2, whereas in macaques, they are CON, VMN, V1, V2, FPN, and SMN. Again, in the human brains, the disparity between shared and unique peaks tends to be more pronounced (further away from the reference line), for both lower-order and higher-order networks. In contrast, in the macaque brains, the disparity between shared and unique peaks is less pronounced (closer to the reference line). The ratio of shared and unique peaks is around 0.5 for 6 out of all 10 networks (including both lower and higher-order ones).
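
The ratio described above can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes that each network's count is first converted to a proportion within its own category (shared or unique) and that the two proportions are then rescaled to sum to 1 per network. The function name `shared_unique_ratios` and all counts are made up for the example.

```python
def shared_unique_ratios(shared_counts, unique_counts):
    """Return {network: (shared_ratio, unique_ratio)}, each pair summing to 1.

    Assumption: counts are normalized within their own category first, so a
    category with many more peaks overall does not dominate the ratio.
    """
    total_shared = sum(shared_counts.values())
    total_unique = sum(unique_counts.values())
    ratios = {}
    for net in sorted(set(shared_counts) | set(unique_counts)):
        p_shared = shared_counts.get(net, 0) / total_shared
        p_unique = unique_counts.get(net, 0) / total_unique
        denom = p_shared + p_unique
        if denom == 0:
            continue  # no peaks of either kind in this network
        ratios[net] = (p_shared / denom, p_unique / denom)
    return ratios


# Hypothetical counts, for illustration only (not the study's data).
shared = {"V1": 30, "SMN": 50, "DMN": 20}
unique = {"V1": 20, "SMN": 60, "DMN": 120}
ratios = shared_unique_ratios(shared, unique)
for net, (s, u) in ratios.items():
    side = "shared" if s > 0.5 else "unique"
    print(f"{net}: shared={s:.3f} unique={u:.3f} -> leans {side}")
```

Under this reading, a network sits above the 50% reference line when its share of all shared peaks exceeds its share of all unique peaks, regardless of how many peaks each category contains in total.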

      Author response image 2.

      The ratio of shared and unique peaks in each brain network in the Cole-Anticevic (CA) atlas. The pink and green color bins represent ratios of shared and unique peaks, respectively. The dark blue dashed line represents the 50% reference line. For each brain region, the sum of the ratios of shared and unique peaks is equal to 1.

      Based on these analyses, the sentence in lines 306-308 (now lines 368-370) has been revised as follows: ‘In the human brain, the more shared peaks (about 65%) are located in lower-order brain regions, while unique peaks are mainly (about 74%) located in higher-order regions. However, this trend is relatively less pronounced in the macaque brain.’

      These results have been added to Figure 2 (b) along with corresponding text and discussion section.

      A3: In response to the third suggestion from the reviewer, we have clearly labeled the brain region names corresponding to 0% to 3% in Figure 2 (now Figure 2 (a)) and Figure A1 (now Figure A2).

      Author response image 3.

      Pie chart shows the count of shared and unique peaks across different brain networks for both human and macaque. Right panel shows the Cole-Anticevic (CA) networks (Ji et al. 2019) on human surface as a reference.

      A4: Finally, we would like to express our gratitude to the reviewer for pointing out our mistakes.

      We have made improvements to Figure 2 and revised the figure captions accordingly.

      (3) The conclusions regarding the spatial relationship between peaks and functional regions shall be revised (Lines 187-188, 228-229, and 329-330). In the macaque, the results are opposite in the two atlases used. Further, in the human, it is not clear how multiple comparison corrections will impact statistics and some atlases show opposite results, although conclusions hold true in the majority of human atlases.

      Answers: We thank the reviewer very much for this suggestion. We have added the results of the Cole-Anticevic atlas for macaques to the main text, which also show shared>unique (Author response table 2, corresponding to Table 5 in the main text); namely, there are more diverse brain regions around shared peaks than around unique peaks. Therefore, of the three commonly used macaque atlases, two (Markov91 and Cole-Anticevic) conform to this observation, while BA05 does not. We used false discovery rate (FDR) correction for multiple comparisons, and the corrected p-values are reported in the tables (in the revised main text and shown below). Results on atlases with multiple resolutions are reported in Author response table 4 (Table A6 in the Supplementary Information). The observation that there are more diverse brain regions around shared peaks than around unique peaks holds for the human atlases in Author response table 3 (Table 4 in the main text), where the atlas resolutions range from 7 parcels to 300 parcels, demonstrating the robustness of the conclusion. Note that the observation is not consistent on atlases with relatively lower resolutions (e.g., BA05 for macaque, n=30, and Yeo2011 for human, n=7) or, in particular, higher resolutions (e.g., Schaefer-500 and Vosdewael-400, n>300). This inconsistency is plausible, since the resolution of the parcellation itself largely determines the chance that a cortical region appears in a peak’s neighborhood when the parcellation is too coarse or too fine. For example, if n=1 (the entire cortex is a single region) or n=30k (each vertex is a region), each peak will have the same number of neighboring regions in these two extreme cases (one brain region for each peak for n=1; around 30 vertices for each peak for n=30k).

      In conclusion, we observed that there are more diverse brain regions around shared peaks than around unique peaks for multiple brain atlases with a medium parcellation resolution. These results have been added to Tables 4, 5, and A6 along with the corresponding text and discussion section.
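
The resolution argument above can be illustrated with a small sketch. It assumes the neighborhood measure is the number of distinct atlas labels among all vertices reachable within k mesh edges of a peak, with the peak itself included. The function `regions_in_k_ring`, the toy path-shaped "mesh", and the three labelings are hypothetical and stand in for a real cortical surface and atlas.

```python
from collections import deque


def regions_in_k_ring(adjacency, labels, peak, k=3):
    """Count distinct atlas labels within k mesh edges of `peak` (inclusive)."""
    seen = {peak}
    frontier = deque([(peak, 0)])
    while frontier:
        vertex, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond the k-th ring
        for neighbor in adjacency[vertex]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return len({labels[v] for v in seen})


# Toy mesh: six vertices in a line, 0-1-2-3-4-5 (hypothetical).
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

coarse = ["all"] * 6                       # one parcel covers everything
medium = ["A", "A", "B", "B", "C", "C"]    # parcels of intermediate size
fine = [f"v{i}" for i in range(6)]         # every vertex is its own parcel

n_coarse = regions_in_k_ring(adjacency, coarse, peak=0)
n_medium = regions_in_k_ring(adjacency, medium, peak=0)
n_fine = regions_in_k_ring(adjacency, fine, peak=0)
```

With the coarsest labeling every peak sees exactly one region, and with the finest labeling every peak sees as many regions as there are vertices in its neighborhood, so neither extreme can separate shared from unique peaks. Only the intermediate labeling lets the count depend on where the peak sits relative to parcel boundaries.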

      Author response table 2.

      The mean values (±SD) of brain regions that appeared within a 3-ring neighborhood for shared and unique peaks in 3 common macaque atlases. For both the Markov91 and Cole-Anticevic atlases, the shared peaks have a greater variety of functional regions around them than the unique peaks. For the BA05 atlas, however, the conclusion was reversed. Bold font marks the larger value between the shared and unique peaks. All p<0.001, false discovery rate (FDR) corrected.

      (4) For Tables 2-4, A4, and Figure 3a, please indicate in all the legends if values correspond to Mean plus minus Standard Deviation, report t-value, and n in the legend or in the text.

      Answers: We thank the reviewer very much for this suggestion. We added the ‘mean (±SD)’ in the notes of Tables 2-4, A4 (now A6), and Figure 3 (a). All the t and n values of t-test are reported in tables or in the main text.

      (5) Please create a statistical section in the Methods to describe more precisely the tests used e.g. for t-tests, if datasets follow a normal distribution with unknown variance. In the case of multiple comparisons like in e.g. Table 2-4, A4, please report what multiple comparisons correction was used to adjust the significance level.

      Author response table 3.

      The mean values (±SD) of brain regions that appeared within a 3-ring neighborhood for shared and unique peaks in 10 common human atlases. All the shared peaks in the table have a greater number of neighboring brain regions compared to the unique peaks. All p<0.001, false discovery rate (FDR) corrected.

      Author response table 4.

      The mean values (±SD) of brain regions where shared and unique peaks appeared within a 3-ring neighborhood in 21 common human atlases. The p-values were corrected by FDR.

      Answers: We thank the reviewer for the suggestion. We added a ‘Statistic Analysis’ section in the ‘Materials and Methods’ part:

      ‘All variables used in the two-samples t-test follow a normal distribution check and all p-values were corrected for multiple comparisons using the false discovery rate (FDR) method. Moreover, in order to identify differently expressed genes between shared and unique peaks, we employed the Welch’s t-test, given the unequal sample sizes for shared and unique peaks. For all tests, a p-value <0.05 was considered significant (FDR corrected).’

      For the experiments involving multiple comparisons, such as Tables 2-4 and A4 (now A6), we have added explanations in the main text: p-values were corrected for multiple comparisons using the false discovery rate (FDR), and p<0.05 was considered significant.
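
As an illustration of the correction step, the sketch below implements the Benjamini-Hochberg procedure, which is one common FDR method. The response does not state which FDR procedure was used, so this choice is an assumption, and the p-values are invented for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
```

In this toy case three raw p-values fall below 0.05, but only the smallest remains below the threshold after adjustment, which is the kind of change the reviewer asked to have reported.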

      (6) It would be of great interest to provide the full list of the 28 genes that significantly contributed to the classification of shared and unique peaks. Please provide a description of the Welch’s t-test results. From the 7 genes selected, only two are discussed. Could the authors please describe briefly the function of the other genes? Although we understand that they are not associated with neuronal activity and brain function.

      Answers: We thank the reviewer for these suggestions. We have provided a complete list of 28 genes selected by LASSO in the Author response table 5. Additionally, Welch’s t-test was employed to calculate p-values for the expression differences of each gene in shared and unique peak clusters, and the results are also reported in the Author response table 5.

      Author response table 5.

      The 28 genes selected by LASSO and their corresponding p-values from Welch’s t-test.

      Seven genes showed significant differential expression between shared and unique peaks in Welch’s t-test. These genes were PECAM1, TLR1, SNAP29, DHRS4, BHMT2, PLBD1, KCNH5. Brief descriptions of their functions are listed in Author response table 6. All gene function descriptions were derived from the NCBI website (https://www.ncbi.nlm.nih.gov/).

      These results have been added to Tables 6 and A7 along with corresponding text.
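
For readers unfamiliar with Welch's t-test, the following sketch computes its test statistic and the Welch-Satterthwaite degrees of freedom using only the Python standard library. The expression values are invented and the function name `welch_t` is hypothetical. A p-value additionally requires the t distribution, which is available, for example, through `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical expression values for one gene in two groups of unequal size.
shared_expr = [1.0, 2.0, 3.0, 4.0, 5.0]
unique_expr = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
t_stat, dof = welch_t(shared_expr, unique_expr)
```

Unlike Student's t-test, no pooled variance is computed, so the statistic remains valid when the two groups differ in both size and spread.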

      (6) For comparison, could the authors provide a supplementary figure of shared peak clusters like in Figure 1b but displayed on the surface of the macaque brain template?

      Answers: We thank the reviewer very much for this suggestion. We have added a display of shared peak clusters on the macaque brain template surface (Author response figure 4, corresponding to Figure A1 of the Supplementary Information).

      (7) Could the author develop or rephrase the sentence lines 69-72 which remains unclear?

      Answers: We appreciate the reviewer’s feedback and have revised this sentence to ensure clarity. The sentences from line 69 to 72 have been revised to ‘In the study of macaques, it has been observed that the peak consistently present across individuals is located on more curved gyri (S. Zhang, Chavoshnejad, et al. 2022). Similar conclusions have been drawn in human brain research (S. Zhang, T. Zhang, et al. 2023).’ Now, this sentence corresponds to lines 74-77 in the main text.

      (8) Line 99: please indicate which section.

      Author response table 6.

      Seven genes were selected using LASSO that showed significant differential expression in shared and unique peaks.

      Answers: We thank the reviewer very much for this suggestion and we revised this sentence to ‘The definition of peaks and the method for extracting peak clusters within each species are described in the Materials and Methods section’.

      (9) In Figure 3b, please report R2 and p-value. A semi-log might be more appropriate given the overdispersion of Human Peak Counts.

      Answers: We thank the reviewer very much for this suggestion. Linear regression analysis was conducted on the average counts of all corresponding shared peak clusters of human and macaque. The horizontal and vertical axes of Author response figure 5 (b) represent the average count of shared peaks in the macaque and human brains, respectively. The Pearson correlation coefficients (PCC) of the interspecies consistency for the left and right brain are 0.20 and 0.26, respectively (p>0.05 for both). The linear regression shows a positive correlation in the inter-individual consistency of shared peaks between macaque and human brains, but it is not statistically significant (R2 for the left and right brain is 0.07 and 0.01, respectively).

      Author response image 4.

      Shared peak clusters of the macaque, shown on the macaque brain template.

      The goodness of fit (R2), Pearson correlation coefficient (PCC), and their respective p-values are indicated in Author response figure 5 (b). To avoid overdispersion, the peak count of the human brain is displayed in a semi-log format.

      The updated Figure and results are presented in Figure 3 of the main text.

      (10) Line 177: please indicate where in the Supplementary Information.

      Answers: Thank you for the reviewer’s reminder. We have incorporated the results of the human brain structural connectivity matrix into Table A5 in the Supplementary Information and provided corresponding indications in the main text.

      (11) Line 226: please correct ‘(except for betweeness [and efficiency] of the’.

      Answers: We thank the reviewer very much for this suggestion and we added ‘and efficiency’ in original Line 173 and 226 (now Line 206 and 267) after ‘betweeness’.

      (12) The gene expression dataset used is from the Allen Human Brain Atlas (AHBA). Reference to Hawrylycz et al., 2012 Nature. 2012 Sep 20;489(7416):391-399. doi: 10.1038/nature11405 shall be made and abbreviation defined at first use in the text.

      Answers: We added the full name ‘Allen Human Brain Atlas’ when AHBA is first mentioned, along with the reference suggested by the reviewer.

      Author response image 5.

      (a) Mean peak count (±SD) covered by shared and unique peak clusters in two species. ***indicates p<0.001. The t-values for the t-tests in humans and macaques are 4.74 and 2.67, respectively. (b) Linear regression results of the consistency of peak clusters shared between macaque and human brains. The pink and blue colors represent the left and right hemispheres, respectively. The results of the linear regression are depicted in the figure. While there was a positive correlation observed in the consistency of gyral peaks between macaque and human, the obtained p-value for the fitted results exceeded the significance threshold of 0.05.

      (13) Line 17: remove ‘are’.

      Answers: We thank the reviewer very much for this suggestion and we removed ‘are’ in Line 17 (now Line 18).

      (14) Line 201: remove ‘is used’.

      Answers: We thank the reviewer very much for this suggestion and we removed ‘is used’ in Line 201 (now Line 237).

      References

      Buckner, Randy L, Jessica R Andrews-Hanna, and Daniel L Schacter (2008). “The brain’s default network: anatomy, function, and relevance to disease”. In: Annals of the New York Academy of Sciences 1124.1, pp. 1–38.

      Cachia, Arnaud et al. (2018). “How interindividual differences in brain anatomy shape reading accuracy”. In: Brain Structure and Function 223, pp. 701–712.

      Caminiti, Roberto et al. (2010). “Understanding the parietal lobe syndrome from a neurophysiological and evolutionary perspective”. In: European Journal of Neuroscience 31.12, pp. 2320–2340.

      Fornito, Alexander et al. (2004). “Individual differences in anterior cingulate/paracingulate morphology are related to executive functions in healthy males”. In: Cerebral cortex 14.4, pp. 424–431.

      Golesorkhi, Mehrshad et al. (2022). “From temporal to spatial topography: hierarchy of neural dynamics in higher-and lower-order networks shapes their complexity”. In: Cerebral Cortex 32.24, pp. 5637–5653.

      Gordon, Evan M et al. (2023). “A somato-cognitive action network alternates with effector regions in motor cortex”. In: Nature, pp. 1–9.

      Ito, Takuya, Luke J Hearne, and Michael W Cole (2020). “A cortical hierarchy of localized and distributed processes revealed via dissociation of task activations, connectivity changes, and intrinsic timescales”. In: NeuroImage 221, p. 117141.

      Ji, Jie Lisa et al. (2019). “Mapping the human brain’s cortical-subcortical functional network organization”. In: Neuroimage 185, pp. 35–57.

      Llinás, Rodolfo R (2002). I of the vortex: From neurons to self. MIT press.

      Mantini, Dante et al. (2011). “Default mode of brain function in monkeys”. In: Journal of Neuroscience 31.36, pp. 12954–12962.

      Parvizi, Josef et al. (2006). “Neural connections of the posteromedial cortex in the macaque”. In: Proceedings of the National Academy of Sciences 103.5, pp. 1563–1568.

      Vincent, Justin L et al. (2007). “Intrinsic functional architecture in the anaesthetized monkey brain”. In: Nature 447.7140, pp. 83–86.

      Whittle, Sarah et al. (2009). “Variations in cortical folding patterns are related to individual differences in temperament”. In: Psychiatry Research: Neuroimaging 172.1, pp. 68–74.

      Yang, Shimin et al. (2019). “Temporal variability of cortical gyral-sulcal resting state functional activity correlates with fluid intelligence”. In: Frontiers in neural circuits 13, p. 36.

      Zhang, Songyao, Poorya Chavoshnejad, et al. (2022). “Gyral peaks: Novel gyral landmarks in developing macaque brains”. In: Human Brain Mapping 43.15, pp. 4540–4555.

      Zhang, Songyao, Tuo Zhang, et al. (2023). “Gyral peaks and patterns in human brains”. In: Cerebral Cortex.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study by Ghafari et al. addresses a question that is highly relevant for the field of attention as it connects structural differences in subcortical regions with oscillatory modulations during attention allocation. Using a combination of magnetoencephalography (MEG) and magnetic resonance imaging (MRI) data in human subjects, inter-individual differences in the lateralization of alpha oscillations are explained by asymmetry of subcortical brain regions. The results are important, and the strength of the evidence is convincing. Yet, clarifying the rationale, reporting the data in full, a more comprehensive analysis, and a more detailed discussion of the implications will strengthen the manuscript further.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors re-analysed the data of a previous study in order to investigate the relation between asymmetries of subcortical brain structures and the hemispheric lateralization of alpha oscillations during visual spatial attention. The visual spatial attention task crossed the factors of target load and distractor salience, which made it possible to also test the specificity of the relation of subcortical asymmetries to lateralized alpha oscillations for specific attentional load conditions. Asymmetry of globus pallidus, caudate nucleus, and thalamus explained inter-individual differences in attentional alpha modulation in the left versus right hemisphere. Multivariate regression analysis revealed that the explanatory potential of these regions' asymmetries varies as a function of target load and distractor salience.

      Strengths:

      The analysis pipeline is straightforward and follows in large parts what the authors have previously used in Mazzetti et al (2019). The authors use an interesting study design, which allows for testing of effects specific to different dimensions of attentional load (target load/distractor salience). The results are largely convincing and in part replicate what has previously been shown. The article is well-written and easy to follow.

      We thank the reviewer for their interest in our study.

      Weaknesses:

      While the article is interesting to read for researchers studying alpha oscillations in spatial attention, I am somewhat sceptical about whether this article is of high interest to a broader readership. Although I read the article with interest, the conceptual advance made here can be considered mostly incremental. As the authors describe, the present study's main advance is that it does not include reward associations (as in previous work) and includes different levels of attentional load. While these design features and the obtained results indeed improve our general understanding of how asymmetries of subcortical structures relate to lateralized alpha oscillations, the conceptual advance is somewhat limited.

      We thank the reviewer for their constructive comment. We’d like to highlight that this is the first study to show a relationship between the asymmetry of subcortical structures and attention-modulated alpha oscillations in a task that did not involve any reward associations, reward processing being the most studied role of the basal ganglia. We also believe there is value in having a second study linking the asymmetry in volume of subcortical structures to the modulation of alpha oscillations, as this surprising finding also has important clinical implications (see below). We edited the manuscript as below to explain the advances made in this study:

      Introduction (Line 112): “Our current findings broaden our understanding of how subcortical structures are involved in modulating alpha oscillations during top-down spatial attention, in the absence of any reward or value associations.”

      Discussion (Line 301): “It has also been shown that the spatial extent of pathological change in subcortical structures can predict cognitive changes in Parkinson’s Disease (43). […] Changes in neocortical oscillatory activity have also been observed in neurological disorders which mainly are known to affect subcortical structures. For example, individuals with Alzheimer's Disease demonstrate an increase in slow oscillatory activities and a decrease in higher frequency oscillations (45). Moreover, in patients with Parkinson’s Disease, the power of beta oscillations increases relatively to when they are dopamine-depleted compared with when they are on dopaminergic medication (46).”

      While the analysis of the relation of individual subcortical structures to alpha lateralization in different attentional load conditions is interesting, I am not convinced that the present analysis is suited to draw strong conclusions about the subcortical regions' specificity. For example, the Thalamus (Fig. 5) shows a significant negative beta estimate only in one condition (low-load target, non-salient distractor) but not in the other conditions. However, the actual specificity of the relation of thalamus asymmetry to lateralized alpha oscillations would require that the beta estimate for this one condition is significantly higher than the beta estimates for the other three conditions, which has not been tested as far as I understand.

      We thank the reviewer for this constructive comment. We agree with the reviewer that we should compare the beta values across the conditions. We therefore decided to better harness the multivariate nature of our analysis. Multivariate regression analysis allows one to test the null hypothesis that a given predictor does not contribute to all the dependent variables. A rejection of this hypothesis would suggest that lateralization of a given region of interest significantly predicts variability across all 4 of the task conditions, whereas failure to reject the null would imply that the predictive relationship holds only for that single condition. We tested this global null hypothesis using a MANOVA test and found the following, which we have added to the manuscript:

      Results (Line 250): “To ascertain whether each predictor contributes to all conditions, we conducted statistical tests on the results of our MMR using the null hypothesis that a given regressor does not impact all dependent variables. We found that while, with marginal significancy, caudate nucleus can predict variability across all four of the task conditions (F(26,4) = 2.82, p-value = 0.046), the predictive relationships of thalamus (F(26,4) = 2.43, p-value = 0.073) with condition 1, and globus pallidus (F(26,4) = 2.29, p-value = 0.087) with conditions 2 and 3 hold only for these conditions. In sum, this demonstrates that when the task is easiest (condition 1), the thalamus is related to alpha modulation. When the task is most difficult (condition 4), the caudate nucleus relates to the alpha modulation, however, its contributions are substantial enough to predict outcomes across all conditions. For the conditions with medium difficulty (conditions 2 and 3) the globus pallidus is related to the alpha band modulation.”

      Method (Line 599): “To examine the specificity of each regressor for lateralized alpha in each condition, we statistically assessed the results of the MMR against the null hypothesis that a particular predictor does not contribute to all dependent variables, employing a MANOVA test in RStudio (version 2022.02.2) (80).”

      Discussion (Line 337): “Thalamus, Globus Pallidus, and Caudate nucleus play varying roles across different load conditions.”

      Discussion (Line 361): “Although these findings highlight the varying contributions of different regions, they do not imply a lack of evidence for correlations between these subcortical structures and other load conditions.”

      Discussion (Line 379): “Additionally, we refrained from directly comparing the contributions of subcortical structures to different conditions due to low statistical power. […] In future studies it would be interesting to design an experiment directly addressing which subcortical regions contribute to distractor and target load in terms of modulating the alpha band activity. In order to ensure sufficient statistical power for doing so possibly each factor needs to be addressed in different experiments.”

      Reviewer #3 (Public Review):

      Summary:

      In this study, Ghafari et al. explored the correlation between hemispheric asymmetry in the volume of various subcortical regions and lateralization of posterior alpha-band oscillations in a spatial attention task with varying cognitive demands. To this end, they combined structural MRI and task MEG to investigate the relationship between hemispheric differences in the volume of basal ganglia, thalamus, hippocampus, and amygdala and hemisphere-specific modulation of alpha-band power. The authors report that differences in the thalamus, caudate nucleus, and globus pallidus volume are linked to the attention-related changes in alpha band oscillations with differential correlations for different regions in different conditions of the design (depending on the salience of the distractor and/or the target).

      Strengths:

      The manuscript contributes to filling an important gap in current research on attention allocation which commonly focuses exclusively on cortical structures. Because it is not possible to reliably measure subcortical activity with non-invasive electrophysiological methods, they correlate volumetric measurements of the relevant subcortical regions with cortical measurements of alpha band power. Specifically, they build on their own previous finding showing a correlation between hemispheric asymmetry of basal ganglia volumes and alpha lateralization by assessing a task without an explicit reward component. Furthermore, the authors use differences in saliency and perceptual load to disentangle the individual contributions of the subcortical regions.

      We appreciate the reviewer’s interest in our study.

      Weaknesses:

      The theoretical bases of several aspects of the design and analyses remain unclear. Specifically, we missed statements in the introduction about why it is reasonable, from a theoretical perspective, to expect:

      (i) a link between volumetric measurements and task activity;

      We thank the reviewer for this constructive feedback. We have now addressed this concern in the revised manuscript.

      Discussion (Line 293): “It has been demonstrated that extensive navigation experience enlarges the size of right hippocampus (40). Furthermore, in terms of neurological disorders, it is well established that shrinkage (atrophy) in specific regions is a predictor of a number of neurological and psychiatric conditions including Parkinson’s disease, dementia, and Huntington’s disease. […] It has also been shown that the spatial extent of pathological change in subcortical structures can predict cognitive changes in Parkinson’s Disease (43). […] Changes in neocortical oscillatory activity have also been observed in neurological disorders which mainly are known to affect subcortical structures. For example, individuals with Alzheimer's Disease demonstrate an increase in slow oscillatory activities and a decrease in higher frequency oscillations (45). Moreover, in patients with Parkinson’s Disease, the power of beta oscillations increases relatively to when they are dopamine-depleted compared with when they are on dopaminergic medication (46).”

      (ii) a specific link with hemispheric asymmetry in subcortical structures (While focusing on hemispheric lateralization might circumvent the problem of differences in head size, it would be better to justify this focus theoretically, which requires for example a short review of evidence showing ipsilateral vs contralateral connections between the relevant subcortical and cortical structures);

      We thank the reviewer for this helpful comment, which led us to clarify the manuscript. We have addressed this issue in the revised manuscript and have added references to papers that directly investigate the asymmetry of subcortical regions in relation to neurological disorders:

      Introduction (Line 102): “We utilized the hemispheric laterality of subcortical structures and alpha modulation to overcome issues related to individual variations in oscillatory power and head size.”

      Discussion (Line 314): “Employing hemispheric lateralization was motivated by the organizational characteristic of structural asymmetry in healthy brain (47). Additionally, considering the effects of aging (48) and neurodegenerative disorders, such as Alzheimer's Disease (49), on brain symmetry influenced this approach. Furthermore, computing lateralization indices for individuals addresses the challenge of accommodating variations in both head size and the power of oscillatory activity.”

      Discussion (Line 374): “Furthermore, in this study, our emphasis has been on assessing the size of subcortical structures. Future investigations could explore subcortical white matter connectivities and hemispheric asymmetries. This approach has previously been conducted on superior longitudinal fasciculus (SLF) (61,62) and holds potential for examining cortico-subcortical connectivity in the context of oscillatory asymmetries.”

      (iii) effects not only in basal ganglia and thalamus, but also hippocampus and amygdala (a justification of selection of all ROIs);

      We thank the reviewer for this comment. We assessed the hippocampus and amygdala because they are automatically segmented by the FIRST algorithm. Because our analysis showed no relation between these regions and the modulation of alpha oscillations, they also provide a useful control for our approach. Therefore, we included all subcortical structures in the model and evaluated their predictive impact. This is now addressed in the revised manuscript.

      Method (Line 477): “FIRST is an automated model-based tool that runs a two-stage affine transformation to MNI152 space, to achieve a robust pre-alignment of thalamus, caudate nucleus, putamen, globus pallidus, hippocampus, amygdala, and nucleus accumbens based on individual’s T1-weighted MR images.”

      Method (Line 576): “The absence of a relationship between modulations of alpha oscillations and the hippocampus and amygdala was expected as these regions typically are not associated with the allocation of spatial attention and thus add validity to our approach.”

      (iv) effects that depend on distractor versus target salience (a rationale for the specific two-factor design is missing);

      We thank the reviewer for this comment, which helped us clarify the manuscript. The two-factor design serves to investigate how the allocation of attentional resources relates to mechanisms of neural excitability and suppression. For this reason, both the salience of the distractor (associated with suppression) and the perceptual load of the target (associated with excitability) had to be manipulated. We clarified the rationale in the revised version as below:

      Introduction (Line 96): “We analyzed MEG and structural data from a previous study (27), in which spatial cues guided participants to covertly attend to one stimulus (target) and ignore the other (distractor). To investigate the relationship between the allocation of attentional resources and mechanisms of neural excitability and suppression, the target load and the visual saliency of the distractor were manipulated using a noise mask. This load/salience manipulation resulted in four conditions that affect the attentional demands of target and distractor.”

      (v) effects in the absence of reward (why it is important to show that the effect seen previously in a task with reward is seen also in a task without reward);

      We thank the reviewer for this clarifying comment. We addressed this question in the introduction and discussion as below:

      Introduction (Line 107): “By examining their role in a task without explicit reward, we aim to elucidate the generalizability of the contributions of subcortical structures to spatial attention modulation. Such a finding would implicate a role for the basal ganglia in cognition beyond the well-studied realm of the estimation of choice values (33). Specifically, in a prior study (28), we observed that the contributions of the basal ganglia were most pronounced when the items in question were associated with a reward. Our current findings broaden our understanding of how subcortical structures are involved in modulating alpha oscillations during top-down spatial attention, in the absence of any reward or value associations.”

      Discussion (Line 333): “This convergence of results not only corroborates the validity and consistency of our findings but also extends the empirical foundation supporting the predictive role of the asymmetry of globus pallidus in modulating alpha oscillations beyond reward valence and to the context of attention.”

      (vi) effects on rapid frequency tagging.

      We thank the reviewer for this constructive comment. We have now included this analysis and added the results to the revised manuscript.

      Results (Line 224): “It is worth noting that neither the behavioural nor the rapid invisible frequency tagging (RIFT) measures showed significant relationships with LVs and HLM(α) (Supplementary material, Figure 1 and Table 3).”

      Discussion (Line 396): “We did not find any association between the power of RIFT signal and the size asymmetry of subcortical structures. Since to Bayes factors were less than 0.1, we conclude that our RIFT null findings are robust, suggesting a dissociation between how alpha oscillations and neuronal excitability indexed by RIFT relate to subcortical structures.”

      Method (Line 548): “We computed the modulation index (MI) for rapid invisible frequency tagging (RIFT) by averaging the power of the signal in sensors on the right when attention was directed to the right compared to when it was directed to the left. This calculation was also performed for sensors on the left. Consequently, we identified the top 5 sensors on each side with the highest MI as the Region of Interest (ROI). Utilizing the sensors within the ROI, we computed hemispheric lateralization modulation (HLM) of RIFT by summing the average MI(RIFT) of the right sensors and the average MI(RIFT) of the left sensors, obtaining one HLM(RIFT) value for each participant. For a more comprehensive analysis, refer to reference (24).”
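      The RIFT computation quoted above can be illustrated with a minimal sketch. It is not the analysis code used in the study. Two points are assumptions that the quoted text does not specify: the modulation index is taken to be a normalized contrast between attend-right and attend-left power, and "highest MI" is read as largest absolute MI. The function names and the input layout are hypothetical.

```python
from statistics import mean


def modulation_index(power_attend_right, power_attend_left):
    """Normalized attend-right vs attend-left contrast for one sensor (assumed form)."""
    return (power_attend_right - power_attend_left) / (
        power_attend_right + power_attend_left
    )


def hlm(right_sensors, left_sensors, n_roi=5):
    """Hemispheric lateralization modulation for one participant.

    right_sensors / left_sensors: lists of
    (power_attend_right, power_attend_left) tuples, one per sensor.
    """
    mi_right = [modulation_index(r, l) for r, l in right_sensors]
    mi_left = [modulation_index(r, l) for r, l in left_sensors]
    # ROI: the n_roi sensors per side with the strongest modulation.
    # Ranking by absolute value is an assumption of this sketch.
    roi_right = sorted(mi_right, key=abs, reverse=True)[:n_roi]
    roi_left = sorted(mi_left, key=abs, reverse=True)[:n_roi]
    # HLM is the sum of the two per-hemisphere average MIs.
    return mean(roi_right) + mean(roi_left)
```

      With perfectly mirror-symmetric modulation in the two hemispheres, the two averages cancel and the sketch returns zero; any asymmetry shifts the value away from zero.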

      Supplementary Materials (Line 839): “Figure 1. Lateralization volume of thalamus, caudate nucleus and globus pallidus in relation to hemispheric lateralization modulation of rapid invisible frequency tagging (HLM(RIFT)) on the right and behavioural asymmetry on the left. A and E, The beta coefficients for the best model (having three regressors) associated with a generalized linear model (GLM) where lateralization volume (LV) values were defined as explanatory variables for HLM(RIFT) (A) and behavioural asymmetry (E). Error bars indicate standard errors of mean (SEM). B and F, Partial regression plot showing the association between LVTh and HLM(RIFT) (B, p-value = 0.59) and behavioural asymmetry (F, p-value = 0.38) while controlling for LVGP and LVCN. C and G, Partial regression plot showing the association between LVGP and HLM(RIFT) (C, p-value = 0.16) and behavioural asymmetry (G, p-value = 0.80) while controlling for LVTh and LVCN . D and H, Partial regression plot showing the association between LVCN and HLM(RIFT) (D, p-value = 0.53) and behavioural asymmetry (H, p-value = 0.74) while controlling for LVTh and LVGP. Negative (or positive) LVs indices denote greater left (or right) volume for a given substructure; similarly negative HLM(RIFT) values indicate stronger modulation of RIFT power in the left compared with the right hemisphere, and vice versa; positive behavioural asymmetry value shows higher accuracy when the target was on the right as compared with left, and vice versa for negative behavioural asymmetry values. The dotted curves in B, C, D, F, G, and H indicate 95% confidence bounds for the regression line fitted on the plot in red.

      Author response image 1.

      Second, the results are not fully reported. The model space and the results from the model comparison are omitted. Behavioral data and rapid frequency tagging results are not shown. Without having access to the data or the results of the analyses, the reader cannot evaluate whether the null effect corresponds to the absence of evidence or (as claimed in the discussion) evidence of absence.

      We thank the reviewer for this constructive suggestion. In the revised manuscript, we incorporated the model space, model comparisons, BIC values from the models, behavioral and rapid frequency tagging analysis methods, and their respective results. Additionally, we computed Bayes factors for our null findings to enhance the interpretability of our results.

      Results (Line 199): “This model predicted the HLM(α) values significantly in the GLM (F3,29 = 7.4824, p = 0.0007, adjusted R2 = 0.376) as compared with an intercept-only null model (Figure 4A).”

      Although the beta estimate of LVGP only showed a positive trend, removing it from the regression resulted in worse models (AIC and BIC tables in supplementary material).
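      For readers unfamiliar with these criteria, the sketch below shows how AIC and BIC trade goodness of fit against the number of regressors in a Gaussian linear model. The formulas are the standard ones, stated up to an additive constant that is identical for all models fitted to the same data. The residual sums of squares in the example are hypothetical and are not values from the study; only the sample size (N = 33) comes from the manuscript.

```python
import math


def aic(rss, n, k):
    """Akaike Information Criterion for a Gaussian linear model (up to a constant)."""
    return n * math.log(rss / n) + 2 * k


def bic(rss, n, k):
    """Bayesian Information Criterion for a Gaussian linear model (up to a constant)."""
    return n * math.log(rss / n) + k * math.log(n)


n = 33  # sample size reported in the manuscript
# Hypothetical residual sums of squares, for illustration only.
full_model = {"rss": 10.0, "k": 4}     # intercept + three regressors
reduced_model = {"rss": 12.5, "k": 3}  # one regressor removed

for name, m in (("full", full_model), ("reduced", reduced_model)):
    print(name, round(aic(m["rss"], n, m["k"]), 2), round(bic(m["rss"], n, m["k"]), 2))
```

      Lower values indicate the preferred model. Because ln(33) is greater than 2, BIC penalizes each additional regressor more heavily than AIC at this sample size, which is why the two criteria can rank models differently.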

      Supplementary materials (Line 827): “Table 1. Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values for all possible combinations of regressors (Lateralized Volume of subcortical structures). The selected model, with lowest AIC, is marked in green.”

      Author response table 1.

      Author response table 2.

      Author response table 3.

      Bayes factors for the correlation of hemispheric laterality of subcortical structures with hemispheric lateralization modulation of rapid invisible frequency tagging (HLM(RIFT)) and with behavioural asymmetry (BA). The Pearson correlation of each subcortical structure with HLM(RIFT) and with behavioural asymmetry was calculated. The likelihood of the data under the alternative hypothesis (presence of a correlation) was then compared with the likelihood under the null hypothesis (absence of a correlation). As shown in the table, all Bayes factors were below or very close to 1, indicating evidence for the null hypothesis.

      We have now included the analysis of the frequency tagging signal and added its results to the revised manuscript. We refer the reviewer to our response to weakness (vi) from reviewer #3.

      Third, it remains unclear whether the MMS is the best approach to analyzing effects as a function of target and distractor salience. To address the question of whether the effects of subcortical volumes on alpha lateralization vary with task demands (which we assume is the primary research question of interest, given the factorial design), we would like to evaluate some sort of omnibus interaction effect, e.g., by having target and distractor saliency interact with the subcortical volume factors to predict alpha lateralization. Without such analyses, the results are very hard to interpret. What are the implications of finding the differential effects of the different volumes for the different task conditions without directly assessing the effect of the task manipulation? Moreover, the report would benefit from a further breakdown of the effects into simple effects on unattended and attended alpha, to evaluate whether effects as a function of distractor (vs target) salience are indeed accompanied by effects on unattended (vs attended) alpha.

      The reviewer is correct that we did not directly compare task conditions when we assessed the predictive relationship between basal ganglia lateralization and alpha lateralization. We opted for the multivariate regression approach because it allowed us to simultaneously model the predictive relationship between our continuous predictors and HLM alpha in each condition, making the most efficient use of our statistical power (N=33). Indeed, directly comparing task conditions within one model would require 16 regressors (1 (intercept) + 4-1 to model the difference between conditions + 3 to model the regressors + 3 x 3 to model each region x task condition interaction). This approach would be underpowered given our sample size, and the ensuing results would likely be unreliable.
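      The regressor count given above can be checked with a short sketch. The function name is hypothetical, and the design assumed here is the one described in the response: an intercept, dummy-coded task conditions, one continuous predictor per region, and a full set of region x condition interaction terms.

```python
def n_regressors(n_conditions, n_regions):
    """Count design-matrix columns for a full condition x region interaction model."""
    intercept = 1
    condition_terms = n_conditions - 1            # dummy-coded conditions
    region_terms = n_regions                      # one continuous predictor per region
    interaction_terms = n_regions * (n_conditions - 1)
    return intercept + condition_terms + region_terms + interaction_terms


# Four task conditions and three regions (thalamus, globus pallidus, caudate nucleus).
total = n_regressors(n_conditions=4, n_regions=3)
print(total)  # 16
```

      With 33 participants, this amounts to roughly two participants per regressor, which illustrates the power concern raised above.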

      However, we statistically analysed our regression results. Multivariate regression analysis allows one to test the null hypothesis that a given predictor does not contribute to all the dependent variables. A rejection of this hypothesis would suggest that lateralization of a given region of interest significantly predicts variability across all 4 of the task conditions, whereas failure to reject the null would imply that the predictive relationship holds only for that single condition. We tested this global null hypothesis using a MANOVA test and reported the findings in response to weakness two from reviewer #1.

      Discussion (Line 384): “In future studies it would be interesting to design an experiment directly addressing which subcortical regions contribute to distractor and target load in terms of modulating the alpha band activity. In order to ensure sufficient statistical power for doing so possibly each factor needs to be addressed in different experiments.”

      The fourth concern is that the discussion section is not quite ready to help the reader appreciate the implications of key aspects of the findings. What are the implications for our understanding of the roles of different subcortical structures in the various psychological component processes of spatial attention? Why does the volumetric asymmetry of different subcortical structures have diametrically opposite effects on alpha lateralization? Instead, the discussion section highlights that the different subcortical structures are connected in circuits: "Globus pallidus also has wide projections to the thalamus and can thereby impact the dorsal attentional networks by modulating prefrontal activities." If this is true, then why does the effect of the GP dissociate from that of the thalamus? Also, what is it about the current behavioural paradigm that makes the behavioral readout insensitive to variation in subcortical volume (or alpha lateralization?)?

      We thank the reviewer for this feedback. These are indeed all good points, and we hope that our findings will inspire further research to address these issues. In the revised manuscript we now write:

      Discussion (Line 349): “The opposite effect of the globus pallidus compared to the thalamus is striking, and possibly explained by the globus pallidus containing GABAergic interneurons. Thus the inhibitory nature of the globus pallidus projections to thalamus could explain why they are related to the alpha modulation in different manners (57).”

      Discussion (Line 379): “Moreover, the current study faced methodological constraints, limiting the analysis to the entire thalamus. […] It would be of great interest to conduct further investigations to quantify the distinct impacts of individual thalamic nuclei on the association between subcortical structures and the modulation of oscillatory activity.”

      Discussion (Line 388): “Moreover, our failure to identify a relationship between the lateralized volume of subcortical structures and behavioural measures should be addressed in studies that are better designed to capture performance asymmetries (63). Individual preferences toward one hemifield, which were not addressed in the current study design, could potentially strengthen the power to detect correlations between structural variations in the subcortical structures and behavioural measures.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comment:

      Between-subject correlation/regression analyses always rely on the assumption that the underlying dependent measures are reliable. While the reliability of asymmetries of subcortical structures can be assumed, the reliability of lateralized alpha oscillations during spatial attention can be questioned. It would be helpful if the authors could test the reliability of alpha lateralization, for instance by calculating HLM(a) in the first and second half of the experiment and correlating the resulting HLM(a) values (split-half reliability).

      We thank the reviewer for their insightful comment. We acknowledge that the between-subject regression relies on the reliability of alpha lateralization. Nonetheless, a previous study has demonstrated consistent results regarding HLM(α). We have further elaborated on these aspects in the discussion section:

      Discussion (Line 328): “Furthermore, our regression analysis outcomes align with the findings of Mazzetti et al. (28) underscoring the significant predictive influence exerted by the lateralized volume of globus pallidus on the modulation of hemispheric lateralization in alpha oscillations during spatial attention tasks. This convergence of results not only corroborates the validity and consistency of our findings but also extends the empirical foundation supporting the predictive role of the asymmetry of globus pallidus in modulating alpha oscillations within the context of attention.”

      Reviewer #3 (Recommendations For The Authors):

      We recommend that a revised version of the manuscript

      • Clarifies the theoretical basis for the 6 key design & analysis choices that we have outlined above;

      We thank the reviewer for their precision. We addressed the concerns outlined above in the previous section.

      • Also clarifies the task description (perhaps referring to target and distractor salience instead of target load versus distractor salience might help);

      Thank you for this constructive comment. We used the term ‘load’ for the target and ‘salience’ for the distractor because the noise manipulation of the faces reduces the salience of the image, which makes distractors less distracting (easier) but targets more perceptually loaded (harder). These terms are now explained in the revised manuscript.

      Method (Line 447): “Over trials, the perceptual load of targets was manipulated using a noise mask; noisy targets are harder to detect than clear targets and therefore incur greater perceptual load in their detection. The saliency of distractor stimuli was also manipulated using a noise mask; noisy distractor stimuli are less salient than clear distractors and therefore less disruptive to performance on the detection task. The noise mask was created by randomly swapping 50% of the stimulus pixels (Figure 1B). This manipulation resulted in four target-load/distractor-saliency conditions: (1) target: low load, distractor: low saliency (i.e., clear target, noisy distractor), (2) target: high load, distractor: low saliency (i.e., noisy target, noisy distractor), (3) target: low load, distractor: high saliency (i.e., clear target, clear distractor), (4) target: high load, distractor: high saliency (i.e., noisy target, clear distractor) (Figure 1B and C).”
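      One way to read the pixel-swapping manipulation described above is sketched below. This is an illustration only, not the stimulus code used in the study: the function name, the flat pixel list, and the choice to permute the selected pixels among themselves are all assumptions.

```python
import random


def noise_mask(pixels, fraction=0.5, seed=None):
    """Return a copy of `pixels` in which a random `fraction` of positions
    have had their values shuffled among themselves (assumed reading of
    'randomly swapping 50% of the stimulus pixels')."""
    rng = random.Random(seed)
    out = list(pixels)
    n_swap = int(len(out) * fraction)
    # Pick which positions take part in the swap.
    positions = rng.sample(range(len(out)), n_swap)
    # Permute the values found at those positions.
    values = [out[i] for i in positions]
    rng.shuffle(values)
    for i, v in zip(positions, values):
        out[i] = v
    return out


clear = list(range(100))           # stand-in for a flattened grayscale image
noisy = noise_mask(clear, seed=0)  # at most half of the positions can change
```

      Under this reading the overall luminance histogram of the image is preserved, because pixel values are only moved, never altered.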

      • Fully reports all the data, including those of the model comparisons, the behavioural results, and the rapid frequency tagging results;

      We thank the reviewer for this constructive comment. We refer the reviewer to our responses to the second comment and to comment (vi) from reviewer #3.

      • Reports interaction effects to directly test the modulating role of task demands in the link between volume and alpha, and break down the alpha lateralization indices into their simple effects on the ipsilateral and contralateral hemispheres;

      The modulating role of task demands has been addressed in our response to weakness two from reviewer #1.

      Regarding the second part of the comment, to compare the lateralized modulation of alpha oscillations between the right and left hemispheres, we computed the hemispheric lateralization modulation. This involved dividing trials into attention right and attention left. Subsequently, we calculated the lateralization index separately for sensors on the right and left. Specifically, this entailed computing ipsilateral – contralateral for sensors on the right and contralateral – ipsilateral for sensors on the left side of the brain. We addressed this concern in the Methods section as below:

      Method (Line 537): “As MI(α) consistently represents power of alpha in attention right versus attention left conditions, it entails the comparison between ipsilateral and contralateral alpha modulation power for sensors located on the right side of the head. The same comparison applies inversely for sensors situated on the left side of the brain.”

      • Clarifies in the discussion section the specific implications of the results for our understanding of the link between distinct subcortical structures and distinct component processes of spatial attention.

      We thank the reviewer for their constructive comment. This point is addressed in response to the fourth concern of reviewer #3.

      More detailed specific recommendations are provided below:

      • Line 40ff: In this paragraph, the theoretical framework concerning the function of the subcortical regions of interest is described. Here, the authors jump back and forth between the role of the basal ganglia and the role of the thalamus. For clarity, we would advise to describe the functions of these two structures one after the other. And include a justification for assessing the hippocampus and the amygdala.

      We appreciate the reviewer’s precise comment. We now describe these structures one after the other in the revised manuscript, as below:

      Introduction (Line 44): “For instance, it has been shown that the pulvinar plays an important role in the modulation of neocortical alpha oscillations associated with the allocation of attention (9). Studies in rats and non-human primates have shown that both the thalamus and superior colliculus, are involved in the control of spatial attention by contributing to the regulation of neocortical activity (9-11). Notably, when the largest nucleus of the thalamus, the pulvinar, was inactivated after muscimol infusion, the monkey’s ability to detect colour changes in attended stimuli was lowered. This behavioral deficit occurred when the target was in the receptive field of V4 neurons that were connected to lesioned pulvinar (12). The basal ganglia play a role in different aspects of cognitive control, encompassing attention (13,14), behavioural output (15), and conscious perception (16). Moreover, the basal ganglia contribute to visuospatial attention by linking with cortical regions like the prefrontal cortex via the thalamus.”

      Justification for assessing the hippocampus and the amygdala has been addressed in response to weakness (iii) from reviewer #3.

      • The authors mention they defined symmetric clusters of 5 sensors in each hemisphere that showed the highest modulation, but it is not clear how this number of sensors was determined a priori.

      We thank the reviewer for their comment. We edited the revised manuscript as below:

      Method (Line 536): “Ten sensors were selected to ensure sufficient coverage of the region exhibiting alpha modulation as judged from prior work (62).”

      • In line 141, the abbreviation HLM is first mentioned but the concept of "hemispheric lateralization modulation of alpha power" is only mentioned in the following section. For the ease of the reader, the abbreviation could be mentioned together with this concept at the beginning of this paragraph.

      We thank the reviewer for their attention to this point. In the revised manuscript, HLM(α) is now introduced together with its concept.

      Results (Line 153): “Next, we computed the hemispheric lateralization modulation of alpha power (HLM(α)) in each individual.”

      • In line 188 of the results section, it is mentioned that the table including the AIC values for model comparisons is in the supplementary material, however, we could not locate this table.

      We thank the reviewer for their constructive feedback. The supplementary materials were uploaded in a separate file, which may not have been available to the reviewers. We have now added the supplementary materials to the end of the manuscript for convenience.

      • Figure 4 is missing the panel headers A, B, C, and D.

      We thank the reviewer for their precision. This figure is now fixed.

      Author response image 2.

      • In lines 205 and 206, behavioral and rapid frequency tagging analysis are mentioned. For the behavioral analysis, the method is described, but no results are provided. For the rapid frequency tagging, neither the methods nor the results are described. To evaluate the strength of this (non)-evidence, we would advise to elaborate on these analysis steps and report the results in the supplementary material.

      We thank the reviewer for this constructive comment. A brief explanation of the analysis method of rapid frequency tagging signal is added to the revised manuscript.

      Method (Line 548): “We computed the modulation index (MI) for rapid invisible frequency tagging (RIFT) by averaging the power of the signal in sensors on the right when attention was directed to the right compared to when it was directed to the left. This calculation was also performed for sensors on the left. Consequently, we identified the top 5 sensors on each side with the highest MI as the Region of Interest (ROI). Utilizing the sensors within the ROI, we computed hemispheric lateralization modulation (HLM) of RIFT by summing the average MI(RIFT) of the right sensors and the average MI(RIFT) of the left sensors, obtaining one HLM(RIFT) value for each participant. For a more comprehensive analysis, refer to reference (24).” For a more detailed answer, we refer the reviewer to the second comment from reviewer #3.

      • For the paragraph starting at line 209, we would recommend referring to Figure 1.

      We thank the reviewer for their suggestion. This paragraph now refers to Figure 1.

      Results (Line 229): “To relate load and salience conditions of the task to the relationship between subcortical structures and the alpha activity, we combined low-load or high-load targets with high-saliency or low-saliency distractors to manipulate the perceptual load appointed to each trial (Method section, Figure 1).”

      • Figure 5 as well as the report of the beta weights in this section shows a difference in the direction of the effect for the thalamus compared to the globus pallidus and caudate nucleus which is not discussed in this section.

      We thank the reviewer for bringing this important point to our attention. We addressed this comment in the discussion section as below:

      Discussion (Line 349): “The opposite effect of the globus pallidus compared to the thalamus is striking, and possibly explained by the globus pallidus containing GABAergic interneurons. Thus the inhibitory nature of the globus pallidus projections to thalamus could explain why they are related to the alpha modulation in different manners (54).”

      Discussion (Line 379): “Moreover, the current study faced methodological constraints, limiting the analysis to the entire thalamus. […] It would be of great interest to conduct further investigations to quantify the distinct impacts of individual thalamic nuclei on the association between subcortical structures and the modulation of oscillatory activity.”

      • Comment 2 on line 80 is addressed in the paragraph following 264 by describing volumetric changes in basal ganglia in neurodegenerative disorders such as PD or Huntington's. Still, the link of how a decrease in volume in this region could be causally linked to changes in alpha-band power could be better supported.

      We thank the reviewer for their constructive feedback. Here we highlight the significant correlation between subcortical structures and changes in attention-modulated alpha oscillations. We added a few more references to the discussion supporting the relationship between size and function in relation to neurological disorders. We also edited the manuscript to make this point clearer, as below:

      Introduction (Line 113): “Our current findings broaden our understanding of how subcortical structures are involved in modulating alpha oscillations during top-down spatial attention, independent of any reward or value associations.”

      Discussion (Line 305): “Changes in neocortical oscillatory activity have also been observed in neurological disorders which mainly are known to affect subcortical structures. For example, individuals with Alzheimer's Disease demonstrate an increase in slow oscillatory activities and a decrease in higher frequency oscillations (42). Moreover, in patients with Parkinson’s Disease, the power of beta oscillations increases relatively to when they are dopamine-depleted compared with when they are on dopaminergic medication (43).”

      • Related to the previous comment on behavioral and rapid frequency tagging results, these are difficult to evaluate without mention of the methods and/or results.

      We thank the reviewer for this comment. We refer the reviewer to our response to the second comment from reviewer #3.

      • The authors show differential effects of target load and distractor saliency; however, we missed the description of how these two variables differ conceptually as they are both described as contributing to task difficulty and it is not described why we would expect differential effects for these concepts (or in other words, how the authors explain the differential effects).

      We thank the reviewer for their comment. Directly comparing between task conditions within one model would result in an extra 16 regressors (1 (intercept) + 4-1 to model the difference between conditions + 3 to model the regressors + 3 x 3 to model each region x task condition interaction). Given our sample size, this study is underpowered to directly compare alpha lateralisation in contralateral versus ipsilateral conditions. For a more detailed answer please refer to our response to weakness two from reviewer #1.
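      The regressor count quoted above can be checked with a short sketch. It assumes four task conditions coded as three contrasts and three subcortical regions, as the arithmetic in the response implies; the function and parameter names are illustrative and do not come from the authors' analysis code.

```python
def count_regressors(n_conditions: int, n_regions: int) -> int:
    """Count regressors in a model with condition, region and interaction terms."""
    intercept = 1
    condition_terms = n_conditions - 1                   # contrasts between conditions
    region_terms = n_regions                             # one main-effect regressor per region
    interaction_terms = n_regions * (n_conditions - 1)   # region x condition interactions
    return intercept + condition_terms + region_terms + interaction_terms


# 1 + (4 - 1) + 3 + 3 x 3, following the breakdown in the response
print(count_regressors(n_conditions=4, n_regions=3))
```

      The point of the tally is that the number of interaction terms grows with the product of regions and condition contrasts, which is why a single combined model quickly outstrips a modest sample size.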

      • Line 364ff: Based on the description of the experimental design, it is not clear to us whether participants only had to report on the change in gaze for the stimulus in the cued hemifield.

      We thank the reviewer for this comment, which prompted us to clarify the experimental design as below:

      Method (Line 440): “Then followed a 1000 ms response interval where participants were asked to respond with their right or left index finger whether the gaze direction of the cued face shifted left or right.”

      • Line 47ff: As mentioned above, the AIC table is not included. Further, as it is mentioned that BIC values led to similar results (indicating that they are not identical), it would be valuable to report both AIC and BIC values.

      We thank the reviewer for their constructive feedback. The supplementary materials were uploaded in a separate file, and it must not have been available to the reviewers. We have now added the BIC values and attached the supplementary materials to the end of the manuscript for convenience.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (Public Review):

      Summary:

      Songbirds provide a tractable system to examine neural mechanisms of sequence generation and variability. In past work, the projection from LMAN to RA (output of the anterior forebrain pathway) was shown to be critical for driving vocal variability during babbling, learning, and adulthood. LMAN is immediately adjacent to MMAN, which projects to HVC. MMAN is less well understood but, anatomically, appears to resemble LMAN in that it is the cortical output of a BG-thalamocortical loop. Because it projects to HVC, a major sequence generator for both syllable phonology and sequence, a strong prediction would be that MMAN drives sequence variability in the same way that LMAN drives phonological variability. This hypothesis predicts that MMAN lesions in a Bengalese finch would reduce sequence variability. Here, the authors test this hypothesis. They provide a surprising and important result that is well motivated and well analyzed: MMAN lesions increase sequence variability - this is exactly the opposite result from what would be predicted based on the functions of LMAN.

      Strengths:

      (1) A very important and surprising result shows that lesions of a frontal projection from MMAN to HVC, a sequence generator for birdsong, increase syntactical variability.

      (2) The choice of Bengalese finches, which have complex transition structures, to examine the mechanisms of sequence generation, enabled this important discovery.

      (3) The idea that frontal outputs of BG-cortical loops can generate vocal variability comes from lesions/inactivations of a parallel pathway from LMAN to RA. The difference between MMAN and LMAN functions is striking and important.

      Weaknesses:

      (1) If more attention was paid to how syllable phonology was (or was not) affected by MMAN lesions then the claims could be stronger around the specific effects on sequence.

      Reviewer #2 (Public Review):

      Summary:

      This study investigates the neural substrates of syntax variation in Bengalese finch songs. Here, the authors tested the effects of bilateral lesions of mMAN, a brain area with inputs to HVC, a premotor area required for song production. Lesions in mMAN induce variability in syntactic elements of song specifically through increased transition entropy, variability within stereotyped song elements known as chunks, and increases in the repeat number of individual syllables. These results suggest that mMAN projections to HVC contribute to multiple aspects of song syntax in the Bengalese finch. Overall the experiments are well-designed, the analysis excellent, and the results are of high interest.

      Strengths:

      The study identifies a novel role for mMAN, the medial magnocellular nucleus of the anterior nidopallium, in the control of syntactic variation within adult Bengalese finch song. This is of particular interest as multiple studies previously demonstrated that mMAN lesions do not affect song structure in zebra finches. The study undertakes a thorough analysis to characterise specific aspects of variability within the song of lesioned animals. The conclusions are well supported by the data.

      Weaknesses:

      The study would benefit from additional mechanistic information. A more fine-grained or reversible manipulation, such as brain cooling, might allow additional insights into how mMAN influences specific aspects of syntax structure. Are repeat number increases and transition entropy resulting from shared mechanisms within mMAN, or perhaps arising from differential output to downstream pathways (i.e. projections to HVC)? Similarly, unilateral manipulations would allow the authors to further test the hypothesis that mMAN is involved in inter-hemispheric synchronization.

      We thank the reviewers and editor for their encouraging and helpful comments and suggestions. We have revised the previous submission with new analyses and discussion to address points raised by the reviewers.

      Following the suggestion of Reviewer 1 we have added an analysis of the effects of mMAN lesions on syllable phonology, using a variety of measures. We have included 3 new Figure Supplements that detail our analyses and elaborate on these points.

      We agree with Reviewer 2 that reversible and unilateral manipulations would be interesting and potentially enable additional insights into the mechanisms by which mMAN influences song sequencing, and we are planning to perform such experiments in future studies.

      We made additional minor changes throughout the manuscript to address other points raised by the reviewers, and we thank them again for their time and effort in providing constructive feedback to improve our study.

      A complete point by point detailing of these changes is included below, interspersed with the reviewer comments.

      Reviewer #1 (Recommendations For The Authors):

      The opposite result from what would be predicted based on the functions of LMAN.

      Shoring up the paper's claims and ruling out alternative interpretations will require attention to the following issues:

      Major comments

      (1) Acoustic structure of syllables

      Line 294 & Sup. Figure 2, in some birds the syllable acoustic structures seem to be significantly different between the pre- and post-lesion condition, e.g. 'w' in Bird 1, 'g' in Bird 2, 'blm' in Bird 6. This observation seems to contradict the claim that acoustic structures are not affected by MMAN lesions.

      Related to the previous point, a more detailed analysis is needed to quantify the extent of acoustic changes caused by MMAN lesions. For example, do these pre- and post- lesion syllables form distinct clusters if embedded in a UMAP? Do more standard measures of syllable phonology (e.g. SAP similarity scores or feature distributions) show differences in pre- and post-MMAN lesion?

      We agree with the reviewer that there were individual syllables as illustrated in the average spectrograms of Figure 2 – figure supplement 1 that qualitatively differed between pre- and post-lesion recordings. We have followed the reviewer’s suggestion to quantify changes to syllable phonology using both similarity scores by Sound Analysis Pro (SAP) and a variety of identified acoustic features.

      In brief, these measures largely corroborate the conclusion that for most birds and syllables there was little or no difference in phonology between pre- and post-lesion songs, but that in a minority of cases syllables were altered noticeably (further detail below). In those cases where syllable phonology was altered, changes were not consistent across birds, and we cannot rule out off-target effects due to damage to structures or fibers of passage neighboring mMAN, so that it is unclear whether some subtle changes to syllable phonology can be attributed to mMAN lesions versus other causes. Future studies could more specifically examine whether damage to mMAN alone is sufficient in some cases to degrade syllable structure by using viral or other approaches that enable the more specific disruption of mMAN projection neurons.

      In practice, almost all syllables were identifiable in post-lesion songs so that we could unambiguously assign identity for purposes of evaluating effects of lesions on sequencing. Moreover, in any individual cases where there was ambiguity in syllable identity, we used the sequential context to assign the most likely label. Thus, any errors in assignment in such cases would have tended to reduce rather than accentuate the magnitude of reported sequencing effects. Lastly, each of the reported effects of mMAN lesions on sequencing were observed in multiple birds for which we detected no significant changes to syllable similarity.

      Further details of the analyses of syllable structure are detailed below, and have been added as new figure supplements:

      (1) Syllable similarity scores calculated using SAP (Sound Analysis Pro) (new Figure 2 – figure supplement 2). We compared pre-post lesion similarity scores for each syllable with self-similarity measures for the same syllables taken from separate control recordings before lesions. For comparison, we also included a cross-similarity score for syllables of different types. These measures confirmed the qualitative impression from spectrograms that for most birds there were no greater changes to syllable structure following lesions than were present across control recordings. For one bird, pre-post changes were significantly larger than changes across control recordings, but pre-post similarity remained higher than cross-similarity.

      (2) Analysis of fundamental frequency and coefficient of variation (CV) of fundamental frequency of select syllables for each bird before and after mMAN lesions (new Figure 2 – figure supplement 3). This analysis is directly comparable with the same analysis performed on LMAN lesions in Sakata, Hampton, Brainard (2008). We carried out this analysis in part to address changes to syllable structure that might have inadvertently arisen due to damage to LMAN, which sits immediately lateral to mMAN. In the Bengalese finch and zebra finch, lesions of LMAN cause little change to the mean fundamental frequency of individual syllables but cause a consistent reduction in the coefficient of variation (CV) of fundamental frequency across repeated renditions of a given syllable (Sakata, Hampton, Brainard 2008, Andalman, Fee 2009, Warren et al. 2011). We therefore supposed that unintended damage to LMAN or its projections to RA might have resulted in a reduction in the CV of syllables following mMAN lesions. Instead, we saw a modest increase in the CV of fundamental frequency (mean across birds of +20%; range -19 to +43%). These data suggest that off-target effects on LMAN were largely absent in our experiments (consistent with histology, e.g. Figure 1 - figure supplement 1).

      (3) Comparison of entropy of the spectral envelope (entS), temporal centroid of the temporal envelope (meanT), and first, second and third formants (F1, F2, F3) before and after lesions, calculated using the Python SoundSig toolbox (Elie and Theunissen 2016) (new Figure 2 – figure supplement 4). Acoustic features generally showed little change between pre- and post-lesion songs. They highlight as relative outliers the same individual examples that stand out in the average spectrograms in Figure 2 – figure supplement 1.
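      The CV comparison described in item (2) above can be sketched as follows. This is a minimal illustration, assuming CV is computed as the sample standard deviation divided by the mean across renditions of one syllable; the fundamental-frequency values are invented for the example and are not data from the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    return stdev(values) / mean(values)


def percent_change(before, after):
    """Signed percent change from a pre-lesion to a post-lesion value."""
    return 100.0 * (after - before) / before


# Hypothetical fundamental-frequency measurements (Hz) for one syllable
pre_lesion_ff = [1000.0, 1010.0, 990.0, 1005.0, 995.0]
post_lesion_ff = [1000.0, 1015.0, 985.0, 1008.0, 992.0]

cv_pre = coefficient_of_variation(pre_lesion_ff)
cv_post = coefficient_of_variation(post_lesion_ff)
print(f"CV change: {percent_change(cv_pre, cv_post):+.1f}%")
```

      A negative percent change would correspond to the reduction in CV reported after LMAN lesions, and a positive one to the increase reported here after mMAN lesions.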

      Author response image 1.

      Syllable similarity calculated using Sound Analysis Pro (SAP). ‘Self Similarity’ = Similarity comparison of syllables before mMAN lesions to syllables of the same type, taken from two separate control recordings before the lesions, ‘Pre vs Post’ = Similarity comparison of the same syllable types before and after mMAN lesions, ‘Cross Similarity’ = Similarity comparison of each syllable type to other syllable types. For Birds 1-2 and 4-7, ‘Self Similarity’ was not significantly different from ‘Pre vs Post’ Similarity (p>0.05, Wilcoxon sign rank test), while for Bird 3, there was a significant difference (p = 0.03, Wilcoxon sign rank test). For all birds ‘Pre vs Post’ was significantly different from ‘Cross Similarity’ (p<0.05, Wilcoxon sign rank test). On average, ‘Pre vs Post’ was 4.8 % less than ‘Self Similarity’ (range 0.2%-14%) while ‘Cross Similarity’ was 40% less than ‘Self Similarity’ (range 20.2%-56.3%). These measures confirm the qualitative impression from Figure 2- figure supplement 1 that for most birds and syllables there were no greater changes to syllable structure following lesions than was present across control recordings, and that pre-post similarity remained higher than cross-similarity, i.e. syllables remained clearly identifiable.

      Author response image 2.

      (A) CV of fundamental frequency (FF) of select syllables before and after mMAN lesions. In the Bengalese finch and zebra finch, lesions of lMAN, which sits immediately lateral to mMAN, cause a consistent reduction in the coefficient of variation (CV) of fundamental frequency across repeated renditions of a given syllable (Sakata, Hampton, Brainard 2008, Andalman, Fee 2009, Warren et al. 2011). We therefore supposed that unintended damage to lMAN or its projections to RA might have resulted in a reduction in the CV of syllables following mMAN lesions. Instead we saw a modest increase in the CV of fundamental frequency (p<0.05, Wilcoxon sign rank test; mean across birds of +20%; range -19 to +43%). These data suggest that it is unlikely that changes to syllable structure might have arisen due to accidental damage to lMAN. (B) Percent change in mean fundamental frequency after mMAN lesions vs mean fundamental frequency before mMAN lesions.

      Author response image 3.

      Selected acoustic features for all syllables in all birds before and after mMAN lesions. Different colors represent different syllable types per bird. ‘entS’ = Entropy of spectral envelope, ‘meanT’ = Temporal centroid for temporal envelope, ‘F1’ = First formant, ‘F2’= Second formant, ‘F3’ = Third formant. Acoustic features generally showed little change between pre and post lesion songs. They highlight as relative outliers the same individual examples that stand out in the average spectrograms in Figure 2 – figure supplement 1.

      (2) Shoring up claims of increased transitional variability

      Line 301 & Sup. Figure 1, in several birds (1, 2, 5, 6), it seems that there is a downward trend post-lesion, i.e. the transition entropy gradually decreases with time. How can the authors exclude the possibility that the increased variability is a transient effect, e.g. caused by surgery side effects or destabilization of circuits, which may eventually recover to normal?

      Transition entropy remained elevated for as long as the birds were followed in this study. While the persistence of the effects we observed is longer than transient effects such as those following NIf lesions in zebra finches (Otchy et al., 2015; ~2 days), we cannot rule out either recovery or further deterioration following lesions on much longer time scales, such as those reported by Kubikova et al., 2007 (X lesions, 6 months). We have now added data points for 4 birds where we had songs from later timepoints following lesions; for three of these birds, transition entropy remained elevated above the baseline values for 14 and 33 days, respectively (Figure 1 - figure supplement 2).
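      For readers unfamiliar with the measure, transition entropy can be sketched as below. This is an illustration only, assuming a first-order definition in which the entropy of each syllable's outgoing transitions is weighted by how often that syllable occurs; the exact definition used in the manuscript should be taken from its Methods, and the example sequences are invented.

```python
from collections import Counter, defaultdict
from math import log2


def transition_entropy(sequence):
    """Frequency-weighted first-order transition entropy (bits) of a label sequence."""
    transitions = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        transitions[current][following] += 1

    total = sum(sum(counts.values()) for counts in transitions.values())
    entropy = 0.0
    for counts in transitions.values():
        n = sum(counts.values())
        # Entropy of the outgoing transition distribution for this syllable
        h = -sum((k / n) * log2(k / n) for k in counts.values())
        entropy += (n / total) * h
    return entropy


# Hypothetical songs: a fully stereotyped sequence and one with a branch point
print(transition_entropy("abababab"))
print(transition_entropy("abacabaca"))
```

      Under this definition a fully stereotyped sequence has zero entropy, and any branch point with more than one possible successor raises it, so a sustained post-lesion elevation indicates persistently more variable sequencing.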

      Line 313 & Sup. Figure 4, the claim that "transitions that had low history dependence tended to show larger changes after mMAN lesions" needs better statistical support, because in Sup. Figure 4, the correlation is not significant.

      We apologize for the phrasing. We have changed the sentence to: “Consistent with the first possibility, we observed that there was a nonsignificant trend toward larger changes after mMAN lesion for transitions with low history dependence.”

      Figure 4C-D, only data from 5 out of 7 birds was included, did the other two birds not have repeats? If so, the authors need to be explicit on data exclusion.

      The reviewer’s inference is correct that in our dataset only 5 out of 7 birds had songs which contained repeat phrases. We have added the following sentence to state that explicitly: “In our dataset of 7 birds, only 5 birds had songs which contained repeat phrases.”

      Minor comments

      Sup. Figure 3, to help readers understand, 1) add symbols and arrows to point to the structures; 2) indicate the orientation of the slide, e.g. which direction is medial/lateral; 3) a negative control without lesion needs to be shown for comparison.

      We have made the suggested changes and updated new Figure 1- figure supplement 1.

      Author response image 4.

      Image of calcitonin gene-related peptide (CGRP)-stained frontal section (left) control and (right) bird 5. CGRP labels cells in both lMAN (seen in black to the left of the lesion) and mMAN (blue, intact; red, completely destroyed).

      A statistical test is needed for Sup. Figure 5B.

      We have modified the Figure legend for Figure 3 – figure supplement 1 as follows:

      “Change in transition entropy was not significantly different for transitions within chunks and at branchpoints (p> 0.05, Wilcoxon rank sum test)”

      Line 363, these can be moved to the Introduction, so readers have a better sense of what's already known about MMAN lesion.

      We have moved the sentence to Introduction.

      Fig 1e. RA also projects to DLM.

      Our intention was to focus on the connections involving mMAN; we have now added the connection in Figure 1E.

      Reviewer #2 (Recommendations For The Authors):

      Please address this issue in the discussion (no new experiments required): It would be interesting to consider how social context modulates the variability of the song. In these experiments, Bengalese finches were singing in isolation. How might changes in syntax be modulated by the presence of a female in directed song and in other social contexts?

      Thank you for your suggestion. One study by Jarvis, et al., (Jarvis E., et al., 1998) shows that ZENK expression in mMAN after singing does not differ between female-directed singing, undirected singing and singing in presence of a male conspecific. This suggests that activity in mMAN might not be modulated by social context. But we agree that it would be interesting to test how a change in social context (which typically leads to reduced transition entropy) interacts with the increased variability we see after mMAN lesions. We have added the following sentences to the discussion:

      “In our study, we only recorded song sequencing of male Bengalese finches singing in isolation. Social context, such as female-directed song, can also change song sequencing (Hampton, Sakata and Brainard, 2009; Chen, Matheson and Sakata, 2016). It would be interesting to test whether mMAN plays a role in the social context-modulated changes in sequencing (Jarvis et al., 1998), similar to how lMAN contributes to social context-modulated changes in syllable structure (Sakata, Hampton and Brainard, 2008).”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Review

      [...] A particular strength of the present study is the structural characterization of human PURA, which is a challenging target for structural biology approaches. The molecular dynamics simulations are state-of-the-art, allowing a statistically meaningful assessment of the differences between wild-type and mutant proteins. The functional consequences of PURA mutations at the cellular level are fascinating, particularly the differential compartmentalization of wild-type and mutant PURA variants into certain subcellular condensates.

      Weaknesses that warrant rectification relate to (i) The interpretation of statistically non-significant effects seen in the molecular dynamic simulations.

      We removed from the manuscript the sentence which indicated that we analyzed statistically non-significant effects. Therefore, the above statement has been resolved.

      (ii) The statistical analysis of the differential compartmentalization of PURA variants into processing bodies vs. stress granules, and

      We re-analyzed all cell-biological data and adjusted the statistical analysis of P-bodies and Stress-granule intensity analysis. The new, and improved statistics have replaced the original analyses in the corresponding figures (Figs. 1C and 2B).

      (iii) Insufficient documentation of protein expression levels and knock-down efficiencies.

      Quantification of protein expression levels by Western blotting is shown in Appendix Figure S1. Quantification of knock-down efficiencies by Western blotting is shown in Appendix Figure S3.

      Recommendations for the authors: Reviewer #1

      Concerns and Suggested Changes

      (a) I have only one concern about the computational part and that is about statements such as "There are also large differences in the residue surrounding the mutation spot (residues 90 to 100), where the K97E mutant also shows much greater fluctuation. However, these differences are not significant due to the large standard deviations." If the differences are not statistically significant, then I would suggest either removing such a statement or increasing the statistics.

      We agree with the Reviewer’s comment. We removed this sentence from the text.

      Recommendations for the authors: Reviewer #2

      General Comments

      This is a challenging structural target and the authors have made considerable efforts to determine the effect of several mutations on the structure and function. Many of the constructs, however, could not be expressed and/or purified in bacteria. However, it is not clear to what extent other expression systems (e.g. Drosophila or human) were considered and if this would have been beneficial.

      We did not use other expression systems because the wild-type protein is well-behaved when expressed in E. coli. If a mutant variant cannot be expressed or does not behave well in E. coli, this constitutes a clear indication that the respective mutation impairs the protein’s integrity. Thus, by using E. coli as a reference system for all the variants of PURA protein, we could assess the influence of the mutations on the structural integrity and solubility. Only for the variants that did not show impairment in E. coli expression did we continue to assess in more detail why they are nevertheless functionally impaired and cause PURA Syndrome.

      Concerns and Suggested Changes

      (a) The schematic in Figure 3A would have been helpful for interpreting the mutations discussed in Figures 1 and 2. I would suggest moving it earlier in the text.

      We changed the figure according to the Reviewer’s suggestion.

      (b) I believe the RNA used for binding studies in Figures 3C and D was (CGG)8. Are the two "free" RNA bands a monomer and a dimer (duplex?)?

      Although we do not know for certain, it is indeed likely that the two free RNA bands represent either different secondary structures of the free RNA or a duplex of two molecules. Of note, PURA binds to both “free” RNA bands, indicating that it either does not discriminate between them or melts double-stranded RNA in these EMSAs.

      There also seems to be considerable cooperativity in the binding, so I wonder if a shorter RNA oligonucleotide might facilitate the measurement of Kds.

      The length of the used RNA was selected based on the estimated elongated size of the full-length PURA and the presence of 3 PUR repeats. Assuming that one PUR repeat interacts with about 6-7 bases (data from the co-structure of Drosophila PURA with DNA; PDB-ID: 5FGP) and that full-length PURA forms a dimer consisting of three PUR repeats, the full-length protein in its extended form should cover a nucleic-acid stretch of about 24 bases.

      Also, it is not clear how the affinities were measured particularly for hsPURA III since free band is never fully bound at the highest protein concentration.

      It was not our goal to measure Kds for the interaction of PURA variants with RNA. The EMSA experiments were conducted to detect relative differences in the interaction between PURA variants and RNA. To estimate the differences, we measured the total intensity of the bound (shifted) and unbound RNA. The intensities of the bands observed on the scanned EMSA gels were quantified with Fiji ImageJ software. We calculated the percentage of the shifted RNA and normalized it. The hsPURA III fragment shows much lower affinity; therefore, unlike full-length PURA and PURA I-II, it does not fully shift the RNA even at the highest protein concentration.
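      The band quantification described above can be sketched as follows. This is a minimal illustration, assuming the shifted fraction is the bound intensity divided by the sum of bound and unbound intensities in each lane; the subsequent normalization step is not specified in the response and is therefore left out. The intensity values are invented.

```python
def percent_shifted(bound_intensity: float, unbound_intensity: float) -> float:
    """Percentage of RNA signal found in the shifted (protein-bound) band of one lane."""
    total = bound_intensity + unbound_intensity
    if total <= 0:
        raise ValueError("lane has no measurable RNA signal")
    return 100.0 * bound_intensity / total


# Hypothetical lane intensities (arbitrary units) across a protein titration
lanes = [(0.0, 100.0), (30.0, 70.0), (80.0, 20.0)]
for bound, unbound in lanes:
    print(f"{percent_shifted(bound, unbound):.1f}% shifted")
```

      Because the measure is a ratio within each lane, it supports relative comparisons between variants without requiring the binding curve to reach saturation, which is consistent with the stated aim of not deriving Kd values.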

      (c) Do the human PURA I+II and dmPURA I+ II crystallize in the same space group and have similar packing? Can the observed structural flexibility be due to crystal contacts?

      hsPURA I+II and dmPURA I+II crystallize in different space groups with different crystal packing. In both cases, the asymmetric unit contains 4 independent molecules with the flexible part of the structure composed of the β4 and β8 (β ridge) exposed to solvent. In the case of the Drosophila structure, we do not observe any flexibility in either β-strand. In contrast, for the human PURA structure the β ridge exhibits considerable flexibility and adopts different conformations in all 4 molecules of the asymmetric unit. We observe similar flexibility of the β4 and β8 (β ridge) in the structure of the K97E mutant, which contains 2 molecules in the asymmetric unit. We would like to add that we expect crystal contacts to stabilize rather than destabilize domains.

      Similarly, can the conformations observed for the K97E mutant be partially explained by packing?

      Regarding the sequence shift observed for the β5 and β6 strands in the hsPURA I+II K97E variant: although the β5 strand with the shifted amino acid sequence is in contact with the β5 strand of a symmetry-related molecule, we do not consider this interaction to be the source of the shift. To be sure that the shift is not forced by crystallization, we performed NMR measurements, which confirmed that in solution there is a strong change in the β-strands between WT and the K97E mutant. This is an unambiguous indication that the structural changes observed in the crystal structure also occur in solution. In addition, the MD simulations provide additional confirmation of our interpretation that K97E destabilizes the corresponding PUR domain. Taken together, we provide proof from three different angles that the observed differences indeed affect the integrity and hence function of the protein.

      (d) Perhaps, it is my misunderstanding, but I find the NMR data on the Arg sidechains for the K97E confusing. If they are visible for K97E and not WT, doesn't this indicate that there is an exchange between two conformations or more dynamics in the WT structure? This does not seem to be the opposite of the expectation if K97E is thought to have more conformational flexibility.

      Due to a technical issue (peak contour level), arginine side chain resonances were not clearly visible in the WT spectrum. Figure 5F has been updated, and the resonances now correspond to those seen in the mutant spectrum. However, to prevent any confusion or mis/overinterpretation, we removed the sentence regarding arginine side chains: "Intriguingly, arginine side chain resonances Nε-Hε were only visible in the K97E variant, while they were broadened out in the wild-type spectrum."

      (e) The most speculative part of the paper is the interpretation of SG and PB localization of PURA in Fig 1 and 2. There is an important issue with the statistics that must be clarified because it would appear that statistical significance was determined using each SG or PB as an independent measurement. This is incorrect and significance should be measured by only using the means of three biological replicates. This is well described here. It is not clear at this time if the reported P values will be confirmed upon reanalysis, and this may require reinterpretation of the data.

      We are grateful for this clarifying comment and agree that the statistical analysis of P-body and stress granule was misleading. Of note, while the figures depicted all the values independent of the biological repeats, the statistical analyses were done on the mean value of each replicate of each cell line and not all raw data points.

      We prepared new plots showing only the mean value of each replicate, and also re-calculated the P-values. The values have changed only slightly in this new analysis because we now also included the previously labeled outliers (red points) to better demonstrate that significance still exists even when considering them.
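      The replicate-level summary described above can be sketched as follows. This is an illustration only, assuming each granule measurement is tagged with the biological replicate it came from; the replicate labels and intensity values are invented, and the statistical test applied to the resulting means is not shown because the response does not name it.

```python
from collections import defaultdict
from statistics import mean


def replicate_means(measurements):
    """Collapse per-granule values to one mean per biological replicate.

    measurements: iterable of (replicate_id, value) pairs.
    """
    grouped = defaultdict(list)
    for replicate_id, value in measurements:
        grouped[replicate_id].append(value)
    return {replicate_id: mean(values) for replicate_id, values in grouped.items()}


# Hypothetical granule intensities from three biological replicates
granules = [
    ("rep1", 1.0), ("rep1", 3.0),
    ("rep2", 2.0), ("rep2", 2.5), ("rep2", 3.0),
    ("rep3", 4.0),
]
print(replicate_means(granules))
```

      Testing on these per-replicate means, rather than on every granule, keeps the sample size equal to the number of independent experiments and avoids treating granules from the same dish as independent observations.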

      In the new analysis of stress-granule association, only the value of the K97E mutant lost its significance, indicating that its association to stress granules is not lost. Therefore, we adjusted the following sentences in the manuscript.

      Results:

      Original: "While quantification showed a reduced association of hsPURA K97E mutant with G3BP1-positive granules (Fig 1B), the two other mutants, I206F and F233del, showed the same co-localization to stress granules as the wild type control."

      Corrected: "In all the patient-related mutations, no significant reduction in stress granule association was seen when compared to the wild type control (Fig 1C)."

      Original: "The observation that only one of the patient-related mutations of hsPURA, K97E, showed reduced stress granule association indicates that this feature may not constitute a major hallmark of the PURA syndrome. It should be noted however that this interpretation must be considered with some caution as the experiments were performed in a PURA wild-type background."

      Corrected: "As we did not observe significant changes in the association of patient-related mutations of hsPURA to stress granules, it is suggested that this feature may not constitute a major hallmark of the PURA syndrome. It should be noted however that this interpretation must be considered with some caution as the experiments were performed in a PURA wild-type background."

      (f) A western blot showing the level of overexpression of the PURA proteins should be shown in Figure 1 as well as the KD of endogenous PURA for Figure S2?

      As requested, a Western blot showing the level of overexpression of the different PURA proteins has been added as Appendix Figure S1.

      A Western blot of the siRNA-mediated knock-down experiments of PURA and their corresponding control has been added to Appendix Figure S3. Quantification of three biological repeats showed a significant reduction of PURA protein levels upon knock down.

      (g) While I appreciate that rewriting is time-consuming, I would recommend considering restructuring the manuscript because I think that it would aid the overall clarity. I think the foundation of the work is the structural characterization and would suggest beginning the paper with this data and the biochemical characterization. The co-localization with SGs and PBs and how this may be relevant to disease is much more speculative and is therefore better to present later. While I appreciate that the structural interpretation of why some mutants localize to PBs differently is not entirely clear, I do think that this would provide some context for the discussion.

      In the initial version of the manuscript we first presented the structural characterization of PURA and afterwards the co-localization with SGs and PBs. As this reviewer stated him-/herself in (e), we also noticed that the SG and PB interpretation is the most speculative part of this manuscript. We felt that having this at the end of the results section would weaken the manuscript. On the other hand, we consider that the structural interpretation of mutations is much stronger and has a greater impact for future research. After long discussion we decided to swap the order to leave the most important results for the end of the manuscript.

      Recommendations for the authors: Reviewer #3

      Concerns and Suggested Changes:

      (a) For the characterization of G3BP1-positive stress granules in HeLa cells upon depletion of PURA, it remains unclear what is the efficiency of siRNA? The authors should provide a western blot to indicate how much the endogenous levels were reduced.

      We completely agree with the stated concern and addressed it accordingly. We had performed this experiment prior to submission but for some unknown reason it was not included in the manuscript.

      The Western blot of siRNA-mediated knock-down experiments of PURA and their corresponding control is now shown in Appendix Figure S3. Quantification of three biological repeats showed a significant reduction of PURA protein levels upon knock down.

      (b) How does knocking down PURA affect DCP1A-positive structures in HeLa cells? Would P bodies be formed even in the absence (or reduction) of total PURA?

      Indeed, the stated question is very interesting. In fact, we have already shown in our recent publication (Molitor et al., 2023) that a knock down of PURA in HeLa and NHDF cells leads to a significant reduction of P-bodies. We actually referred to this finding on page 6:

      "Since hsPURA was recently shown to be required for P-body formation in HeLa cells and fibroblasts (Molitor et al. 2023), PURA-dependent liquid phase separation could potentially also directly contribute to the formation of these granules."

      On the same page, we also refer to the underlying molecular mechanism:

      "However, when putting this observation in perspective with previous reports, it seems unlikely that P-body formation directly depends on phase separation by hsPURA, but rather on its recently reported function as gene regulator of the essential P-body core factors LSM14a and DDX6 (Molitor et al., 2023)."

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      These ingenious and thoughtful studies present important findings concerning how people represent and generalise abstract patterns of sensory data. The issue of generalisation is a core topic in neuroscience and psychology, relevant across a wide range of areas, and the findings will be of interest to researchers across areas in perception, learning, and cognitive science. The findings have the potential to provide compelling support for the outlined account, but there appear other possible explanations, too, that may affect the scope of the findings but could be considered in a revision.

      Thank you for sending the feedback from the three peer reviewers regarding our paper. Please find below our detailed responses addressing the reviewers' comments. We have incorporated these suggestions into the paper and provided explanations for the modifications made.

      We have specifically addressed the point of uncertainty highlighted in eLife's editorial assessment, which concerned alternative explanations for the reported effect. In response to Reviewer #1, we have clarified how Exp. 2c and Exp. 3c address the potential alternative explanation related to "attention to dimensions." Further, we present a supplementary analysis to account for differences in asymptotic learning, as noted by Reviewer #2. We have also clarified how our control experiments address effects associated with general cognitive engagement in the task. Lastly, we have further clarified the conceptual foundation of our paper, addressing concerns raised by Reviewers #2 and #3.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript reports a series of experiments examining category learning and subsequent generalization of stimulus representations across spatial and nonspatial domains. In Experiment 1, participants were first trained to make category judgments about sequences of stimuli presented either in nonspatial auditory or visual modalities (with feature values drawn from a two-dimensional feature manifold, e.g., pitch vs timbre), or in a spatial modality (with feature values defined by positions in physical space, e.g., Cartesian x and y coordinates). A subsequent test phase assessed category judgments for 'rotated' exemplars of these stimuli: i.e., versions in which the transition vectors are rotated in the same feature space used during training (near transfer) or in a different feature space belonging to the same domain (far transfer). Findings demonstrate clearly that representations developed for the spatial domain allow for representational generalization, whereas this pattern is not observed for the nonspatial domains that are tested. Subsequent experiments demonstrate that if participants are first pre-trained to map nonspatial auditory/visual features to spatial locations, then rotational generalization is facilitated even for these nonspatial domains. It is argued that these findings are consistent with the idea that spatial representations form a generalized substrate for cognition: that space can act as a scaffold for learning abstract nonspatial concepts.

      Strengths:

      I enjoyed reading this manuscript, which is extremely well-written and well-presented. The writing is clear and concise throughout, and the figures do a great job of highlighting the key concepts. The issue of generalization is a core topic in neuroscience and psychology, relevant across a wide range of areas, and the findings will be of interest to researchers across areas in perception and cognitive science. It's also excellent to see that the hypotheses, methods, and analyses were pre-registered.

      The experiments that have been run are ingenious and thoughtful; I particularly liked the use of stimulus structures that allow for disentangling of one-dimensional and two-dimensional response patterns. The studies are also well-powered for detecting the effects of interest. The model-based statistical analyses are thorough and appropriate throughout (and it's good to see model recovery analysis too). The findings themselves are clear-cut: I have little doubt about the robustness and replicability of these data.

      Weaknesses:

      I have only one significant concern regarding this manuscript, which relates to the interpretation of the findings. The findings are taken to suggest that "space may serve as a 'scaffold', allowing people to visualize and manipulate nonspatial concepts" (p13). However, I think the data may be amenable to an alternative possibility. I wonder if it's possible that, for the visual and auditory stimuli, participants naturally tended to attend to one feature dimension and ignore the other - i.e., there may have been a (potentially idiosyncratic) difference in salience between the feature dimensions that led to participants learning the feature sequence in a one-dimensional way (akin to the 'overshadowing' effect in associative learning: e.g., see Mackintosh, 1976, "Overshadowing and stimulus intensity", Animal Learning and Behaviour). By contrast, we are very used to thinking about space as a multidimensional domain, in particular with regard to two-dimensional vertical and horizontal displacements. As a result, one would naturally expect to see more evidence of two-dimensional representation (allowing for rotational generalization) for spatial than nonspatial domains.

      In this view, the impact of spatial pre-training and (particularly) mapping is simply to highlight to participants that the auditory/visual stimuli comprise two separable (and independent) dimensions. Once they understand this, during subsequent training, they can learn about sequences on both dimensions, which will allow for a 2D representation and hence rotational generalization - as observed in Experiments 2 and 3. This account also anticipates that mapping alone (as in Experiment 4) could be sufficient to promote a 2D strategy for auditory and visual domains.

      This "attention to dimensions" account has some similarities to the "spatial scaffolding" idea put forward in the article, in arguing that experience of how auditory/visual feature manifolds can be translated into a spatial representation helps people to see those domains in a way that allows for rotational generalization. Where it differs is that it does not propose that space provides a scaffold for the development of the nonspatial representations, i.e., that people represent/learn the nonspatial information in a spatial format, and this is what allows them to manipulate nonspatial concepts. Instead, the "attention to dimensions" account anticipates that ANY manipulation that highlights to participants the separable-dimension nature of auditory/visual stimuli could facilitate 2D representation and hence rotational generalization. For example, explicit instruction on how the stimuli are constructed may be sufficient, or pre-training of some form with each dimension separately, before they are combined to form the 2D stimuli.

      I'd be interested to hear the authors' thoughts on this account - whether they see it as an alternative to their own interpretation, and whether it can be ruled out on the basis of their existing data.

      We thank the Reviewer for their comments. We agree with the Reviewer that the “attention to dimensions” hypothesis is an interesting alternative explanation. However, we believe that the results of our control experiments Exp. 2c and Exp. 3c are incompatible with this alternative explanation.

      In Exp. 2c, participants are pre-trained in the visual modality and then tested in the auditory modality. In the multimodal association task, participants have to associate the auditory stimuli and the visual stimuli: on each trial, they hear a sound and then have to click on the corresponding visual stimulus. It is thus necessary to pay attention to both auditory dimensions and both visual dimensions to perform the task. To give an example, the task might involve mapping the fundamental frequency and the amplitude modulation of the auditory stimulus to the colour and the shape of the visual stimulus, respectively. If participants pay attention to only one dimension, this would lead to a maximum of 25% accuracy on average (because they would be at chance on the other dimension, with four possible options). We observed that 30/50 participants reached an accuracy > 50% in the multimodal association task in Exp. 2c. This means that we know for sure that at least 60% of the participants paid attention to both dimensions of the stimuli. Nevertheless, there was a clear difference between participants that received a visual pre-training (Exp. 2c) and those who received a spatial pre-training (Exp. 2a) (frequency of 1D vs 2D models between conditions, BF > 100 in near transfer and far transfer). In fact, only 3/50 participants were best fit by a 2D model when vision was the pre-training modality compared to 29/50 when space was the pre-training modality. Thus, the benefit of the spatial pre-training cannot be due solely to a shift in attention toward both dimensions.
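      The arithmetic behind these thresholds can be checked with a short sketch. It assumes, as the paragraph above states, two stimulus dimensions with four possible options each, and it additionally assumes (an idealisation introduced here, not a claim from the study) that a participant is always correct on an attended dimension and guesses uniformly on an unattended one. The function and variable names are hypothetical.

```python
from fractions import Fraction

LEVELS_PER_DIMENSION = 4  # four possible options per dimension, as stated in the text


def expected_accuracy(attended_dimensions, total_dimensions=2, levels=LEVELS_PER_DIMENSION):
    """Expected accuracy when every unattended dimension is guessed uniformly.

    Idealised assumption: an attended dimension is always answered correctly.
    """
    guessed = total_dimensions - attended_dimensions
    return Fraction(1, levels) ** guessed


both = expected_accuracy(2)     # attending to both dimensions
one = expected_accuracy(1)      # attending to a single dimension
neither = expected_accuracy(0)  # pure guessing over all 16 stimuli

# Share of participants above the 50% accuracy criterion reported in the response.
share_exp_2c = Fraction(30, 50)
share_exp_3c = Fraction(33, 48)

print(float(both), float(one), float(neither))
print(float(share_exp_2c), float(share_exp_3c))
```

      Under these assumptions an accuracy above 50% lies well beyond the 25% ceiling of a one-dimensional strategy, which is why it is taken as evidence of attention to both dimensions.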

      This effect was replicated in Exp. 3c. Similarly, 33/48 participants reached an accuracy > 50% in the multimodal association task in Exp. 3c, meaning that we know for sure that at least 68% of the participants actually paid attention to both dimensions of the stimuli. Again, there was a clear difference between participants who received a visual pre-training (frequency of 1D vs 2D models between conditions, Exp. 3c) and those who received a spatial pre-training (Exp. 3a) (BF > 100 in near transfer and far transfer).

      Thus, we believe that the alternative explanation raised by the Reviewer is not supported by our data. We have added a paragraph in the discussion:

      “One alternative explanation of this effect could be that the spatial pre-training encourages participants to attend to both dimensions of the non-spatial stimuli. By contrast, pre-training in the visual or auditory domains (where multiple dimensions of a stimulus may be relevant less often naturally) encourages them to attend to a single dimension. However, data from our control experiments Exp. 2c and Exp. 3c are incompatible with this explanation. Around 65% of the participants show a level of performance in the multimodal association task (>50%) which could only be achieved if they were attending to both dimensions (performance attending to a single dimension would yield 25% and chance performance is at 6.25%). This suggests that participants are attending to both dimensions even in the visual and auditory mapping case.”

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, L&S investigate the important general question of how humans achieve invariant behavior over stimuli belonging to one category given the widely varying input representation of those stimuli and more specifically, how they do that in arbitrary abstract domains. The authors start with the hypothesis that this is achieved by invariance transformations that observers use for interpreting different entries and furthermore, that these transformations in an arbitrary domain emerge with the help of the transformations (e.g. translation, rotation) within the spatial domain by using those as "scaffolding" during transformation learning. To provide the missing evidence for this hypothesis, L&S used behavioral category learning studies within and across the spatial, auditory, and visual domains, where participants had to learn to categorize rotated and translated 4-element token sequences and then the learned transformation had to be applied in new feature dimensions within the given domain. Through single- and multiple-day supervised training and unsupervised tests, L&S demonstrated by standard computational analyses that in such setups, space and spatial transformations can, indeed, help with developing and using appropriate rotational mapping whereas the visual domain cannot fulfill such a scaffolding role.

      Strengths:

      The overall problem definition and the context of spatial mapping-driven solution to the problem is timely. The general design of testing the scaffolding effect across different domains is more advanced than any previous attempts clarifying the relevance of spatial coding to any other type of representational codes. Once the formulation of the general problem in a specific scientific framework is done, the following steps are clearly and logically defined and executed. The obtained results are well interpretable, and they could serve as a good stepping stone for deeper investigations. The analytical tools used for the interpretations are adequate. The paper is relatively clearly written.

      Weaknesses:

      Some additional effort to clarify the exact contribution of the paper, the link between analyses and the claims of the paper, and its link to previous proposals would be necessary to better assess the significance of the results and the true nature of the proposed mechanism of abstract generalization.

      (1) Insufficient conceptual setup: The original theoretical proposal (the Tolman-Eichenbaum-Machine, Whittington et al., Cell 2020) that L&S relate their work to proposes that just as in the case of memory for spatial navigation, humans and animals create their flexible relational memory system of any abstract representation by a conjunction code that combines on the one hand, sensory representation and on the other hand, a general structural representation or relational transformation. The TEM also suggests that the structural representation could contain any graph-interpretable spatial relations, albeit in their demonstration 2D neighbor relations were used. The goal of L&S's paper is to provide behavioral evidence for this suggestion by showing that humans use representational codes that are invariant to relational transformations of non-spatial abstract stimuli and moreover, that humans obtain these invariances by developing invariance transformers with the help of available spatial transformers. To obtain such evidence, L&S use the rotational transformation. However, the actual procedure they use actually solved an alternative task: instead of interrogating how humans develop generalizations in abstract spaces, they demonstrated that if one defines rotation in an abstract feature space embedded in a visual or auditory modality that is similar to the 2D space (i.e. has two independent dimensions that are clearly segregable and continuous), humans cannot learn to apply rotation of 4-piece temporal sequences in those spaces while they can do it in 2D space, and with co-associating a one-to-one mapping between locations in those feature spaces with locations in the 2D space an appropriate shaping mapping training will lead to the successful application of rotation in the given task (and in some other feature spaces in the given domain). 
      While this is an interesting and challenging demonstration, it does not shed light on how humans learn and generalize, only that humans CAN do learning and generalization in this highly constrained scenario. This result is a demonstration of how a stepwise learning regimen can make use of one structure for mapping a complex input into a desired output. The results neither clarify how generalizations would develop in abstract spaces nor answer the question of whether this generalization uses transformations developed in the abstract space. The specific training procedure ensures success in the presented experiments but the availability and feasibility of an equivalent procedure in a natural setting is a crucial part of validating the original claim and that has not been done in the paper.

      We thank the Reviewer for their detailed comments on our manuscript. We reply to the three main points in turn.

      First, concerning the conceptual grounding of our work, we would point out that the TEM model (Whittington et al., 2020), however interesting, is not our theoretical starting point. Rather, as we hope the text and references make clear, we ground our work in theoretical work from the 1990s and 2000s proposing that space acts as a scaffold for navigating abstract spaces (such as Gärdenfors, 2000). We acknowledge that the TEM model and other experimental work on the implication of the hippocampus, the entorhinal cortex and the parietal cortex in relational transformations of nonspatial stimuli provide evidence for this general theory. However, our work is designed to test a more basic question: whether there is behavioural evidence that space scaffolds learning in the first place. To achieve this, we perform behavioural experiments with a causal manipulation (spatial pre-training vs no spatial pre-training), which have the potential to provide such direct evidence. This is why we claim that:

      “This theory is backed up by proof-of-concept computational simulations [13], and by findings that brain regions thought to be critical for spatial cognition in mammals (such as the hippocampal-entorhinal complex and parietal cortex) exhibit neural codes that are invariant to relational transformations of nonspatial stimuli. However, whilst promising, this theory lacks direct empirical evidence. Here, we set out to provide a strong test of the idea that learning about physical space scaffolds conceptual generalisation.“

      Second, we agree with the Reviewer that we do not provide an explicit model for how generalisation occurs, and how precisely space acts as a scaffold for building representations and/or applying the relevant transformations to non-spatial stimuli to solve our task. Rather, we investigate in our Exp. 2-4 which aspects of the training are necessary for rotational generalisation to happen (and conclude that a simple training with the multimodal association task is sufficient for ~20% of participants). We now acknowledge in the discussion the fact that we do not provide an explicit model and leave that for future work:

      “We acknowledge that our study does not provide a mechanistic model of spatial scaffolding but rather delineates which aspects of the training are necessary for generalisation to happen.”

      Finally, we also agree with the Reviewer that our task is non-naturalistic. As is common in experimental research, one must sacrifice the naturalistic elements of the task in exchange for the control and the absence of prior knowledge of the participants. We have decided to minimise, as far as possible, the prior knowledge of the participants to make sure that our task involved learning a completely new task and that the pre-training was really causing the better learning/generalisation. The effects we report are consistent across the experiments so we feel confident about them but we agree with the Reviewer that an external validation with more naturalistic stimuli/tasks would be a nice addition to this work. We have included a sentence in the discussion:

      “All the effects observed in our experiments were consistent across near transfer conditions (rotation of patterns within the same feature space), and far transfer conditions (rotation of patterns within a different feature space, where features are drawn from the same modality). This shows the generality of spatial training for conceptual generalisation. We did not test transfer across modalities nor transfer in a more natural setting; we leave this for future studies.”

      (2) Missing controls: The asymptotic performance in experiment 1 after training in the three tasks was quite different in the three tasks (intercepts 2.9, 1.9, 1.6 for spatial, visual, and auditory, respectively; p. 5. para. 1, Fig 2BFJ). It seems that the statement "However, our main question was how participants would generalise learning to novel, rotated exemplars of the same concept." assumes that learning and generalization are independent. Wouldn't it be possible, though, that the level of generalization depends on the level of acquiring a good representation of the "concept" and after obtaining an adequate level of this knowledge, generalization would kick in without scaffolding? If so, a missing control is to equate the levels of asymptotic learning and see whether there is a significant difference in generalization. A related issue is that we have no information on what kind of learning in the three different domains was performed, albeit we probably suspect that in space the 2D representation was dominant while in the auditory and visual domains not so much. Thus, a second missing piece of evidence is the model-fitting results of the ⦰ condition that would show which way the original sequences were encoded (similar to Fig 2 CGK and DHL). If the reason for lower performance is not individual stimulus difficulty but the natural tendency to encode the given stimulus type by a combo of random + 1D strategy that would clarify that the result of the cross-training is, indeed, transferring the 2D-mapping strategy.

      We agree with the Reviewer that a good further control is to equate performance during training. Thus, we have run a complementary analysis where we select only the participants that reach > 90% accuracy in the last block of training in order to equate asymptotic performance after training in Exp. 1. The results (see Author response image 1) replicate the results that we report in the main text: there is a large difference between groups (relative likelihood of 1D vs. 2D models, all BF > 100 in favour of a difference between the auditory and the spatial modalities, between the visual and the spatial modalities, in both near and far transfer, “decisive” evidence). We prefer not to include this figure in the paper for clarity, and because we believe this result is expected given the fact that 0/50 and 0/50 of the participants in the auditory and visual conditions used a 2D strategy – thus, selecting subgroups of these participants cannot change our conclusions.

      Author response image 1.

      Results of Exp. 1 when selecting participants that reached > 90% accuracy in the last block of training. Captions are the same as Figure 2 of the main text.

      Second, the Reviewer suggested that we run the model fitting analysis only on the ⦰ condition (training) in Exp. 1 to reveal whether participants use a 1D or a 2D strategy already during training. Unfortunately, we cannot provide the model fits only in the ⦰ condition in Exp. 1 because all models make the same predictions for this condition (see Fig S4). However, note that this is done by design: participants were free to apply whatever strategy they want during training; we then used the generalisation phase with the rotated stimuli precisely to reveal this strategy. Further, we do believe that the strategy used by the participants during training and the strategy during transfer are the same, partly because – starting from block #4 – participants have no idea whether the current trial is a training trial or a transfer trial, as both trial types are randomly interleaved with no cue signalling the trial type. We have made this clear in the methods:

      “They subsequently performed 105 trials (with trialwise feedback) and 105 transfer trials including rotated and far transfer quadruplets (without trialwise feedback) which were presented in mixed blocks of 30 trials. Training and transfer trials were randomly interleaved, and no clue indicated whether participants were currently on a training trial or a transfer trial before feedback (or absence of feedback in case of a transfer trial).”

      Reviewer #3 (Public Review):

      Summary:

      Pesnot Lerousseau and Summerfield aimed to explore how humans generalize abstract patterns of sensory data (concepts), focusing on whether and how spatial representations may facilitate the generalization of abstract concepts (rotational invariance). Specifically, the authors investigated whether people can recognize rotated sequences of stimuli in both spatial and nonspatial domains and whether spatial pre-training and multi-modal mapping aid in this process.

      Strengths:

      The study innovatively examines a relatively underexplored but interesting area of cognitive science, the potential role of spatial scaffolding in generalizing sequences. The experimental design is clever and covers different modalities (auditory, visual, spatial), utilizing a two-dimensional feature manifold. The findings are backed by strong empirical data, good data analysis, and excellent transparency (including preregistration) adding weight to the proposition that spatial cognition can aid abstract concept generalization.

      Weaknesses:

      The examples used to motivate the study (such as "tree" = oak tree, family tree, taxonomic tree) may not effectively represent the phenomena being studied, possibly confusing linguistic labels with abstract concepts. This potential confusion may also extend to doubts about the real-life applicability of the generalizations observed in the study and raises questions about the nature of the underlying mechanism being proposed.

      We thank the Reviewer for their comments. We agree that we could have explained more clearly how these examples motivate our study. The similarity between “oak tree” and “family tree” is not just the verbal label. Rather, it is the arrangement of the parts (nodes and branches) in a nested hierarchy. Oak trees and family trees share the same relational structure. The reason that invariance is relevant here is that the similarity in relational structure is retained under rigid body transformations such as rotation or translation. For example, an upside-down tree can still be recognised as a tree, just as a family tree can be plotted with the oldest ancestors at either top or bottom. Similarly, in our study, the quadruplets are defined by the relations between stimuli: all quadruplets use the same basic stimuli, but the categories are defined by the relations between successive stimuli. In our task, generalising means recognising that relations between stimuli are the same despite changes in the surface properties (for example in far transfer). We have clarified that in the introduction:

      “For example, the concept of a “tree” implies an entity whose structure is defined by a nested hierarchy, whether this is a physical object whose parts are arranged in space (such as an oak tree in a forest) or a more abstract data structure (such as a family tree or taxonomic tree). [...] Despite great changes in the surface properties of oak trees, family trees and taxonomic trees, humans perceive them as different instances of a more abstract concept defined by the same relational structure.”

      Next, the study does not explore whether scaffolding effects could be observed with other well-learned domains, leaving open the question of whether spatial representations are uniquely effective or simply one instance of a familiar 2D space, again questioning the underlying mechanism.

      We would like to mention that Reviewer #2 had a similar comment. We agree with both Reviewers that our task is non-naturalistic. As is common in experimental research, one must sacrifice the naturalistic elements of the task in exchange for the control and the absence of prior knowledge of the participants. We have decided to minimise, as far as possible, the prior knowledge of the participants to make sure that our task involved learning a completely new task and that the pre-training was really causing the better learning/generalisation. The effects we report are consistent across the experiments so we feel confident about them but we agree with the Reviewer that an external validation with more naturalistic stimuli/tasks would be a nice addition to this work. We have included a sentence in the discussion:

      “All the effects observed in our experiments were consistent across near transfer conditions (rotation of patterns within the same feature space), and far transfer conditions (rotation of patterns within a different feature space, where features are drawn from the same modality). This shows the generality of spatial training for conceptual generalisation. We did not test transfer across modalities nor transfer in a more natural setting; we leave this for future studies.”

      Further doubt on the underlying mechanism is cast by the possibility that the observed correlation between mapping task performance and the adoption of a 2D strategy may reflect general cognitive engagement rather than the spatial nature of the task. Similarly, the surprising finding that a significant number of participants benefited from spatial scaffolding without seeing spatial modalities may further raise questions about the interpretation of the scaffolding effect, pointing towards potential alternative interpretations, such as shifts in attention during learning induced by pre-training without changing underlying abstract conceptual representations.

      The Reviewer is concerned that the spatial pre-training could benefit the participants by increasing global cognitive engagement rather than by providing a scaffold for learning invariances. It is correct that the participants in the control group in Exp. 2c perform worse on average than the participants who benefited from the spatial pre-training in Exp. 2a and 2b. The better performance of the participants in Exp. 2a and 2b could be due either to the spatial nature of the pre-training (as we claim) or to a difference in general cognitive engagement.

      However, if we look closely at the results of Exp. 3, we can see that the general cognitive engagement hypothesis is not well supported by the data. Indeed, during training the participants in the control condition (Exp. 3c) perform at a level relatively similar to that of the other groups. Rather, the difference is in the strategy they use, as revealed by the transfer condition. The majority of them use a 1D strategy, in contrast to the participants who benefited from spatial pre-training (Exp. 3a and 3b). We have included a sentence in the results:

      “Further, the results show that participants who did not experience spatial pre-training were still engaged in the task, but were not using the same strategy as the participants who experienced spatial pre-training (1D rather than 2D). Thus, the benefit of the spatial pre-training is not simply to increase the cognitive engagement of the participants. Rather, spatial pre-training provides a scaffold to learn rotation-invariant representation of auditory and visual concepts even when rotation is never explicitly shown during pre-training.”

      Finally, Reviewer #1 had a related concern about a potential alternative explanation that involved a shift in attention. We reproduce our response here: we agree with the Reviewer that the “attention to dimensions” hypothesis is an interesting (and potentially concerning) alternative explanation. However, we believe that the results of our control experiments Exp. 2c and Exp. 3c are not compatible with this alternative explanation.

      Indeed, in Exp. 2c, participants are pre-trained in the visual modality and then tested in the auditory modality. In the multimodal association task, participants have to associate the auditory stimuli and the visual stimuli: on each trial, they hear a sound and then have to click on the corresponding visual stimulus. It is necessary to pay attention to both auditory dimensions and both visual dimensions to perform well in the task. To give an example, the task might involve mapping the fundamental frequency and the amplitude modulation of the auditory stimulus to the colour and the shape of the visual stimulus, respectively. If participants pay attention to only one dimension, this would lead to a maximum of 25% accuracy on average (because they would be at chance on the other dimension, with four possible options). We observed that 30/50 participants reached an accuracy > 50% in the multimodal association task in Exp. 2c. This means that we know for sure that at least 60% of the participants actually paid attention to both dimensions of the stimuli. Nevertheless, there was a clear difference between participants that received a visual pre-training (Exp. 2c) and those who received a spatial pre-training (Exp. 2a) (frequency of 1D vs 2D models between conditions, BF > 100 in near transfer and far transfer). In fact, only 3/50 participants were best fit by a 2D model when vision was the pre-training modality compared to 29/50 when space was the pre-training modality. Thus, the benefit of the spatial pre-training cannot be due solely to a shift in attention toward both dimensions.

      This effect was replicated in Exp. 3c. Similarly, 33/48 participants reached an accuracy > 50% in the multimodal association task in Exp. 3c, meaning that we know for sure that at least 68% of the participants actually paid attention to both dimensions of the stimuli. Again, there was a clear difference between participants who received a visual pre-training (Exp. 3c) and those who received a spatial pre-training (Exp. 3a) (frequency of 1D vs 2D models between conditions, BF > 100 in near transfer and far transfer).

      Thus, we believe that the alternative explanation raised by the Reviewer is not supported by our data. We have added a paragraph in the discussion:

      “One alternative explanation of this effect could be that the spatial pre-training encourages participants to attend to both dimensions of the non-spatial stimuli. By contrast, pre-training in the visual or auditory domains (where multiple dimensions of a stimulus may be relevant less often naturally) encourages them to attend to a single dimension. However, data from our control experiments Exp. 2c and Exp. 3c are incompatible with this explanation. Around 65% of the participants show a level of performance in the multimodal association task (>50%) which could only be achieved if they were attending to both dimensions (performance attending to a single dimension would yield 25% and chance performance is at 6.25%). This suggests that participants are attending to both dimensions even in the visual and auditory mapping case.”
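      The arithmetic behind this argument can be checked with a short sketch. It assumes, as the response states, four options per dimension and two dimensions (hence 16 response options in total); the variable and function names are illustrative only and do not come from the authors' analysis code.

```python
from fractions import Fraction

# Assumption taken from the response: four possible values on each of two dimensions.
N_OPTIONS_PER_DIM = 4

# Attending to one dimension only: always right on that one, at chance on the other.
one_dim_ceiling = Fraction(1, 1) * Fraction(1, N_OPTIONS_PER_DIM)

# Attending to neither dimension: at chance on both.
chance = Fraction(1, N_OPTIONS_PER_DIM) ** 2


def share_above_threshold(n_above: int, n_total: int) -> float:
    """Percentage of participants whose accuracy exceeded the 50% threshold."""
    return 100 * n_above / n_total


exp2c_share = share_above_threshold(30, 50)  # counts reported for Exp. 2c
exp3c_share = share_above_threshold(33, 48)  # counts reported for Exp. 3c

print(f"one-dimension ceiling: {float(one_dim_ceiling):.2%}")
print(f"chance level:          {float(chance):.2%}")
print(f"Exp. 2c above 50%:     {exp2c_share:.2f}%")
print(f"Exp. 3c above 50%:     {exp3c_share:.2f}%")
```

      Because the 50% threshold is twice the one-dimension ceiling, any participant above it is very unlikely to have been attending to a single dimension, which is the point the response relies on.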

      Conclusions:

      The authors successfully demonstrate that spatial training can enhance the ability to generalize in nonspatial domains, particularly in recognizing rotated sequences. The results for the most part support their conclusions, showing that spatial representations can act as a scaffold for learning more abstract conceptual invariances. However, the study leaves room for further investigation into whether the observed effects are unique to spatial cognition or could be replicated with other forms of well-established knowledge, as well as further clarifications of the underlying mechanisms.

      Impact:

      The study's findings are likely to have a valuable impact on cognitive science, particularly in understanding how abstract concepts are learned and generalized. The methods and data can be useful for further research, especially in exploring the relationship between spatial cognition and abstract conceptualization. The insights could also be valuable for AI research, particularly in improving models that involve abstract pattern recognition and conceptual generalization.

      In summary, the paper contributes valuable insights into the role of spatial cognition in learning abstract concepts, though it invites further research to explore the boundaries and specifics of this scaffolding effect.

      Reviewer #1 (Recommendations For The Authors):

      Minor issues / typos:

      P6: I think the example of the "signed" mapping here should be "e.g., ABAB maps to one category and BABA maps to another", rather than "ABBA maps to another" (since ABBA would always map to another category, whether the mapping is signed or unsigned).

      Done.

      P11: "Next, we asked whether pre-training and mapping were systematically associated with 2Dness...". I'd recommend changing to: "Next, we asked whether accuracy during pre-training and mapping were systematically associated with 2Dness...", just to clarify what the analyzed variables are.

      Done.

      P13, paragraph 1: "only if the features were themselves are physical spatial locations" either "were" or "are" should be removed.

      Done.

      P13, paragraph 1: should be "neural representations of space form a critical substrate" (not "for").

      Done.

      Reviewer #2 (Recommendations For The Authors):

      The authors use in multiple places in the manuscript the phrases "learn invariances" (Abstract), "formation of invariances" (p. 2, para. 1), etc. It might be just me, but this feels a bit like 'sloppy' wording: we do not learn or form invariances, rather we learn or form representations or transformations by which we can perform tasks that require invariance over particular features or transformation of the input such as the case of object recognition and size- translation- or lighting-invariance. We do not form size invariance, we have representations of objects and/or size transformations allowing the recognition of objects of different sizes. The authors might change this way of referring to the phenomenon.

      We respectfully disagree with this comment. An invariance occurs when neurons make the same response under different stimulation patterns. The objects or features to which a neuron responds are shaped by its inputs. Those inputs are in turn determined by experience-dependent plasticity. This process is often called “representation learning”. We think that our language here is consistent with this status quo view in the field.

      Reviewer #3 (Recommendations For The Authors):

      • I understand that the objective of the present experiment is to study our ability to generalize abstract patterns of sensory data (concepts). In the introduction, the authors present examples like the concept of a "tree" (encompassing a family tree, an oak tree, and a taxonomic tree) and "ring" to illustrate the idea. However, I am sceptical as to whether these examples effectively represent the phenomena being studied. From my perspective, these different instances of "tree" do not seem to relate to the same abstract concept that is translated or rotated but rather appear to share only a linguistic label. For instance, the conceptual substance of a family tree is markedly different from that of an oak tree, lacking significant overlap in meaning or structure. Thus, to me, these examples do not demonstrate invariance to transformations such as rotations.

      To elaborate further, typically, generalization involves recognizing the same object or concept through transformations. In the case of abstract concepts, this would imply a shared abstract representation rather than a mere linguistic category. While I understand the objective of the experiments and acknowledge their potential significance, I find myself wondering about the real-world applicability and relevance of such generalizations in everyday cognitive functioning. This, in turn, casts some doubt on the broader relevance of the study's results. A more fitting example, or an explanation that addresses my concerns about the suitability of the current examples, would be beneficial to further clarify the study's intent and scope.

      Response in the public review.

      • Relatedly, the manuscript could benefit from greater clarity in defining key concepts and elucidating the proposed mechanism behind the observed effects. Is it plausible that the changes observed are primarily due to shifts in attention induced by the spatial pre-training, rather than a change in the process of learning abstract conceptual invariances (i.e., modifications to the abstract representations themselves)? While the authors conclude that spatial pre-training acts as a scaffold for enhancing the learning of conceptual invariances, it raises the question: does this imply participants simply became more focused on spatial relationships during learning, or might this shift in attention represent a distinct strategy, and an alternative explanation? A more precise definition of these concepts and a clearer explanation of the authors' perspective on the mechanism underlying these effects would reduce any ambiguity in this regard.

      Response in the public review.

      • I am wondering whether the effectiveness of spatial representations in generalizing abstract concepts stems from their special nature or simply because they are a familiar 2D space for participants. It is well-established that memory benefits from linking items to familiar locations, a technique used in memory training (method of loci). This raises the question: Are we observing a similar effect here, where spatial dimensions are the only tested familiar 2D spaces, while the other 2 spaces are simply unfamiliar, as also suggested by the lower performance during training (Fig.2)? Would the results be replicable with another well-learned, robustly encoded domain, such as auditory dimensions for professional musicians, or is there something inherently unique about spatial representations that aids in bootstrapping abstract representations?

      On the other side of the same coin, are spatial representations qualitatively different, or simply more efficient because they are learned more quickly and readily? This leads to the consideration that if visual pre-training and visual-to-auditory mapping were continued until a similar proficiency level as in spatial training is achieved, we might observe comparable performance in aiding generalization. Thus, the conclusion that spatial representations are a special scaffold for abstract concepts may not be exclusively due to their inherent spatial nature, but rather to the general characteristic of well-established representations. This hypothesis could be further explored by either identifying alternative 2D representations that are equally well-learned or by extending training in visual or auditory representations before proceeding with the mapping task. At the very least I believe this potential explanation should be explored in the discussion section.

      Response in the public review.

      I had some difficulty in following an important section of the introduction: "... whether participants can learn rotationally invariant concepts in nonspatial domains, i.e., those that are defined by sequences of visual and auditory features (rather than by locations in physical space, defined in Cartesian or polar coordinates) is not known." This was initially puzzling to me as the paragraph preceding it mentions: "There is already good evidence that nonspatial concepts are represented in a translation invariant format." While I now understand that the essential distinction here is between translation and rotation, this was not immediately apparent upon first reading. This crucial distinction, especially in the context of conceptual spaces, was not clearly established before this point in the manuscript. For better clarity, it would be beneficial to explicitly contrast and define translation versus rotation in this particular section and stress that the present study concerns rotations in abstract spaces.

      Done.

      • The multi-modal association is crucial for the study, however to my knowledge, it is not depicted or well explained in the main text or figures (Results section). In my opinion, the details of this task should be explained and illustrated before the details of the associated results are discussed.

      We have included an illustration of a multimodal association trial in Fig. S3B.

      Author response image 2.

      • The observed correlation between the mapping task performance and the adoption of a 2D strategy is logical. However, this correlation might not exclusively indicate the proposed underlying mechanism of spatial scaffolding. Could it also be reflective of more general factors like overall performance, attention levels, or the effort exerted by participants? This alternative explanation suggests that the correlation might arise from broader cognitive engagement rather than specifically from the spatial nature of the task. Addressing this possibility could strengthen the argument for the unique role of spatial representations in learning abstract concepts, or at least this alternative interpretation should be mentioned.

      Response in the public review.

      • To me, the finding that ~30% of participants benefited from the spatial scaffolding effect for example in the auditory condition merely through exposure to the mapping (Fig 4D), without needing to see the quadruplets in the spatial modality, was somewhat surprising. This is particularly noteworthy considering that only ~60% of participants adopted the 2D strategy with exposure to rotated contingencies in Experiment 3 (Fig 3D). How do the authors interpret this outcome? It would be interesting to understand their perspective on why such a significant effect emerged from mere exposure to the mapping task.

      • I appreciate the clarity Fig.1 provides in explaining a challenging experimental setup. Is it possible to provide example trials, including an illustration that shows which rotations produce the trial and an intuitive explanation of which response maps onto the 1D vs 2D strategies respectively, to aid the reader in better understanding this core manipulation?

      • I like that the authors provide transparency by depicting individual subject's data points in their results figures (e.g. Figs. 2 B, F, J). However, with an n=~50 per condition, it becomes difficult to intuit the distribution, especially for conditions with higher variance (e.g., Auditory). The figures might be more easily interpretable with alternative methods of displaying variances, such as violin plots per data point, conventional error shading using 95%CIs, etc.

      • Why are the authors not reporting exact BFs in the results sections at least for the most important contrasts?

      • While I understand why the authors report the frequencies for the best model fits, this may become difficult to interpret in some sections, given the large number of reported values. Alternatives or additional summary statistics supporting inference could be beneficial.

      As the Reviewer states, there are a large number of figures that we can report in this study. We have chosen to keep this number at a minimum to be as clear as possible. To illustrate the distribution of individual data points, we have opted to display only the group's mean and standard error (the standard errors are included, but the substantial number of participants per condition provides precise estimates, resulting in error bars that can be smaller than the mean point). This decision stems from our concern that including additional details could lead to a cluttered representation with unnecessary complexity. Finally, we report what we believe to be the critical BFs for the comprehension of the reader in the main text, and choose a cutoff of 100 when BFs are high (corresponding to the label “decisive” evidence; some BFs are larger than 10^12). All the exact BFs are in the supplementary for the interested readers.
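      The reporting convention described here (exact values below a cutoff, a capped label above it) can be sketched as a small helper. This is only an illustration of the stated rule; the function name, the two-decimal formatting, and the example values are assumptions, not the authors' code or results.

```python
def format_bf(bf: float, cutoff: float = 100) -> str:
    """Format a Bayes factor, capping the label at the cutoff used in the main text."""
    if bf > cutoff:
        # Above the cutoff, only the bound is reported ("decisive" evidence).
        return f"BF > {cutoff:g}"
    # Otherwise report the value itself (two decimals is an arbitrary choice here).
    return f"BF = {bf:.2f}"


# Hypothetical values, for illustration only.
print(format_bf(7.5))
print(format_bf(1e12))
```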

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable work provides a near-complete description of the mechanosensory bristles on the Drosophila melanogaster head and the anatomy and projection patterns of the bristle mechanosensory neurons that innervate them. The data presented are solid. The study has generated numerous invaluable resources for the community that will be of interest to neuroscientists in the field of circuits and behaviour, particularly those interested in mechanosensation and behavioural sequence generation.

      We express our gratitude to the Reviewers for their valuable suggestions, which significantly enhanced the manuscript. The revisions were undertaken, not with the expectation of acceptance, but rather driven by our sincere belief that these revisions would enhance the manuscript's impact for future readers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Sensory neurons of the mechanosensory bristles on the head of the fly project to the subesophageal zone (SEZ). In this manuscript, the authors have built on a large body of previous work to comprehensively classify and quantify the head bristles. They broadly identify the nerves that various bristles use to project to the SEZ and describe their region-specific innervation in the SEZ. They use dye-fills, clonal labelling, and electron microscopic reconstructions to describe in detail the phenomenon of somatotopy - conserved peripheral representations within the central brain - within the innervation of these neurons. In the process they develop novel tools to access subsets of these neurons. They use these to demonstrate that groups of bristles in different parts of the head control different aspects of the grooming sequence.

      Reviewer #2 (Public Review):

      The authors combine genetic tools, dye fills and connectome analysis techniques to generate a "first-of-its-kind", near complete, synaptic resolution map of the head bristle neurons of Drosophila. While some of the BMN anatomy was already known based on previous work by the authors and other researchers, this is the first time a near complete map has been created for the head BMNs at electron microscopy resolution.

      Strengths:

      (1) The authors cleverly use techniques that allow moving back and forth between periphery (head bristle location) and brain, as well as moving between light microscopy and electron microscopy data. This allows them to first characterize the pathways taken by different head BMNs to project to the brain and also characterize anatomical differences among individual neurons at the level of morphology and connectivity.

      (2) The work is very comprehensive and results in a near complete map of all head BMNs.

      (3) Authors also complement this anatomical characterization with a first-level functional analysis using optogenetic activation of BMNs that results in expected directed grooming behavior.

      Weaknesses:

      (1) The clustering analysis is compelling but cluster numbers seem to be arbitrarily chosen instead of by using some informed metrics.

      We made revisions to the manuscript that address this concern. Please see our response to “recommendations for authors” for a description of these revisions.

      (2) It could help provide context if authors revealed some of the important downstream pathways that could explain optogenetics behavioral phenotypes and previously shown hierarchical organization of grooming sequences.

      We made revisions to the manuscript that address this recommendation. Please see our response to “recommendations for authors” for a description of these revisions.

      (3) In contrast to the rigorous quantitative analysis of the anatomical data, the behavioral data is analyzed using much more subjective methods. While I do not think it is necessary to perform a rigorous analysis of behaviors in this anatomy focused manuscript, the conclusions based on behavioral analysis should be treated as speculative in the current form e.g. calling "nodding + backward walking" as an avoidance response is not justified as it currently stands. Strong optogenetic activation could lead to sudden postural changes that due to purely biomechanical constraints could lead to a couple of backward steps as seen in the example videos. Moreover since the quantification is manual, it is not clear what the analyst interprets as backward walking or nodding. Interpretation is also concerning because controls show backward walking (although in fewer instances based on subjective quantification).

      While unbiased machine vision-based methods would nicely complement the present work, this type of analysis cannot yet distinguish between different head grooming movements. Therefore, we are currently limited to manual annotation for our behavioral analysis. That said, we do not believe that our manual annotation is subjective. The grooming movements that we examine in this work are distinguishable from each other through frame-by-frame manual annotation of video at 30 fps. Our annotations of the grooming and backward motions performed by flies are based on previous publications that established a controlled vocabulary defining each movement (Hampel et al., 2020a, 2017, 2015; Seeds et al., 2014). In this work, we added head nodding to this controlled vocabulary, which is described in the Materials and methods. We have added text to the third paragraph of the Materials and methods section entitled “Behavioral analysis procedures” that we hope better describes our behavioral analysis. This description now reads:

      Head nodding was annotated when the fly tilted its head downward by any amount until it returned its head back in its original position. This movement often occurred in repeated cycles. Therefore, the “start” was scored at the onset of the first forward movement and the “stop” when the head returned to its original position on the last nod.
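      The start/stop scoring rule above can be illustrated with a minimal sketch that turns per-frame labels into bouts. The authors annotated manually; this code is not their procedure. The per-frame boolean input and the `max_gap` tolerance used to merge repeated nods into one bout are hypothetical, and only the 30 fps frame rate comes from the response.

```python
FPS = 30  # video frame rate stated in the response


def nodding_bouts(frames, max_gap=3):
    """Return (start_s, stop_s) pairs for bouts of head nodding.

    frames:  sequence of truthy/falsy values, truthy when the head is tilted
             away from its original position in that frame.
    max_gap: hypothetical tolerance, in frames, for treating consecutive nods
             as one repeated-cycle bout.
    """
    bouts = []
    start = None      # first frame of the current bout
    last_true = None  # most recent nodding frame in the current bout
    for i, nodding in enumerate(frames):
        if not nodding:
            continue
        if start is None:
            start = last_true = i
        elif i - last_true - 1 <= max_gap:
            last_true = i  # same bout: the pause between nods was short
        else:
            bouts.append((start, last_true + 1))
            start = last_true = i
    if start is not None:
        bouts.append((start, last_true + 1))
    # Convert frame indices to seconds; "stop" is the first frame back at rest.
    return [(s / FPS, e / FPS) for s, e in bouts]


# Hypothetical labels: two nods close together, then an isolated nod later.
example = [0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1]
bouts = nodding_bouts(example)
print(bouts)
```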

      We do not make any firm conclusions about the head movements (nodding) and backwards motions. We refer to nodding as a descriptive term that would allow the reader to better understand what the behavior looks like. We make no firm conclusions about any behavioral functional role that either the nodding or the backward motions might have, with the exception of nodding in the context of grooming. We only suggest that the behaviors appear to be avoidance responses. Furthermore, backward walking was not mentioned. Instead we refer to backward motions. We are only reporting our annotations of these movements that do occur, and are significantly different from controls. We speculate that these could be avoidance responses based on support from the literature. Future studies will be required to understand whether these movements serve real behavioral roles.

      Summary:

      The authors end up generating a near-complete map of head BMNs that will serve as a long-standing resource to the Drosophila research community. This will directly shape future experiments aimed at modeling or functionally analyzing the head grooming circuit to understand how somatotopy guides behaviors.

      Reviewer #3 (Public Review):

      Eichler et al. set out to map the locations of the mechanosensory bristles on the fly head, examine the axonal morphology of the bristle mechanosensory neurons (BMNs) that innervate them, and match these to electron microscopy reconstructions of the same BMNs in a previously published EM volume of the female adult fly brain. They used BMN synaptic connectivity information to create clusters of BMNs that they show occupy different regions of the subesophageal zone brain region and use optogenetic activation of subsets of BMNs to support the claim that the morphological projections and connectivity of defined groups of BMNs are consistent with the parallel model for behavioral sequence generation.

      The authors have beautifully cataloged the mechanosensory bristles and the projection paths and patterns of the corresponding BMN axons in the brain using detailed and painstaking methods. The result is a neuroanatomical catalog that will be an important community resource. To match BMNs reconstructed in an electron microscopy volume of the adult fly brain, the authors matched clustered reconstructed BMNs with light-level BMN classes using a variety of methods, but evidence for matching is only summarized and not demonstrated in a way that allows the reader to evaluate the strength of the evidence. The authors then switch from morphology-based categorization to non-BMN connectivity as a clustering method, which they claim demonstrates that BMNs form a somatotopic map in the brain. This map is not easily appreciated, and although contralateral projections in some populations are clear, the distinct projection zones that are mentioned by the authors are not readily apparent. Because of the extensive morphological overlap between connectivity-based clusters, it is not clear that small projection differences at the projection level are what determines the post-synaptic connectivity of a given BMN cluster or their functional role during behavior. The claim that the somatotopic organization of BMN projections is preserved among their postsynaptic partners to form parallel sensory pathways is not supported by the result that different connectivity clusters still have high cosine similarity in a number of cases (e.g., Clusters 1 and 3, or Clusters 1 and 2). Finally, the authors use tools that were generated during the light-level characterization of BMN projections to show that specifically activating BMNs that innervate different areas of the head triggers different grooming behaviors. In one case, activation of a single population of sensory bristles (lnOm) triggers two different behaviors, both eye and dorsal head grooming. This result does not seem consistent with the parallel model, which suggests that these behaviors should be mutually exclusive and rely on parallel downstream circuitry.
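      For readers unfamiliar with the metric the reviewer refers to, cosine similarity between two neurons' connectivity can be sketched as follows. This is a generic illustration, not the authors' pipeline: the synapse counts and neuron names are invented, and each vector is assumed to hold one neuron's synapse counts onto the same ordered list of postsynaptic partners.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length connectivity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a neuron with no output synapses has no defined direction
    return dot / (norm_u * norm_v)


# Hypothetical synapse counts onto four shared postsynaptic partners.
bmn_a = [10, 0, 5, 0]
bmn_b = [20, 0, 10, 0]  # same partners as bmn_a, twice the synapse counts
bmn_c = [0, 7, 0, 3]    # entirely different partners

print(cosine_similarity(bmn_a, bmn_b))  # close to 1: same connectivity profile
print(cosine_similarity(bmn_a, bmn_c))  # 0.0: no shared partners
```

      The measure ignores overall synapse number and compares only the pattern across partners, which is why high similarity between clusters bears on whether their downstream pathways are really parallel.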

      We made revisions to the manuscript that address this recommendation. Please see our response to “recommendations for authors” for a description of these revisions.

      This work will have a positive impact on the field by contributing a complete accounting of the mechanosensory bristles of the fruit fly head, describing the brain projection patterns of the BMNs that innervate them, and linking them to BMN sensory projections in an electron microscopy volume of the adult fly brain. It will also have a positive impact on the field by providing genetic tools to help functionally subdivide the contributions of different BMN populations to circuit computations and behavior. This contribution will pave the way for further mechanistic study of central circuits that subserve grooming circuits.

      Recommendations for the authors:

      All three reviewers appreciated the work presented in this manuscript. There were also a few overlapping concerns that were raised that are summarised below, should the authors wish to address them:

      Somatotopy: We recommend that the authors describe the extent of prior knowledge in more detail to highlight their contribution better.

      We made revisions that better highlight the extent of prior knowledge about somatotopy. We describe how previous studies showed bristle mechanosensory neurons in insects are somatotopically organized, but these studies were not comprehensive descriptions of complete somatotopic maps for the head or body. To our knowledge, our study provides the first comprehensive and synaptic resolution somatotopic map of a head for any animal. This sets the stage for the complete definition of the interface between somatotopically-organized mechanosensory neurons and postsynaptic circuits, which has broad implications for future studies on aimed grooming, and mechanosensation in general. Below we itemize revisions to the Introduction, Discussion, and Figures to provide a clearer statement of the significance of our study as it relates to somatotopy.

      (1) Newly added Figure 1 – figure supplement 1 more explicitly grounds the study in somatotopy, providing a working model of the organization of the circuit pathways that produce the grooming sequence. This model features somatotopy as shown in Figure 1 – figure supplement 1C.

      (2) Figure 1 – figure supplement 1 is incorporated into the Introduction in the second, third, and fourth paragraphs, the first paragraph of the Results section titled “Somatotopically-organized parallel BMN pathways”, and the second and third paragraphs of the last Discussion section titled “Parallel circuit architecture underlying the grooming sequence”.

      (3) We added text to the end of the fourth paragraph of the Introduction that now reads: “In this model, parallel-projecting mechanosensory neurons that respond to stimuli at specific locations on the head or body could connect with somatotopically-organized parallel circuits that elicit grooming of those locations (Figure 1 – figure supplement 1A-C). The previous discovery of a mechanosensory-connected circuit that elicits aimed grooming of the antennae provides evidence of this organization (Hampel 2015). However, the extent to which distinct circuits elicit grooming of other locations is unknown, in part, because the somatotopic projections of the mechanosensory neurons have not been comprehensively defined for the head or body.”

      (4) There is a Discussion section that further explains the extent of prior knowledge and our contributions on somatotopy that is titled “A synaptic resolution somatotopic map of the head BMNs”. Additionally, the previous version of this section had a paragraph on the broader implications of our work as it relates to somatotopy across species. In light of the reviewer comments, we decided to make this paragraph into its own Discussion section to better highlight the broader significance of our work. This section is titled “First synaptic resolution somatotopic map of the head”.

      The somatotopy isn't overtly obvious - perhaps they could try mapping presynaptic sites and provide landmarks to improve visualisation.

      We made the following revisions to better highlight the head BMN somatotopy. One point of confusion from the previous manuscript version stemmed from us not explicitly defining the somatotopic organization that we observed. There seemed to be confusion that we were defining the head somatotopy based only on the small projection differences among BMNs from neighboring head locations. While we believe that these small differences indeed correspond to somatotopy, we failed to highlight that there are overt differences in the brain projections of BMNs from distant locations on the head. For example, Figure 5B (right panel) shows the distinct projections between the LabNv (brown) and AntNv (blue) BMNs that innervate bristles on the ventral and dorsal head, respectively. Thus, BMN types innervating neighboring bristles show overlapping projections with small projection differences, whereas those innervating distant bristles show non-overlapping projections into distinct zones.

      Our analysis of postsynaptic connectivity similarity also shows somatotopic organization among the BMN postsynaptic partners, as BMN types innervating the same or neighboring bristle populations show high connectivity similarity (Figure 8, old Figure 7). Below we highlight major revisions to the text and Figures that hopefully better reveal the head somatotopy.

      (1) In the last paragraph of the Introduction we added text that explicitly frames the experiments in terms of somatotopic organization: “This reveals somatotopic organization, where BMNs innervating neighboring bristles project to the same zones in the CNS while those innervating distant bristles project to distinct zones. Analysis of the BMN postsynaptic connectome reveals that neighboring BMNs show higher connectivity similarity than distant BMNs, providing evidence of somatotopically organized postsynaptic circuit pathways.”

      (2) We mention an example of overt somatotopy from Figure 5 in the Results section titled “EM-based reconstruction of the head BMN projections in a full adult brain”. The text reads “For example, BMNs from the Eye- and LabNv have distinct ventral and anterior projections, respectively. This shows how the BMNs are somatotopically organized, as their distinct projections correspond to different bristle locations on the head (Figure 5B,C).”

      (3) In new Figure 8 (part of old Figure 7), we modified panels that correspond to the cosine similarity analysis of postsynaptic connectivity. The major revision was to plot the cosine similarity clusters onto the head bristles so that the bristles are now colored based on their clusters (C). This shows how neighboring BMNs cluster together, and therefore show similar postsynaptic connectivity. We believe that this provides a nice visualization of somatotopic organization in BMN postsynaptic connectivity. We also added the clustering dendrogram as recommended by Reviewer #2 (Figure 8A).

      (4) In new Figure 8, we added new panels (D-F) that summarize our anatomical and connectomic analysis showing different somatotopic features of the head BMNs. Different BMN types innervate bristles at neighboring and distant proximities (D). BMNs that innervate neighboring bristles project into overlapping zones (E, example of reconstructed BM-Fr and -Ant neurons with non-overlapping BM-MaPa neurons) and show postsynaptic connectivity similarity (F, example connectivity map of three BM types on cosine similarity data).

      (5) To accompany the new Figure 8D-F panels, we added a paragraph to summarize the different somatotopic features of the head BMNs that were identified based on our anatomical and connectomic analysis. This is the last paragraph in the Results section titled “Somatotopically-organized parallel BMN pathways”:

      Our results reveal head bristle proximity-based organization among the BMN projections and their postsynaptic partners to form parallel mechanosensory pathways. BMNs innervating neighboring bristles project into overlapping zones in the SEZ, whereas those innervating distant bristles project to distinct zones (example of BM-Fr, -Ant, and -MaPa neurons shown in Figure 8D,E). Cosine similarity analysis of BMN postsynaptic connectivity revealed that BMNs innervating the same bristle populations (same types) have the highest connectivity similarity. Figure 8F shows example parallel connections for BM-Fr, -Ant, and -MaPa neurons (vertical arrows), where the edge width indicates the number of synapses from each BMN type to their major postsynaptic partners. Additionally, BMNs innervating neighboring bristle populations showed postsynaptic connectivity similarity, while BMNs innervating distant bristles show little or none. For example, BM-Fr and -Ant neurons have connections to common postsynaptic partners, whereas BM-MaPa neurons show only weak connections with the main postsynaptic partners of BM-Fr or -Ant neurons (Figure 8F, connections under 5% of total BMN output omitted). These results suggest that BMN somatotopy could have different possible levels of head spatial resolution, from specific bristle populations (e.g. Ant bristles), to general head areas (e.g. dorsal head bristles).

      We also refer to Figure 8D-F to illustrate the different somatotopic features in the Discussion. These references can be found in the Discussion sections titled “A synaptic resolution somatotopic map of the head BMNs” (fourth paragraph) and “Parallel circuit architecture underlying the grooming sequence” (second paragraph).

      (6) In addition to improving the Figures, we provide additional tools that enable readers to explore the BMN somatotopy in a more interactive way. That is, we provide 5 different FlyWire.ai links in the manuscript Results section that enable 3D visualization of the different reconstructed BMNs (e.g. FlyWire.ai link 1).

      Note: In working on old Figure 7 to address this Reviewer suggestion, we also reordered panels A-E. We believe that this was a more logical ordering than in the previous draft. These panels are now the only data shown in Figure 7, as the cosine similarity analysis is now in Figure 8. We hope that splitting these panels into two Figures will improve manuscript readability.

      Light EM Mapping: A better description of methods by which this mapping was done would be helpful. Perhaps providing a few example parallel representations of the EM and light images in the main figure would help the reader better appreciate the strength of their approach.

      We have done as the Reviewers suggested and added panels to Figure 6 that show examples of the LM and EM image matching (Figure 6A,B). We added two examples that used different methods for labeling the LM imaged BMNs, including MCFO labeling of an individual BM-InOc neuron and driver line labeling of a major portion of BM-InOm neurons using InOmBMN-LexA. These panels are referred to in the first paragraph of the Results section titled “Matching the reconstructed head BMNs with their bristles”. Note that examples for all LM/EM matched BMN types are shown in Figure 6 – figure supplement 2.

      We had provided Figure 6 – figure supplement 2 in the reviewed manuscript that shows all the above requested “parallel representations of the EM and light images”. However, the Reviewer critiques made us realize that the purpose of this figure supplement was not clearly indicated. Therefore, we have revised Figure 6 – figure supplement 2 and its legend to make its purpose clearer. First, we changed the legend title to better highlight its purpose. The legend is now titled: “Matching EM reconstructed BMN projections with light microscopy (LM) imaged BMNs that innervate specific bristles”. Second, we added label designations to the figure panel rows that highlight the LM and EM comparisons. That is, the rows for light microscopy images of BMNs are indicated with LM and the rows for EM reconstructed BMN images are labeled with EM. Reviewer #3 had indicated that it was not clear what labeling methods were used to visualize the LM imaged BM-InOm neurons in Figure 6 – figure supplement 2N. Therefore, we added text to the figure and the legend to better highlight the different methods used. Panels A and B were also cropped to accommodate the above mentioned revisions.

      The manuscript also provides an extensive Materials and methods section that describes the different lines of evidence that were used to assign the reconstructed BMNs as specific types. We changed the title to better highlight the purpose of this methods section to “Matching EM reconstructed BMN projections with light microscopy imaged BMNs that innervate specific bristles”. The evidence used to support the assignment of the different BMN types is also summarized in Figure 6 – figure supplement 3.

      Parallel circuit model: The authors motivate their study with this. We're recommending that they define expectations of such circuitry, its alternatives (including implications for downstream pathways), and behavior before they present their results. We're also recommending that they interpret their behavioural results in the context of these circuits.

      Our primary motivation for doing the experiments described in this manuscript was to help define the neural circuit architecture underlying the parallel model that drives the Drosophila grooming sequence. This manuscript provides a comprehensive assessment of the first layer of this circuit architecture. A byproduct of this work is a contribution that offers immediate utility and significance to the Drosophila connectomics community: the description of the majority of mechanosensory neurons on the head, with their annotation in the recently released whole brain connectome dataset (FlyWire.ai). In writing this manuscript, we tried to balance both of these aims, which proved difficult. We very much appreciate the Reviewers' comments that have highlighted points of confusion in our original draft. We hope that the revised draft is now clearer and more logically presented. We have made revisions to the text and provided a new figure supplement (Figure 1 – figure supplement 1) and new panels in Figure 8. Below we highlight the major revisions.

      (1) The Introduction was revised to more explicitly ground the study in the parallel model, while also removing details that were not pertinent to the experiments presented in the manuscript.

      The first paragraph introduces different features of the parallel model. To better focus the reader on the parts of the model that were being assessed in the manuscript, we removed the following sentences: “Performance order is established by an activity gradient among parallel circuits where earlier actions have the highest activity and later actions have the lowest. A winner-take-all network selects the action with the highest activity and suppresses the others. The selected action is performed and then terminated to allow a new round of competition and selection of the next action.” Note that these sentences are included in the third and fourth paragraphs of the last Discussion section titled “Parallel circuit architecture underlying the grooming sequence”.

      The first paragraph of the Introduction now introduces a bigger picture view of the model that emphasizes the two main features: 1) a parallel circuit architecture that ensures all mutually exclusive actions to be performed in sequence are simultaneously readied and competing for output, and 2) hierarchical suppression among the parallel circuits, where earlier actions suppress later actions.

      (2) Newly added Figure 1 – figure supplement 1 provides a working model of grooming (Reviewer #1 suggestion). We now more strongly emphasize that the study aimed to define the parallel neural circuit architecture underlying the grooming sequence, focusing on the mechanosensory layer of this architecture. In particular, we refer to the new Figure 1 – figure supplement 1 that has been added to better convey the hypothesized grooming neural circuit architecture. Figure 1 – figure supplement 1 is incorporated into the Introduction (paragraphs two, three, and four), the Results section titled “Somatotopically-organized parallel BMN pathways” (first paragraph), and the last Discussion section titled “Parallel circuit architecture underlying the grooming sequence” (second and third paragraphs).

      (3) New panels in Figure 8 update the model of parallel circuit organization as it relates to somatotopy (D-F). These panels show the parallel circuits hypothesized by the model, but also indicate convergence, with different possible levels of head resolution for these circuits. We describe above where these panels are referenced in the text.

      (4) We added a new paragraph in the last Discussion section titled “Parallel circuit architecture underlying the grooming sequence” that better incorporates the results from this manuscript into the working model of grooming. This paragraph is shown below.

      Here we define the parallel architecture of BMN types that elicit the head grooming sequence that starts with the eyes and proceeds to other locations, such as the antennae and ventral head. The different BMN types are hypothesized to connect with parallel circuits that elicit grooming of specific locations (described above and shown in Figure 1 – figure supplement 1A,C). Indeed, we identify distinct projections and connectivity among BMNs innervating distant bristles on the head, providing evidence supporting this parallel architecture (Figure 8D-F). However, we also find partially overlapping projections and connectivity among BMNs innervating neighboring bristles. Further, optogenetic activation of BMNs at specific head locations elicits grooming of both those locations and neighboring locations (Figure 9). These findings raise questions about the resolution of the parallel architecture underlying grooming. Are BMN types connected with distinct postsynaptic circuits that elicit aimed grooming of their corresponding bristle populations (e.g. Ant bristles)? Or are neighboring BMN types that innervate bristles in particular head areas connected with circuits that elicit grooming of those areas (e.g. dorsal or ventral head)? Future studies of the BMN postsynaptic circuits will be required to define the resolution of the parallel pathways that elicit aimed grooming.

      Aside from this summary of major concerns, the detailed recommendations are attached below.

      Reviewer #1 (Recommendations For The Authors):

      I appreciate the quality and exhaustive body of work presented in this manuscript. I have a few comments that the authors may want to consider:

      (1) The authors motivate this study by posing that it would allow them to uncover whether the complex grooming behaviour of flies followed a parallel model of circuit function. It would have been nice to have been introduced to what the alternative model might be and what each would mean for organisation of the circuit architecture. Some guiding schematics would go a long way in illustrating this point. Modifying the discussion along these lines would also be helpful.

      We made several revisions to the manuscript that address this recommendation. Among these revisions, we added Figure 1 – figure supplement 1 that includes a working model for grooming. Please see above for a description of these revisions.

      (2) The authors mention the body of work that has mapped head bristles and described somatotopy. It would be useful to discuss in more detail what these studies have shown and highlight where the gaps are that their study fills.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      (3) The dye-fills and reconstructions that are single colour could use a boundary to demarcate the SEZ. This would help in orienting the reader.

      We agree with Reviewer #1 that Figure 4 and its supplements could use some indicator that would orient the reader with respect to the dye filled or stochastically labeled neurons. The images are of the entire SEZ in the ventral brain, and in the case of some panels, the background staining enables visualization of the brain (e.g. Figure 4H,M,N). To help orient the reader in this region, we added a dotted line to indicate the approximate SEZ midline. This also enables the reader to more clearly see which of the BMN types cross the midline.

      Midline visual guides were added for Figure 4, Figure 4 – figure supplement 2, Figure 4 – figure supplement 3, Figure 4 – figure supplement 4, Figure 4 – figure supplement 5, Figure 4 – figure supplement 6, Figure 4 – figure supplement 7, Figure 4 – figure supplement 8, Figure 6 – figure supplement 2.

      (4) The comparisons between the EM and the fills/clones are not obvious. Particularly because they are not directly determined, it would be nice to have the EM reconstruction alongside the dye-fills. This would work very nicely in the supplementary figure with the multiple fills of the same bristles. I think this would really drive home the point.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      (5) Are there unnoticed black error-bars floating around in many of the gray-scale images?

      The black bars were masking white scale bars in the images. We have removed the black bars and remade the images without scale bars. This was done for the following Figures: Figure 4, Figure 4 – figure supplement 2, Figure 4 – figure supplement 3, Figure 4 – figure supplement 4, Figure 4 – figure supplement 5, Figure 4 – figure supplement 6, Figure 4 – figure supplement 7, Figure 4 – figure supplement 8, Figure 6 – figure supplement 2.

      Reviewer #2 (Recommendations For The Authors):

      (1) The only point in the paper where I found myself going back and forth between methods/supp and text was when the authors discuss the clustering. I think it would help the reader if a few sentences about the cosine clustering used for connectivity based clustering were included in the main text. Also, for NBLAST hierarchical clustering, it would help if some informed metrics could be used for defining cluster numbers (e.g. Braun et al, 2010 PLOS ONE shows how Ward linkage cost could be used for hierarchical clustering).

      Depending on where the cut height is placed on the dendrogram for cosine similarity of BMNs, different features of the BMN type postsynaptic connectivity are captured. As the number of clusters is increased (lower cut height), clustering is mainly among BMNs of the same type, showing that these BMNs have the highest connectivity similarity. As the number of clusters is reduced (higher cut height), BMNs innervating neighboring bristles on the head are clustered, revealing three general clusters corresponding to the dorsal, ventral, and posterior head. This reveals somatotopy based clustering among same and neighboring BMN types. The cut height shown in Figure 8 and Figure 8 – figure supplement 2 was chosen because it highlighted both of these features.

      The NBLAST clustering shows similar results to the connectivity based clustering with respect to neighboring and distant BMN types. As the number of clusters increases BMNs of the same type are clustered, and these types can be further subdivided into morphologically distinct subtypes. As the number of clusters is reduced, the clustering captures neighboring BMNs. Thus, neighboring BMN types showed high morphology similarity (and proximity) with each other, and low similarity with distant BMN types.

      Please see our responses to a Reviewer #3 critique below for further description of the clustering results.
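
      The effect of cut height described above can be illustrated with a minimal sketch. This is not the analysis code used in the manuscript: the synapse counts are hypothetical toy values, the neuron names are placeholders, average linkage on cosine distance is an assumed choice, and the cut heights 0.1 and 0.6 are on this toy scale only and are not comparable to the cut heights reported in Figure 8 or Supplementary file 3.

```python
import math
from itertools import combinations

# Hypothetical synapse counts from each BMN onto four postsynaptic partners.
# These numbers are illustrative only and are not taken from the connectome.
TOY_OUTPUTS = {
    "BM-Fr_1":   [20, 5, 0, 0],
    "BM-Fr_2":   [18, 6, 0, 0],
    "BM-Ant_1":  [6, 20, 0, 0],
    "BM-Ant_2":  [5, 22, 0, 0],
    "BM-MaPa_1": [0, 0, 15, 10],
    "BM-MaPa_2": [0, 1, 14, 12],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two connectivity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_at_height(vectors, cut_height):
    """Average-linkage agglomerative clustering on cosine distance.

    Merging stops once the closest pair of clusters is farther apart than
    cut_height, which is equivalent to cutting the dendrogram at that height.
    """
    names = list(vectors)
    dist = {
        frozenset((a, b)): 1.0 - cosine_similarity(vectors[a], vectors[b])
        for a, b in combinations(names, 2)
    }

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [[n] for n in names]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > cut_height:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)


# Low cut height: clusters contain only BMNs of the same type.
print(cluster_at_height(TOY_OUTPUTS, 0.1))
# Higher cut height: the neighboring Fr and Ant types merge; MaPa stays separate.
print(cluster_at_height(TOY_OUTPUTS, 0.6))
```

      In this toy example, the low cut yields three clusters that each contain a single BMN type, while the higher cut groups the two neighboring types and leaves the distant type apart, mirroring the same-type and neighboring-type features described above.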

      On the same lines it would help if the clustering dendrograms were included in the main figure.

      We thank Reviewer #2 for this comment. We have added the dendrogram to Figure 8A, a change that we feel makes this Figure much easier to understand.

      (2) It could help provide intuition if the authors revealed some of the downstream targets and their implication in explaining the behavioral phenotypes.

      While this will be the subject of at least two forthcoming manuscripts, we have added text to the present manuscript that provides insight into BMN postsynaptic targets. Our previous work (Hampel et al. 2015) described a mechanosensory connected neural circuit that elicits grooming of the antennae. While this previous study demonstrated that the Johnston’s organ mechanosensory neurons are synaptically and functionally connected with this circuit, our preliminary analysis indicates that it is also connected with BM-Ant neurons. We hypothesize that there are additional such circuits that are responsible for eliciting grooming of other head locations.

      To better highlight potential downstream targets in the manuscript, we now mention the antennal circuit in the Introduction. This text reads: “In this model, parallel-projecting mechanosensory neurons that respond to stimuli at specific locations on the head or body could connect with somatotopically-organized parallel circuits that elicit grooming of those locations (Figure 1 – figure supplement 1A-C). The previous discovery of a mechanosensory-connected circuit that elicits aimed grooming of the antennae provides evidence of this organization (Hampel 2015). However, the extent to which distinct circuits elicit grooming of other locations is unknown, in part, because the somatotopic projections of the mechanosensory neurons have not been comprehensively defined for the head or body.”

      There is also text in the Discussion that addresses this Reviewer comment. It describes the antennal circuit and mentions the possibility that other similar circuits may exist. This can be found in the third paragraph of the section titled “Circuits that elicit aimed grooming of specific head locations”.

      (3) Authors find that opto activation of BMNs leads to grooming of targeted as well as neighboring areas. Is there any sequence observed here? i.e. first clean targeted area and then clean neighboring area? I wonder if the answer to this is something as simple as common post-synaptic targets which is essentially reducing the resolution of the BMN sensory map. Some more speculation on this interesting result could be helpful.

      We appreciate and agree with this point from Reviewer #2, and have tried to better emphasize the possible implications for grooming that the overlapping projections and connectivity among BMNs innervating neighboring bristles may have. This is now better addressed in the Results and Discussion sections. Below we highlight where this is addressed:

      (1) In the second paragraph of the Results section titled “Activation of subsets of head BMNs elicits aimed grooming of specific locations” we added text that suggests the possibility that grooming of the stimulated and neighboring locations could be due to the overlapping projections and connectivity. This text reads: “This suggested that head BMNs elicit aimed grooming of their corresponding bristle locations, but also neighboring locations. This result is consistent with our anatomical and connectomic data indicating that BMNs innervating neighboring bristles show overlapping projections and postsynaptic connectivity similarity (see Discussion).”

      (2) We added a sentence to the end of the fourth paragraph of the Discussion section titled “A synaptic resolution somatotopic map of the head BMNs” that alludes to further discussion of this topic. This text reads: “This overlap may have implications for aimed grooming behavior. For example, neighboring BMNs could connect with common neural circuits to elicit grooming of overlapping locations (discussed more below).”

      (3) The fourth paragraph of the Discussion section titled “Circuits that elicit aimed grooming of specific head locations” mentions the possibility of mechanosensory convergence onto common postsynaptic circuits to promote grooming of the stimulated area, along with neighboring areas. This paragraph is below.

      We find that activation of specific BMN types elicits both aimed grooming of their corresponding bristle locations and neighboring locations. This suggests overlap in the locations that are groomed with the activation of different BMN types. Such overlap provides a means of cleaning the area surrounding the stimulus location. Interestingly, our NBLAST and cosine similarity analysis indicates that neighboring BMNs project into overlapping zones in the SEZ and show common postsynaptic connectivity. Thus, we hypothesize that neighboring BMNs connect with common neural circuits (e.g. antennal grooming circuit) to elicit overlapping aimed grooming of common head locations.

      (4) In the new second paragraph of the Discussion section titled “Parallel circuit architecture underlying the grooming sequence” we further discuss the issue of the BMN “sensory map”. This paragraph is below.

      Here we define the parallel architecture of BMN types that elicit the head grooming sequence that starts with the eyes and proceeds to other locations, such as the antennae and ventral head. The different BMN types are hypothesized to connect with parallel circuits that elicit grooming of specific locations (described above and shown in Figure 1 – figure supplement 1A,C). Indeed, we identify distinct projections and connectivity among BMNs innervating distant bristles on the head, providing evidence supporting this parallel architecture (Figure 8D-F). However, we also find partially overlapping projections and connectivity among BMNs innervating neighboring bristles. Further, optogenetic activation of BMNs at specific head locations elicits grooming of both those locations and neighboring locations (Figure 9). These findings raise questions about the resolution of the parallel architecture underlying grooming. Are BMN types connected with distinct postsynaptic circuits that elicit aimed grooming of their corresponding bristle populations (e.g. Ant bristles)? Or are neighboring BMN types that innervate bristles in particular head areas connected with circuits that elicit grooming of those areas (e.g. dorsal or ventral head)? Future studies of the BMN postsynaptic circuits will be required to define the resolution of the parallel pathways that elicit aimed grooming.

      (4) If authors were to include a summary table that shows all known attributes about BMN type as columns that could be very useful as a resource to the community. Table columns could include attributes like "bristle name", "nerve tract", "FlyWire IDs of all segments corresponding to the bristle class", "split-Gal4 line or known enhancer", etc.

      We provided a table that includes much of this information after the manuscript had already gone out for review. We regret that this was not available. This is now provided as Supplementary file 3. This table provides the following information for each reconstructed BMN: BMN name, bristle type, nerve, FlyWire ID, FlyWire coordinates, NBLAST cluster (cut height 1), NBLAST cluster (cut height 5), and cosine cluster (cut height 4.5). Note that the driver line enhancers for targeting specific BMN types are shown in Figure 3I.

      Specific Points:

      Figure 4C-V:

      • I find it a bit difficult to distinguish ipsi- from contra-lateral projections. Maybe indicate the midline as a thin, stippled line?

      We thank Reviewer #2 for this suggestion. We have now added lines in the panels in Figure 4C-V to indicate the approximate location of the midline. We also added lines to the Figure 4 – figure supplements as described above.

      I think this Fig reference is wrong "the red-light stimulus also elicited backward motions with control flies (Figure 6B,C, control, black trace, Video 5)." should be Fig 8B,C

      We have fixed this error.

      Reviewer #3 (Recommendations For The Authors):

      Introduction:

      Motivating this study in terms of understanding the neural mechanisms that execute the parallel model seems to overstate what you will achieve with the current study. If you want to motivate it this way, I suggest focusing on the grooming sequence of the head alone (eyes, antennae, proboscis).

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions. Please note that many of the revisions focus on the head grooming sequence. We also made minor revisions to the Introduction that further emphasize the focus on head grooming.

      Results:

      Figure 1. Please indicate that this is a male fly in either the figure title or in the figure itself.

      We added a male symbol to Figure 1A.

      Figure 3. Panel J is referenced in the main body text and in the figure caption, but there is no Fig 3J.

      Panel J is shown in the upper right corner of Figure 3. We realize that the placement of this panel is not ideal, but this was the only place that we could fit it. Additionally, the panel works nicely at that location to better enable comparison with panel C. We have revised the text in the Figure 3 legend to better highlight the location of this Figure panel: “Shown in the upper right corner of the figure are the aligned expression patterns of InOmBMN-LexA (red), dBMN-spGAL4 (green), and TasteBMN-spGAL4 (brown).”

      We also added text to a sentence in the Results section titled “Head BMNs project into discrete zones in the ventral brain” that indicates the panel location. This text reads: “To further visualize the spatial relationships between these projections, we computationally aligned the expression patterns of the different driver lines into the same brain space (Figure 3J, upper right corner).”

      Matching the BMNs to EM reconstructions: why cut the dendrogram at H=5? Would be better to determine cluster number using an unbiased method.

      To match the morphologically distinct EM reconstructed BMNs to their specific bristles, we relied on different lines of evidence, including NBLAST results (discussed more below), dye fill/stochastic labeling/driver line labeling matches, published morphology, nerve projection, bristle number, proximity to other BMNs, and postsynaptic connectivity (summarized in Figure 6 – figure supplement 3). The Materials and methods section titled “Matching EM reconstructed BMN projections with light microscopy imaged BMNs that innervate specific bristles” provides a detailed description of the evidence used to assign each BMN type. In many cases, BMN type could be assigned with confidence solely based on morphological comparisons with our light level data (e.g. dye fills), in conjunction with bristle counts to indicate an expected number of BMNs showing similar morphology. Thus, the LM/EM matches and NBLAST clustering were largely complementary.

      The EM reconstructed BMNs were matched as particular BMN types, in part based on examination of the NBLAST data at different cut heights. NBLAST clustering of the BMNs revealed general trends at higher and lower cut heights (Figure 6 – figure supplement 1A, Supplementary file 3). The lowest cut heights included mostly BMNs of the same type innervating the same bristle populations, and smaller clusters that subdivided into morphologically distinct subtypes (see Supplementary file 3 for clusters produced at cut height 1). This revealed that BMNs of the same type tended to show the highest morphological similarity with each other, but they also showed intratype morphological diversity. Higher cut heights produced clusters of BMNs innervating neighboring bristle populations (e.g. ventral head BMNs), showing high morphological similarity among neighboring BMN types.

      We selected the cut height 5 shown in Figure 6 – figure supplement 1A,B because it captures examples of both same and neighboring type clustering. For example, it captures a cluster of mostly BM-Taste neurons (Cluster 16), and neighboring BMN types, including those from the dorsal head (Cluster 14) or ventral head (Cluster 15).
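
      As an illustrative aside, the effect of cut height on cluster membership can be sketched in a few lines. This is a minimal sketch, not the NBLAST pipeline: the linkage method is not stated in this response, so single linkage is assumed purely for brevity, and the five "neurons" and their pairwise dissimilarities are hypothetical.

```python
def clusters_at_cut_height(distances, n_items, cut_height):
    """Single-linkage clusters obtained by cutting a dendrogram at cut_height.

    Under single linkage, two items share a cluster at height h exactly when
    a chain of pairwise distances <= h connects them, so the clusters are the
    connected components of that threshold graph (found here via union-find).
    """
    parent = list(range(n_items))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for (i, j), d in distances.items():
        if d <= cut_height:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n_items):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical dissimilarities between five neurons (not data from the paper).
toy_distances = {
    (0, 1): 0.5, (0, 2): 4.0, (0, 3): 8.0, (0, 4): 9.0,
    (1, 2): 4.5, (1, 3): 8.5, (1, 4): 9.5,
    (2, 3): 7.0, (2, 4): 7.5,
    (3, 4): 1.0,
}

low_cut = clusters_at_cut_height(toy_distances, 5, cut_height=1)
high_cut = clusters_at_cut_height(toy_distances, 5, cut_height=5)
print(low_cut)   # tight clusters only
print(high_cut)  # neighbouring items merge into larger clusters
```

      In this toy case the low cut keeps only the most similar pairs together, while the higher cut merges a neighbouring item into one of them, which mirrors the same-type versus neighboring-type trend described above.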

      Based on reviewer comments, we realized that the way we wrote the BMN matching section in the Results indicated more reliance on the NBLAST clustering than was actually necessary, distorting the way we actually matched the BMNs. Therefore, we softened the first couple of sentences to place less emphasis on the importance of the NBLAST. We also indicated that the readers can find the resulting clusters at different cut heights, referring to Figure 6 – figure supplement 1A and Supplementary file 3. The first two sentences of the first paragraph in the Results section titled “Matching the reconstructed head BMNs with their bristles” now read:

      The reconstructed BMN projections were next matched with their specific bristle populations. The projections were clustered based on morphological similarity using the NBLAST algorithm (example clustering at cut height 5 shown in Figure 6 – figure supplement 1A,B, Supplementary file 3, FlyWire.ai link 2) (Costa et al., 2016). Clusters could be assigned as BMN types based on their similarity to light microscopy images of BMNs known to innervate specific bristles.

      The number of reconstructed BMNs is remarkably similar to what is expected based on bristle counts for each group except for lnOm. Why do you think there is such a large discrepancy there?

      We believe that there is a discrepancy between the number of reconstructed BM-InOm neurons and the number expected based on InOm bristle counts because these bristle counts were based on few flies and these numbers appear to be variable. We did not further investigate the numbers of InOm bristles in this manuscript because we only needed an estimate of their numbers, given that there is over an order of magnitude difference in the eye bristles versus any other head bristle population. Therefore, we could relatively easily conclude that the head BMNs were related to the InOm bristles, based on their sheer numbers and their morphology.

      Figure 6 - figure supplement 2N, please describe these panels better. Main text says the upper image is from lnOmBMN-LexA, but the figure legend doesn't agree.

      We have added text to the figure legend that now makes the contents of panel 2N clear to the reader. Further, we now indicate in the figure legend for each panel, the method used to obtain the labeled neurons (i.e. fill, MCFO, driver), to avoid similar confusion for the other panels.

      Figure 6 - figure supplement 4D. How frequently is there a mismatch between the number of BMNs for a given type across hemispheres?

      Although the full reconstruction of the BMNs on both sides of the brain was beyond the scope of this work, the BMNs on both sides have since been reconstructed and annotated (Schlegel et al. 2023). We plan to provide more analysis of BMNs on both sides of the brain in a forthcoming manuscript. However, the BMN numbers tend to show agreement on both sides of the brain. The table below shows a comparison between the two sides:

      Author response table 1.

      Figures 6 and 7. It would be helpful to include a reference brain in all panels that show cluster morphology. Without landmarks there is nothing to anchor the eye to allow the reader to see the described differences in BMN projection zones and patterns.

      While we apologize for not making this specific change, we have made revisions to other parts of the manuscript to better highlight the somatotopic organization among the BMNs (revisions described above). Please note that we now provide FlyWire.ai publicly available links that enable readers to view the BMN projections in 3D. They can also toggle a brain mesh on and off to provide spatial reference.

      "BMN somatotopic map": It would be helpful to show or describe in more detail what the unique branch morphology for each zone is. It is quite difficult to appreciate, as the groups also have a lot of overlap. Would the unique regions that the BMN groups innervate be easier to see if you plotted presynaptic sites by group? I am left unsure about whether there is a somatotopic map here.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions. Please note that we did not examine the fine branch morphological differences between BMN types having overlapping projections. Showing these differences would require more extensive anatomical analysis that is beyond the scope of this work. For showing definitive somatotopy, we focused on the overt differences between BMNs innervating bristles at distant locations on the head.

      Overall the strict adherence to the parallel model impacts the interpretation of the data. It would be helpful for the authors to discuss which aspects of the current study are consistent with the parallel model and which results are not consistent.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      Discussion:

      "Circuits that elicit aimed grooming of specific head locations": In the previous paragraph you mention "BMN types innervating neighboring bristle populations have overlapping projections into zones that correspond roughly to the dorsal, ventral, and posterior head. The overlap is likely functionally significant, as cosine similarity analysis revealed that neighboring head BMN types have common postsynaptic partners. However, overlap between neighboring BMN types is only partial, as they show differing projections and postsynaptic connectivity." Then in this paragraph, you say, "How do the parallel-projecting head BMNs interface with postsynaptic neural circuits to elicit aimed grooming of specific head locations? Different evidence supports the hypothesis that the BMNs connect with parallel circuits that each elicit a different aimed grooming movement (Seeds et al., 2014)." The overlapping postsynaptic BMN connectivity seems in conflict with the claim that the circuits are parallel.

      We apologize for this confusion. We now better describe this apparent discrepancy between our results and the parallel model of grooming behavior. We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      We have made additional changes to the manuscript:

      (1) We added Supplementary file 2 that includes links for downloading the image stacks used to generate panels in Figure 1, Figure 2, Figure 3, Figure 4, and figure supplements for these figures. These image stacks are stored in the Brain Image Library (BIL). Rows in the spreadsheet correspond to each image stack. Columns provide information about each stack including: figure panels that each image stack contributed to, image stack title, DOI for each stack (link provides metadata for each stack and file download link), image stack file name, genotype of imaged fly, and information about image stack. References to this file have been made at different locations throughout the text and Figure legends. We also added a section on the BIL data in the Materials and methods entitled “Light microscopy image stack storage and availability”. Old Supplementary file 2 has been renamed Supplementary file 3.

      (2) We added a new reference for FlyWire.ai (Dorkenwald et al. 2023) that was posted as a preprint during the revision of this manuscript.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      • Is the coronal slice in Figure 2 the corresponding mid-coronal plane to compute Dice scores? If so, the authors could mention it so that readers have an idea where the selected slice is.

      This is indeed a good point. The coronal slice in Figure 2 is not part of the set of slices that we used to compute Dice scores. Showing such a slice is important, so we have added a small figure to the appendix with one of these slices, along with the corresponding automated segmentations.
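
      For readers unfamiliar with the metric, the Dice score between a manual and an automated segmentation of one slice can be sketched as follows. This is a minimal illustration under stated assumptions: the masks are tiny hypothetical binary grids, and treating two empty masks as perfect agreement is a convention chosen here, not something taken from the manuscript.

```python
def dice_score(mask_a, mask_b):
    """Dice overlap 2|A∩B| / (|A| + |B|) for two equally sized binary masks."""
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(a and b for a, b in zip(flat_a, flat_b))
    total = sum(flat_a) + sum(flat_b)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical 3x4 masks for a single structure on a single slice.
manual = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
automated = [
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]

score = dice_score(manual, automated)
print(score)
```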

      • SIFT descriptors were adopted to detect fiducials only. Maybe it could also be applied to align stacked photographs of brain slices.

      While SIFT is robust against changes in pose (e.g., object rotation), perspective, and lighting, it is not robust against changes in the object itself, such as the changes from one slice to the next in our work. We have added a sentence to the methods section clarifying this issue.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Weaknesses:

      Start site fidelity in purified reconstituted systems can be dramatically altered in different buffer conditions. Interpretation of the observed changes to start site selection in mRNAs in the absence or presence of Ded1 using only the one buffer condition used is therefore limited.

      This is an excellent point and is something we could explore in future studies using the Rec-Seq system. We have added this caveat to the Discussion on lines 797-809. We have previously studied the fidelity of start codon recognition in the reconstituted system (Kolitz et al., [2009] RNA, 15:138-152) and found that under our standard buffer conditions the codon specificity generally reflects what we observed in vivo using a dual-luciferase reporter assay, with the most stable 48S complexes forming on AUG codons, followed by first position mismatches (GUG, UUG, CUG), with second and third position mismatches leading to significantly less stable complexes. However, as the reviewer notes, there are some deviations: ACG and AUA are poor codons in the in vitro system under the buffer conditions used but allowed relatively strong expression in our in vivo reporter assay. It should also be noted that the hierarchy of near-cognate start codon usage in vivo in yeast differs according to the study and the reporter used, making it difficult to establish a “ground truth” for start codon fidelity.

      I have some specific comments to strengthen the manuscript and address some minor issues.

      It is not clear to me whether the authors refold the purified mRNA after phenol/chloroform extraction? Have the authors observed different results if the mRNA is refolded or not? This is appropriate since the authors compare their Rec-Seq data to PARS scores that were generated from refolded mRNAs. One assumes that the total mRNA used is refolded in the same way as the PARS score study, but this is not clearly stated. The authors should make this point clear in the text and methods.

      This is an excellent point. We did not use the final refolding protocol that Kertesz et al. used when they developed their PARS scores and now clarify this in the Methods section (lines 962-967). It is possible that we would have seen stronger correlations in the analyses using PARS scores had we followed the renaturation protocol, although the fact that we observed significant correlations (e.g., Fig. 3E-H) suggests the structures in the Kertesz et al. mRNAs were similar to those in our mRNAs.

      It is not clear how the authors determine the concentration of total mRNA that is used in the assay - reported as 60 nM? Are the authors assuming a molecular weight of an average mRNA to determine the concentration? The authors should provide more detail for how they quantify their mRNA concentration and its stoichiometry compared to 43S PICs.

      We thank the reviewer for pointing out this oversight and have now included this information on lines 849-855 of the Methods section.

      Comments regarding start site fidelity in the reconstituted system:

      The authors use in vitro transcribed tRNAi-Met. Since tRNA modifications may play a role in start site fidelity, the authors should perhaps mention that this will need to be investigated in a future study in the discussion.

      This is a good point and we now note it as a caveat in the Discussion on lines 806-809.

      The authors state that Ded1 promotes leaky scanning regardless of the mAUG start site context (page 24; lines 533-534). The authors then state on page 25 that the level of iAUG initiation relative to mAUG initiation does depend on the mAUG context (lines 545-546). This seems contradictory unless I am not understanding this correctly? It would certainly be surprising that mAUG context didn't regulate leaky scanning in the reconstituted system given the fact that initiation codon context regulates selection in cells (when Ded1 is present).

      These statements are correct as written. As shown in Figure 5O, the frequency of leaky scanning (as measured by relative ribosome occupancy of the internal region of the ORF, not including the main start codon, to the whole ORF, including the main start codon; RRO) decreases as the context score around the start codon gets stronger (green and purple lines). The RRO is increased to the same extent when 500 nM Ded1 is added, regardless of the strength of the start codon context, indicating that Ded1 enhances leaky scanning equally (compare slopes of the green line without Ded1 to the purple line with Ded1). Because of this, the effect of Ded1 on RRO (ΔRRO) is constant across context score bins (orange line). There is no discrepancy between our two conclusions that leaky scanning of the mAUG increases as context score decreases and that Ded1 increases leaky scanning equally for good and bad mAUG contexts, indicating that Ded1 does not inspect the mAUG context and simply decreases the dwell time equally at all contexts.
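
      The logic of this argument can be illustrated with a small worked example. This is a minimal sketch under stated assumptions: the read counts are hypothetical, and the whole-ORF occupancy is modelled simply as internal reads plus reads at the main start codon, which may differ from how RRO was computed in the manuscript.

```python
def relative_ribosome_occupancy(internal_reads, main_start_reads):
    """RRO: internal-ORF occupancy as a fraction of whole-ORF occupancy.

    Assumption for this sketch: whole-ORF occupancy is the sum of reads in
    the internal region and reads at the main start codon.
    """
    whole_orf = internal_reads + main_start_reads
    if whole_orf == 0:
        raise ValueError("no reads on this ORF")
    return internal_reads / whole_orf


# Hypothetical (internal, main start) read counts for three context-score
# bins, ordered from weakest to strongest start codon context.
without_ded1 = [(30, 70), (20, 80), (10, 90)]
with_ded1 = [(45, 55), (35, 65), (25, 75)]

rro_without = [relative_ribosome_occupancy(i, m) for i, m in without_ded1]
rro_with = [relative_ribosome_occupancy(i, m) for i, m in with_ded1]

# The effect of Ded1 on RRO in each bin.
delta_rro = [w - wo for w, wo in zip(rro_with, rro_without)]

for wo, w, d in zip(rro_without, rro_with, delta_rro):
    print(f"RRO -Ded1 = {wo:.2f}, RRO +Ded1 = {w:.2f}, delta = {d:.2f}")
```

      In these made-up numbers RRO falls as context strengthens in both conditions, yet the difference between conditions is the same in every bin, so both statements hold at once.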

      Further to the start site context question. It is possible that the fidelity of the reconstituted system (i.e. buffer conditions) is not fully reflecting in vivo-like start site selection. A rigorous characterization of commercially available reticulocyte lysate systems identified buffer conditions that provided similar start site fidelity to that observed in live cells (Kozak. Nucleic Acids Res. 1990 May 11;18(9):2828). While I feel that it is beyond the context of the current work to undertake a similar rigorous buffer characterization, one must be careful about interpreting the results about leaky scanning and upstream initiation sites in the current work. Perhaps one would observe similar results to Guenther et al. if the fidelity (buffer conditions) of the reconstituted system were different? I appreciate that the authors state that their results only apply to their reconstituted system and do not necessarily suggest that previous data are incorrect, but with only one buffer condition being tested in the current study it may be appropriate to further soften the interpretation of the current results when compared to published data in live cells.

      This point is well-taken. As noted above, we have added a caveat about possible effects of buffer conditions on start codon fidelity to the Discussion (lines 797-809). In terms of the possibility that upstream initiation is more frequent in vivo than we observe in the in vitro Rec-Seq system, we previously studied 5’UTR translation in vivo using ribosome profiling (Kulkarni et al. [2019] BMC Biol., 17:101). The ratio of RPFs in 5’UTRs to coding sequences in this study was 0.0027, very similar to the value measured in the in vitro Rec-Seq system in the presence of Ded1 (0.0016-0.0017). Thus, it does not seem that the frequency of upstream initiation is dramatically higher in vivo than in our in vitro system. We have now made note of this point in the Results (lines 594-598). Guenther et al. employed a ribosome profiling protocol in which they added cycloheximide to their cells prior to lysis, which has been shown to create significant artifacts, particularly in 5’UTR translation (e.g., Gerashchenko and Gladyshev [2014] Nucleic Acids Res., 42:e134). Nevertheless, as suggested by the reviewer, we have modified the text in the Results and Discussion to soften the interpretation somewhat (lines 582-583; 616-618; 761-763).

      Reviewer #2

      Weaknesses:

      Several findings in this report are quite surprising and may require additional work to fully interpret. Primary among these is the finding that Ded1p stimulates accumulation of PICs at internal sites in mRNA coding sequences at an incidence of up to ~50%. The physiological relevance of this is unclear.

      We agree with the reviewer that understanding the physiological significance, if any, of the apparent leaky scanning of main AUG start codons induced by Ded1 is an unanswered question that will require additional studies. It is possible that rapid 60S subunit joining and formation of the 80S initiation complex after start codon recognition on most mRNAs reduces the leaky scanning effect in vivo. We now bring up this possibility in the Discussion section (lines 804-809). However, as noted in lines 568-580, mRNAs that display significantly decreased mRPFs at 500 nM Ded1 in the Rec-Seq system also tend to have TEs that are increased in the ded1-cs mutant relative to WT yeast in in vivo ribosome profiling experiments, suggesting that Ded1 activity also diminishes initiation on mAUG codons in these mRNAs in vivo.

      A limitation of the methodology is that, as an endpoint assay, Rec-Seq does not readily decouple effects of Ded1p on PIC-mRNA loading from those on the subsequent scanning step where the PIC locates the start codon. Considering that Ded1p activity may influence each of these initiation steps through distinct mechanisms - i.e., binding to the mRNA cap-recognition factor eIF4F, or direct mRNA interaction outside eIF4F - additional studies may be needed to gain deeper mechanistic insights.

      We agree that this is a limitation of the Rec-Seq assay and now mention this point in the Discussion section (lines 810-817). It is possible that future work using cross-linking agents to stabilize 43S complexes bound near the cap and scanning the 5’UTR, similar to the methodology used in 40S ribosome profiling, could enable us or others to disentangle these steps from one another.

      As the authors note, the achievable Ded1p concentrations in Rec-Seq may mask potential effects of Ded1p-based granule formation on translation initiation. Additional factors present in the cell could potentially also promote this mechanism. Consequently, the results do not fully rule out granule formation as a potential parallel Ded1p-mediated translation-inhibitory mechanism in cells.

      We agree. As stated in the Discussion section (lines 735-741): “It is possible that at higher concentrations of Ded1 than were achievable in these in vitro experiments or in the presence of additional factors that modify Ded1’s ATPase or RNA binding activities the factor could directly inhibit a subset of mRNAs, by acting as an mRNA clamp that impedes scanning by the PIC, or by sequestering the mRNAs in insoluble condensates. It might be interesting in the future to test candidate factors in Rec-Seq to determine if they switch Ded1 from being a stimulatory helicase to an inhibitory mRNA clamp that removes transcripts from the soluble phase.”

      It is certainly clear why the 15-minute timepoint was chosen for these assays. However, I wondered whether data from an earlier timepoint would provide useful information. The description on line 210 of the compiled PDF suggests data from different timepoints may be available; if it is, in my view it could be a useful addition. More generally, including language about the single-turnover nature of these reactions may be helpful for the benefit of a broad audience.

      In preliminary experiments, we have used the Rec-Seq system to measure the kinetics of 48S PIC formation transcriptome-wide. As you probably can imagine, this is a challenging experiment and requires additional work before we would feel comfortable publishing it. We very much agree with the reviewer that resolving the kinetics of these events will provide important additional information. As suggested, we have added caveats about the endpoint and single-turnover nature of the assay to the Discussion (lines 821-828).

      I wondered whether it might be useful to present additional information on the mRNAs not found in the assay. For example, are these the least abundant mRNAs, which may not have had time to recruit the 43S PIC?

      75% of mRNAs (2719 of 3640) not observed in the Rec-Seq analysis had densities below the median (2.3 reads per nucleotide). We now mention this in the Methods section (lines 855-856).

      The Rec-Seq recruitment reactions were carried out at 22 °C. Considering that remodeling of RNA structure by helicase enzymes is a focal point of the study, linking the results to the recruitment landscape at a closer-to-physiological temperature may bolster the conclusions.

      In the future, it would be interesting to test the effects of temperature on 48S PIC formation using the Rec-Seq system. As the reviewer suggests, the interplay between temperature and mRNA structure could reveal interesting phenomena. It is worth noting, however, that there is no clear “physiological” temperature for S. cerevisiae. For consistency and convenience, lab yeast is usually grown at 30 °C, but in the wild yeast live at a wide range of temperatures, which generally change throughout the day. From this standpoint, 22 °C seems reasonably physiological.

      Results from Rec-Seq experiments conducted at 15 °C might be more directly comparable to in vivo Ribo-seq data with the ded1-cs mutant. However, already ~90% of the Ded1-hyperdependent mRNAs identified by Ribo-seq analysis of that mutant were identified here as Ded1-stimulated mRNAs in Rec-Seq experiments at 22 °C. The Ribo-seq experiments conducted by Guenther et al. were conducted on the ded1-ts mutant at 37 °C; thus, any structures that confer Ded1-dependent leaky-scanning through uORFs detected in that study should have been stable in our Rec-Seq experiments.

      The introduction provides an important, detailed exposition of the state of the field with respect to Ded1p activity. Nevertheless, in my view, it is quite lengthy and could be streamlined for clarity. As just one example, the proposed function of Ded1p in the nucleus seems like a detail that could be dispensed with for the present work.

      We have attempted to shorten the Introduction, as suggested. However, we did not remove the short section describing Ded1’s possible roles in the nucleus and ribosome biogenesis because we felt it was important to emphasize that one of the strengths of the Rec-Seq system is that it allows us to isolate the early steps of translation initiation from later steps and from other cellular processes. In addition, at the suggestion of Reviewer #3, we added a brief explanation of Ded1’s possible role in the subunit joining step of translation.

      Reviewer #3

      Weaknesses:

      The slow nature of the biochemical experiments could bias results.

      We agree that the 15-minute time point used could mask effects that are manifested at a purely kinetic level. It should be noted that we have measured the observed rate constants for 48S formation on a variety of mRNAs in the in vitro reconstituted system in the presence of saturating Ded1 (Gupta et al. [2018] eLife, https://elifesciences.org/articles/38892 ) and found that they are generally in the range of estimates of rate constants for translation initiation in vivo in yeast (~1-10 min⁻¹; e.g., Siwiak and Zielenkiewicz [2010], PLOS Comput. Biol., 6: e100865). In preliminary experiments, we have used the Rec-Seq system to measure the kinetics of 48S PIC formation transcriptome-wide in the absence of Ded1 and find that the mean rate constant observed (~2 min⁻¹) is also within the range of estimates of the rate of translation initiation in vivo in yeast. We hope to publish this analysis in a future manuscript.

      It has been suggested that Ded1 and its human homolog DDX3X could play a role in subunit joining postscanning (Wang et al. 2022, Cell and Geissler et al. 2012 Nucleic Acids Res). Could the authors potentially investigate this by adding GTP, eIF5B and 60S subunits into the reaction mixture and isolating 80S complexes?

      This is a very interesting suggestion. One of our plans with the Rec-Seq system is to see if we can also observe 80S formation with it and distinguish 80S from 48S complexes. Although we haven’t yet tried this and there might be technical obstacles to doing it, if it works we would like to examine the potential effects of Ded1, as suggested. We now mention this possibility in the Discussion section (lines 709-716 and 810-817).

      An incubation time of 15 minutes is quite long on the timescale of translation initiation. Presumably, the competition for 40S among mRNAs is partially kinetically controlled so it would be interesting if the authors could do a time series on the incubation time. Does Ded1 increase initiation on more structured UTRs even at shorter incubations or are those only observed with longer incubations?

      We agree. See the response to the question about kinetics above.

      Does GDPNP lead to off-pathway events? What happens when GTP is used in the TC? Presumably in the absence of eIF5B the 48S PIC should remain stalled at the start codon.

      In previous experiments in the reconstituted system, we showed that using GTP instead of GDPNP resulted in 48S complexes that were less stable than those stalled prior to GTP hydrolysis (e.g., Algire et al. [2002] RNA 8:382-397). This is presumably because eIF2•GDP and eIF5 release from the complex and the Met-tRNAi can dissociate in the absence of subunit joining. Although we haven’t tried it in the Rec-Seq system, we suspect that the resulting PICs would fall apart during sucrose gradient sedimentation.

      The authors use assembly of a 48S PIC at the start codon as evidence of scanning but could use more evidence to back this claim up. Does removing the cap structure on the two luciferase mRNA controls disrupt initiation using this approach? That would be direct evidence of 5' end 40S loading and scanning to the start codon.

      In previous work using the reconstituted system, we studied the effect of the 5’-cap on 48S PIC formation (Mitchell et al. [2010] Mol. Cell 39:950-962; Yourik et al. [2017] eLife https://elifesciences.org/articles/31476 ). We found that stable 48S PIC formation is strongly dependent on the presence of the 5’-cap. In addition, the cap prevents off-pathway events and enforces a requirement for the full set of initiation factors to achieve efficient 48S PIC formation. As the reviewer indicates, the cap-dependence of the system supports the conclusion that 5’ end loading and scanning take place. We have now added this information and the relevant citations to the Introduction (lines 147-153). We thank the reviewer for pointing out this oversight. It should also be noted that the cases of mRNAs in which 5’UTR translation is increased by addition of Ded1 support the conclusion that the factor promotes attachment of the PIC to the 5’ ends of mRNAs and subsequent 5’ to 3’ scanning, as noted in lines 608-618.

      The authors state that "The correlation between CDS length and RE could be indirect because CDS length also correlates with 5'UTR length". Could the authors bin the transcripts into different 5' UTR length ranges and then probe for CDS length differences on RE for each 5' UTR length bin? This could be useful to truly parse the mechanism by which CDS length is influencing RE.

      This was an excellent suggestion. We now include this analysis in a new supplementary figure, Figure 3S-2. Corresponding text was added in lines 380-387:

      “Importantly, correlations between Ded1 stimulation and 5’ UTR lengths are evident for all three groups of mRNAs containing distinct ranges of CDS lengths (Fig. 3-S2A-C). In contrast, a marked correlation between Ded1 stimulation and CDS length was detected only for the group of mRNAs with longest 5’UTRs (Fig. 3-S2D-F), and only the latter group showed a clear correlation between 5’UTR length and CDS length (Fig. 3-S2G-I). Thus, the correlation between Ded1 stimulation and CDS length appears to be indirect, driven by the tendency for the mRNAs with the longest 5’UTRs to also have correspondingly longer CDSs.”

      We thank the reviewer for this very useful idea.
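
      The stratified analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions: the statistic used in the manuscript is not given in this response, so Spearman rank correlation is assumed here, the bins are equal-sized groups after sorting, and the twelve transcripts with their field names (`utr_len`, `cds_len`, `stimulation`) are hypothetical.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def correlation_within_bins(records, bin_key, x_key, y_key, n_bins=3):
    """Sort by bin_key, split into n_bins groups, correlate x and y in each."""
    ordered = sorted(records, key=lambda r: r[bin_key])
    size = len(ordered) // n_bins
    out = []
    for b in range(n_bins):
        start = b * size
        end = len(ordered) if b == n_bins - 1 else start + size
        chunk = ordered[start:end]
        out.append(spearman([r[x_key] for r in chunk],
                            [r[y_key] for r in chunk]))
    return out


# Hypothetical transcripts: (5'UTR length, CDS length, Ded1 stimulation).
rows = [
    (20, 500, 1.2), (30, 1500, 1.3), (40, 1000, 1.1), (50, 800, 1.4),
    (60, 600, 1.6), (70, 1600, 1.7), (80, 1100, 1.5), (90, 900, 1.8),
    (150, 1000, 2.0), (200, 1500, 2.5), (250, 2000, 3.0), (300, 2500, 3.5),
]
transcripts = [
    {"utr_len": u, "cds_len": c, "stimulation": s} for u, c, s in rows
]

per_bin = correlation_within_bins(
    transcripts, bin_key="utr_len", x_key="cds_len", y_key="stimulation"
)
print(per_bin)
```

      In this toy data set the CDS-length correlation appears only in the longest-5’UTR bin, which is the kind of pattern that would point to an indirect relationship.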

      In Figure 3I, why does RE dip for the middle bins of CDS length in both 100 nM and 500 nM conditions, and then rise back up for the later bins? In other words, why do the shortest and longest CDS have the best RE in the presence of ded1?

      We do not know the reason for this dip and now say this in the Results on lines 377-378.

      The discussion section would be well served to discuss proposed roles of Ded1 post-scanning and how those fit, if at all, with the data presented throughout the manuscript.

      We have now added this to the Discussion (lines 709-716 and 810-817). We thank the reviewer for pointing out this oversight.

      Minor comments:

      • Define bins on figures rather than using bin number for axis labels. For example, Figure 3A-D x-axis labels indicate the length range of each bin.

      Thank you for the suggestion. We have made this change.

      • Figure 3I: the data seem to indicate that shortest CDSs have a ded1 dependency similar to the longest CDSs. This result seems inconsistent with the given relationship between UTR length, structure, CDS length. Please clarify.

      See answer to this question above.

      • Replace qualitative statements, such as "substantially smaller reductions" with percent change, numbers, etc.

      We have tried to replace qualitative statements with quantitative ones, where possible.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study identifies the mitotic localization mechanism for Aurora B and INCENP (parts of the chromosomal passenger complex, CPC) in Trypanosoma brucei. The mechanism is different from that in the more commonly studied opisthokonts and there is solid support from RNAi and imaging experiments, targeted mutations, immunoprecipitations with crosslinking/mass spec, and AlphaFold interaction predictions. The results could be strengthened by biochemically testing proposed direct interactions and demonstrating that the targeting protein KIN-A is a motor. The findings will be of interest to parasitology researchers as well as cell biologists working on mitosis and cell division, and those interested in the evolution of the CPC.

      We thank the editor and the reviewers for their thorough and positive assessment of our work and the constructive feedback to further improve our manuscript. Please find below our responses to the reviewers’ comments. Please note that the conserved glycine residue in the Switch II helix in KIN-A was mistakenly labelled as G209 in the original manuscript. We now corrected it to G210 in the revised manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The CPC plays multiple essential roles in mitosis such as kinetochore-microtubule attachment regulation, kinetochore assembly, spindle assembly checkpoint activation, anaphase spindle stabilization, cytokinesis, and nuclear envelope formation, as it dynamically changes its mitotic localization: it is enriched at inner centromeres from prophase to metaphase but it is relocalized at the spindle midzone in anaphase. The business end of the CPC is Aurora B and its allosteric activation module IN-box, which is located at the C-terminal part of INCENP. In most well-studied eukaryotic species, Aurora B activity is locally controlled by the localization module of the CPC, Survivin, Borealin, and the N-terminal portion of INCENP. Survivin and Borealin, which bind the N terminus of INCENP, recognize histone residues that are specifically phosphorylated in mitosis, while anaphase spindle midzone localization is supported by the direct microtubule-binding capacity of the SAH (single alpha helix) domain of INCENP and other microtubule-binding proteins that specifically interact with INCENP during anaphase, which are under the regulation of CDK activity. One of these examples includes the kinesin-like protein MKLP2 in vertebrates.

      Trypanosoma is an evolutionarily interesting species to study mitosis since its kinetochore and centromere proteins do not show any similarity to other major branches of eukaryotes, while orthologs of Aurora B and INCENP have been identified. Combining molecular genetics, imaging, biochemistry, cross-linking IP-MS (IP-CLMS), and structural modeling, this manuscript reveals that two orphan kinesin-like proteins KIN-A and KIN-B act as localization modules of the CPC in Trypanosoma brucei. The IP-CLMS, AlphaFold2 structural predictions, and domain deletion analysis support the idea that (1) KIN-A and KIN-B form a heterodimer via their coiled-coil domain, (2) two alpha helices of INCENP interact with the coiled-coil of the KIN-A-KIN-B heterodimer, (3) the conserved KIN-A C-terminal CD1 interacts with the heterodimeric KKT9-KKT11 complex, which is a submodule of the KKT7-KKT8 kinetochore complex unique to Trypanosoma, (4) KIN-A and KIN-B coiled-coil domains and the KKT7-KKT8 complex are required for CPC localization at the centromere, (5) CD1 and CD2 domains of KIN-A support its centromere localization. The authors further show that the ATPase activity of KIN-A is critical for spindle midzone enrichment of the CPC. The imaging data of the KIN-A rigor mutant suggest that dynamic KIN-A-microtubule interaction is required for metaphase alignment of the kinetochores and proliferation. Overall, the study reveals novel pathways of CPC localization regulation via KIN-A and KIN-B by multiple complementary approaches.

      Strengths:

      The major conclusion is collectively supported by multiple approaches, combining site-specific genome engineering, epistasis analysis of cellular localization, AlphaFold2 structure prediction of protein complexes, IP-CLMS, and biochemical reconstitution (the complex of KKT8, KKT9, KKT11, and KKT12).

      We thank the reviewer for her/his positive assessment of our manuscript.

      Weaknesses:

      • The predictions of direct interactions (e.g. INCENP with KIN-A/KIN-B, or KIN-A with KKT9-KKT11) have not yet been confirmed experimentally, e.g. by domain mutagenesis and interaction studies.

      Thank you for this point. It is true that we do not have evidence for direct interactions between KIN-A and KKT9-KKT11. However, the interaction of INCENP with KIN-A/KIN-B is strongly supported by our cross-linking IP-MS of native complexes. Furthermore, we show that deletion of the INCENPCPC1 N-terminus predicted to interact with KIN-A:KIN-B abolishes kinetochore localization.

      • The criteria used to judge a failure of localization are not clearly explained (e.g., Figure 5F, G).

      As suggested by the reviewer in recommendation #14, we have now included example images for each category (‘kinetochores’, ‘kinetochores + spindle’, ‘spindle’) along with a schematic illustration in Fig. 5F.

      • It remains to be shown that KIN-A has motor activity.

      We thank the reviewer for this important comment. Indeed, motor activity remains to be demonstrated using an in vitro system, which is beyond the scope of this study. What we show here is that the motor domain of KIN-A effectively co-sediments with microtubules and that spindle localization of KIN-A is abolished upon deletion of the motor domain. Moreover, mutation of a conserved glycine residue in the Switch II region (G210) to alanine (the ‘rigor mutation’; Rice et al., 1999) renders KIN-A incapable of translocating to the central spindle, suggesting that its ATPase activity is required for this process. To clarify this point in the manuscript, we have replaced ‘motor activity’ of KIN-A with ‘ATPase activity’ wherever we refer to experiments performed using the KIN-A rigor mutant. In addition, we have included a Multiple Sequence Alignment (MSA) of KIN-A and KIN-B from different kinetoplastids with human Kinesin-1, human Mklp2 and yeast Klp9 in Figure 6A and S6A, showing the conservation of key motifs required for ATP coordination and tubulin interaction. In the corresponding paragraph in the main text, we describe these data as follows:

      ‘We therefore speculated that anaphase translocation of the kinetoplastid CPC to the central spindle may involve the kinesin motor domain of KIN-A. KIN-B is unlikely to be a functional kinesin based on the absence of several well-conserved residues and motifs within the motor domain, which are fully present in KIN-A (Li et al., 2008). These include the P-loop, switch I and switch II motifs, which form the nucleotide binding cleft, and many conserved residues within the α4-L12 elements, which interact with tubulin (Fig. S6A) (Endow et al., 2010). Consistent with this, the motor domain of KIN-B, contrary to KIN-A, failed to localize to the mitotic spindle when expressed ectopically (Fig. S2E) and did not co-sediment with microtubules in our in vitro assay (Fig. S6B).’

      • The authors imply that KIN-A, but not KIN-B, interacts with microtubules based on microtubule pelleting assay (Fig. S6), but the substantial insoluble fractions of 6HIS-KINA and 6HIS-KIN-B make it difficult to conclusively interpret the data. It is possible that these two proteins are not stable unless they form a heterodimer.

      This is indeed a possibility. We are currently aiming at purifying full-length recombinant KIN-A and KIN-B (along with the other CPC components), which will allow us to perform in vitro interaction studies and to investigate biochemical properties of this complex (including the role of the motor domains of KIN-A and KIN-B) within the framework of an in-depth follow-up study. To address the point above, we have added the following text in the legend corresponding to Fig. S6:

      ‘Microtubule co-sedimentation assay with 6HIS-KIN-A2-309 (left) and 6HIS-KIN-B2-316 (right). S and P correspond to supernatant and pellet fractions, respectively. Note that both constructs to some extent sedimented even in the absence of microtubules. Hence, lack of microtubule binding for KIN-B may be due to the unstable non-functional protein used in this study.’

      • For broader context, some prior findings should be introduced, e.g. on the importance of the microtubule-binding capacity of the INCENP SAH domain and its regulation by mitotic phosphorylation (PMID 8408220, 26175154, 26166576, 28314740, 28314741, 21727193), since KIN-A and KIN-B may substitute for the function of the SAH domain.

      We have modified the introduction to include the following text and references mentioned by the reviewer: ‘The localization module comprises Borealin, Survivin and the N-terminus of INCENP, which are connected to one another via a three-helical bundle (Jeyaprakash et al., 2007, 2011; Klein et al., 2006). The two modules are linked by the central region of INCENP, composed of an intrinsically disordered domain and a single alpha helical (SAH) domain. INCENP harbours microtubule-binding domains within the N-terminus and the central SAH domain, which play key roles for CPC localization and function (Samejima et al., 2015; Kang et al., 2001; Noujaim et al., 2014; Cormier et al., 2013; Wheatley et al., 2001; Nakajima et al., 2011; Fink et al., 2017; Wheelock et al., 2017; van der Horst et al., 2015; Mackay et al., 1993).’

      Reviewer #2 (Public Review):

      How the chromosomal passenger complex (CPC) and its subunit Aurora B kinase regulate kinetochore-microtubule attachment, and how the CPC relocates from kinetochores to the spindle midzone as a cell transitions from metaphase to anaphase are questions of great interest. In this study, Ballmer and Akiyoshi take a deep dive into the CPC in T. brucei, a kinetoplastid parasite with a kinetochore composition that varies greatly from other organisms.

      Using a combination of approaches, most importantly in silico protein predictions using AlphaFold-Multimer and light microscopy in dividing T. brucei, the authors convincingly present and analyse the composition of the T. brucei CPC. This includes the identification of KIN-A and KIN-B, proteins of the kinesin family, as targeting subunits of the CPC. This is a clear advancement over earlier work, for example by Li and colleagues in 2008. The involvement of KIN-A and KIN-B is of particular interest, as it provides a clue for the (re)localization of the CPC during the cell cycle. The evolutionary perspective makes the paper potentially interesting for a wide audience of cell biologists, a point that the authors bring across properly in the title, the abstract, and their discussion.

      The evolutionary twist of the paper would be strengthened 'experimentally' by predictions of the structure of the CPC beyond T. brucei. Depending on how far the authors can extend their in-silico analysis, it would be of interest to discuss a) available/predicted CPC structures in well-studied organisms and b) structural predictions in other euglenozoa. What are the general structural properties of the CPC (e.g. flexible linkers, overall dimensions, structural differences when subunits are missing etc.)? How common is the involvement of kinesin-like proteins? In line with this, it would be good to display the figure currently shown as S1D (or similar) as a main panel.

      We thank the reviewer for her/his encouraging assessment of our manuscript and appreciation of the evolutionary relevance of our work. As suggested, we have moved the phylogenetic tree previously shown in Fig. S1D to the main Fig. 1F. Our AF2 analysis of CPC proteins and (sub)complexes from other kinetoplastids failed to predict reliable interactions among CPC proteins except for that between Aurora B and the IN box. It therefore remains unclear whether CPC structures are conserved among kinetoplastids. Because components of CPC remain unknown in other euglenozoa (other than Aurora B and INCENP), we cannot perform structural predictions of CPC in diplonemids or euglenids.

      It remains unclear how common the involvement of kinesin-like proteins with the CPC is in other eukaryotes, partly because we could not identify an obvious homolog of KIN-A/KIN-B outside of kinetoplastids. Addressing this question would require experimental approaches in various eukaryotes (e.g. immunoprecipitation and mass spectrometry of Aurora B) as we carried out in this manuscript using Trypanosoma brucei.

      Reviewer #3 (Public Review):

      Summary:

      The protein kinase, Aurora B, is a critical regulator of mitosis and cytokinesis in eukaryotes, exhibiting a dynamic localisation. As part of the Chromosomal Passenger Complex (CPC), along with the Aurora B activator, INCENP, and the CPC localisation module comprised of Borealin and Survivin, Aurora B travels from the kinetochores at metaphase to the spindle midzone at anaphase, which ensures its substrates are phosphorylated in a time- and space-dependent manner. In the kinetoplastid parasite, T. brucei, the Aurora B orthologue (AUK1), along with an INCENP orthologue known as CPC1, and a kinetoplastid-specific protein CPC2, also displays a dynamic localisation, moving from the kinetochores at metaphase to the spindle midzone at anaphase, to the anterior end of the newly synthesised flagellum attachment zone (FAZ) at cytokinesis. However, the trypanosome CPC lacks orthologues of Borealin and Survivin, and T. brucei kinetochores also have a unique composition, being comprised of dozens of kinetoplastid-specific proteins (KKTs). Of particular importance for this study are KKT7 and the KKT8 complex (comprising KKT8, KKT9, KKT11, and KKT12). Here, Ballmer and Akiyoshi seek to understand how the CPC assembles and is targeted to its different locations during the cell cycle in T. brucei.

      Strengths & Weaknesses:

      Using immunoprecipitation and mass-spectrometry approaches, Ballmer and Akiyoshi show that AUK1, CPC1, and CPC2 associate with two orphan kinesins, KIN-A and KIN-B, and with the use of endogenously expressed fluorescent fusion proteins, demonstrate for the first time that KIN-A and KIN-B display a dynamic localisation pattern similar to other components of the CPC. Most of these data provide convincing evidence for KIN-A and KIN-B being bona fide CPC proteins, although the evidence that KIN-A and KIN-B translocate to the anterior end of the new FAZ at cytokinesis is weak - the KIN-A/B signals are very faint and difficult to see, and cell outlines/brightfield images are not presented to allow the reader to determine the cellular location of these faint signals (Fig S1B).

      We thank the reviewer for their thorough assessment of our manuscript and the insightful feedback to further improve our study. To address the point above, we have acquired new microscopy data for Fig. S1B and S1C, which now include phase contrast images, and have chosen representative cells in late anaphase and telophase. We hope that the signal of Aurora BAUK1, KIN-A and KIN-B at the anterior end of the new FAZ can now be distinguished more clearly.

      They then demonstrate, by using RNAi to deplete individual components, that the CPC proteins have hierarchical interdependencies for their localisation to the kinetochores at metaphase. These experiments appear to have been well performed, although only images of cell nuclei were shown (Fig 2A), meaning that the reader cannot properly assess whether CPC components have localised elsewhere in the cell, or if their abundance changes in response to depletion of another CPC protein.

      We chose to show close-ups of the nucleus to highlight the different localization patterns of CPC proteins under the different RNAi conditions. In none of these conditions did we observe mis-localization of CPC subunits to the cytoplasm. To clarify this point, we added the following sentence in the legend for Figure 2A:

      ‘A) Representative fluorescence micrographs showing the localization of YFP-tagged Aurora BAUK1, INCENPCPC1, KIN-A and KIN-B in 2K1N cells upon RNAi-mediated knockdown of indicated CPC subunits. Note that nuclear close-ups are shown here. CPC proteins were not detected in the cytoplasm. RNAi was induced with 1 μg/mL doxycycline for 24 h (KIN-B RNAi) or 16 h (all others). Cell lines: BAP3092, BAP2552, BAP2557, BAP3093, BAP2906, BAP2900, BAP2904, BAP3094, BAP2899, BAP2893, BAP2897, BAP3095, BAP3096, BAP2560, BAP2564, BAP3097. Scale bars, 2 μm.’

      Ballmer and Akiyoshi then go on to determine the kinetochore localisation domains of KIN-A and KIN-B. Using ectopically expressed GFP-tagged truncations, they show that coiled-coil domains within KIN-A and KIN-B, as well as a disordered C-terminal tail present only in KIN-A, but not the N-terminal motor domains of KIN-A or KIN-B, are required for kinetochore localisation. These data are strengthened by immunoprecipitating CPC complexes and crosslinking them prior to mass spectrometry analysis (IP-CLMS), a state-of-the-art approach, to determine the contacts between the CPC components. Structural predictions of the CPC structure are also made using AlphaFold2, suggesting that coiled coils form between KIN-A and KIN-B, and that KIN-A/B interact with the N termini of CPC1 and CPC2. Experimental results show that CPC1 and CPC2 are unable to localise to kinetochores if they lack their N-terminal domains consistent with these predictions. Altogether these data provide convincing evidence of the protein domains required for CPC kinetochore localisation and CPC protein interactions. However, the authors also conclude that KIN-B plays a minor role in localising the CPC to kinetochores compared to KIN-A. This conclusion is not particularly compelling as it stems from the observation that ectopically expressed GFP-NLS-KIN-A (full length or coiled-coil domain + tail) is also present at kinetochores during anaphase unlike endogenously expressed YFP-KIN-A. Not only is this localisation probably an artifact of the ectopic expression, but the KIN-B coiled-coil domain localises to kinetochores from S to metaphase and Fig S2G appears to show a portion of the expressed KIN-B coiled-coil domain colocalising with KKT2 at anaphase. It is unclear why KIN-B has been discounted here.

      As the reviewer points out, a small fraction of GFP-NLS-KIN-B317-624 is indeed detectable at kinetochores in anaphase, although most of the protein shows diffuse nuclear staining. There are various explanations for this phenomenon: It is conceivable that the KIN-B motor domain may contribute to microtubule binding and translocation of the CPC from kinetochores onto the spindle in anaphase. In our experiments, ectopically expressed KIN-B317-624 likely outcompetes a fraction of endogenous KIN-B for binding to KIN-A, which could interfere with this translocation process, leaving a population of CPC ‘stranded’ at kinetochores in anaphase. Another possibility, hinted at by the reviewer, is that the C-terminus of KIN-B interacts with receptors at the kinetochore/centromere. Although we do not discount this possibility, we nevertheless decided to focus on KIN-A in this study, because the anaphase kinetochore retention phenotype for both full-length GFP-NLS-KIN-A and -KIN-A309-862 is much stronger than for KIN-B317-624. Two additional reasons were that (i) KIN-A is highly conserved within kinetoplastids, whereas KIN-B orthologs are missing in some kinetoplastids, and (ii) no convincing interactions between KIN-B and kinetochore proteins were predicted by AF2.

      To address the reviewer’s point, we decided to include KIN-B in the title of this manuscript, which now reads: ‘Dynamic localization of the chromosomal passenger complex is controlled by the orphan kinesins KIN-A and KIN-B in the kinetoplastid parasite Trypanosoma brucei’.

      Moreover, we modified the corresponding paragraph in the results section as follows:

      ‘Intriguingly, unlike endogenously YFP-tagged KIN-A, ectopically expressed GFP fusions of both full-length KIN-A and KIN-A310-862 clearly localized at kinetochores even in anaphase (Figs. 2, F and H). Weak anaphase kinetochore signal was also detectable for KIN-B317-624 (Fig. S2F). GFP fusions of the central coiled-coil domain or the C-terminal disordered tail of KIN-A did not localize to kinetochores (data not shown). These results show that kinetochore localization of the CPC is mediated by KIN-A and KIN-B and requires both the central coiled-coil domain as well as the C-terminal disordered tail of KIN-A.’

      Next, using a mixture of RNAi depletion and LacI-LacO recruitment experiments, the authors show that kinetochore proteins KKT7 and KKT9 are required for AUK1 to localise to kinetochores (other KKT8 complex components were not tested here) and that all components of the KKT8 complex are required for KIN-A kinetochore localisation. Further, both KKT7 and KKT8 were able to recruit AUK1 to an ectopic locus in the S phase, and KKT7 recruited KKT8 complex proteins, which the authors suggest indicates it is upstream of KKT8. However, while these experiments have been performed well, the reciprocal experiment to show that KKT8 complex proteins cannot recruit KKT7, which could have confirmed this hierarchy, does not appear to have been performed. Further, since the LacI fusion proteins used in these experiments were ectopically expressed, they were retained (artificially) at kinetochores into anaphase; KKT8 and KIN-A were both able to recruit AUK1 to LacO foci in anaphase, while KKT7 was not. The authors conclude that this suggests the KKT8 complex is the main kinetochore receptor of the CPC - while very plausible, this conclusion is based on a likely artifact of ectopic expression, and for that reason, should be interpreted with a degree of caution.

      We previously showed that RNAi-mediated depletion of KKT7 disrupts kinetochore localization of KKT8 complex members, whereas kinetochore localization of KKT7 is unaffected by disruption of the KKT8 complex (Ishii and Akiyoshi, 2020). Moreover, in contrast to the KKT8 complex, KKT7 remains at kinetochores in anaphase (Akiyoshi and Gull, 2014). These data show that KKT7 is upstream of the KKT8 complex. In this context, the LacI-LacO tethering approach can be very useful to probe whether two proteins (or domains of proteins) could interact in vivo either directly or indirectly. However, a recruitment hierarchy cannot be inferred from such experiments because the data just show whether X can recruit Y to an ectopic locus (but not whether X is upstream of Y or vice versa). Regarding the retention of Aurora BAUK1 at kinetochores in anaphase upon ectopic expression of GFP-KKT8-LacI, we agree with the reviewer that these data need to be carefully interpreted. Nevertheless, the notion that the KKT7-KKT8 complex recruits the CPC to kinetochores is also strongly supported by IP-MS, RNAi experiments, and AF2 predictions. For clarification and to address the reviewer’s point, we re-formulated the corresponding paragraph in the main text:

      ‘We previously showed that KKT7 lies upstream of the KKT8 complex (Ishii and Akiyoshi, 2020). Indeed, GFP-KKT72-261-LacI recruited tdTomato-KKT8, -KKT9 and -KKT12 (Fig. S4E). Expression of both GFP-KKT72-261-LacI and GFP-KKT8-LacI resulted in robust recruitment of tdTomato-Aurora BAUK1 to LacO foci in S phase (Figs. 4, E and F). Intriguingly, we also noticed that, unlike endogenous KKT8 (which is not present in anaphase), ectopically expressed GFP-KKT8-LacI remained at kinetochores during anaphase (Fig. 4F). This resulted in a fraction of tdTomato-Aurora BAUK1 being trapped at kinetochores during anaphase instead of migrating to the central spindle (Fig. 4F). We observed a comparable situation upon ectopic expression of GFP-KIN-A, which is retained on anaphase kinetochores together with tdTomato-KKT8 (Fig. S4F). In contrast, Aurora BAUK1 was not recruited to LacO foci marked by GFP-KKT72-261-LacI in anaphase (Fig. 4E).’

      Further IP-CLMS experiments, in combination with recombinant protein pull-down assays and structural predictions, suggested that within the KKT8 complex, there are two subcomplexes of KKT8:KKT12 and KKT9:KKT11, and that KKT7 interacts with KKT9:KKT11 to recruit the remainder of the KKT8 complex. The authors also assess the interdependencies between KKT8 complex components for localisation and expression, showing that all four subunits are required for the assembly of a stable KKT8 complex and present AlphaFold2 structural modelling data to support the two subcomplex models. In general, these data are of high quality and convincing with a few exceptions. The recombinant pulldown assay (Fig. 4H) is not particularly convincing as the 3rd eluate gel appears to show a band at the size of KKT11 (despite the labelling indicating no KKT11 was present in the input) but no pulldown of KKT9, which was present in the input according to the figure legend (although this may be mislabeled since not consistent with the text). The text also states that 6HIS-KKT8 was insoluble in the absence of KKT12, but this is not possible to assess from the data presented.

      We thank the reviewer for pointing out an error in the text: ‘Removal of both KKT9 and KKT11 did not impact formation of the KKT8:KKT12 subcomplex’ should read ‘Removal of either KKT9 or KKT11 did not impact formation of the KKT8:KKT12 subcomplex’. Regarding the very faint band perceived to be KKT11 in the 3rd eluate: This band runs slightly lower than KKT11 and likely represents a bacterial contaminant (which we have seen also in other preps in the past). We have made a note of this in the corresponding legend (new Fig. 4I). Moreover, we provide the estimated molecular weights for each subunit, as suggested by the reviewer in recommendation #14 (see below):

      ‘(I) Indicated combinations of 6HIS-tagged KKT8 (~46 kDa), KKT9 (~39 kDa), KKT11 (~29 kDa) and KKT12 (~23 kDa) were co-expressed in E. coli, followed by metal affinity chromatography and SDS-PAGE. The asterisk indicates a common contaminant.’

      The corresponding paragraph in the results section now reads:

      ‘To validate these findings, we co-expressed combinations of 6HIS-KKT8, KKT9, KKT11 and KKT12 in E. coli and performed metal affinity chromatography (Fig. 4I). 6HIS-KKT8 efficiently pulled down KKT9, KKT11 and KKT12, as shown previously (Ishii and Akiyoshi, 2020). In the absence of KKT9, 6HIS-KKT8 still pulled down KKT11 and KKT12. Removal of either KKT9 or KKT11 did not impact formation of the KKT8:KKT12 subcomplex. In contrast, 6HIS-KKT8 could not be recovered without KKT12, indicating that KKT12 is required for formation of the full KKT8 complex. These results support the idea that the KKT8 complex consists of KKT8:KKT12 and KKT9:KKT11 subcomplexes.’

      It is also surprising that data showing the effects of KKT8, KKT9, and KKT12 depletion on KKT11 localisation and abundance are not presented alongside the reciprocal experiments in Fig S4G-J.

      YFP-KKT11 is delocalized upon depletion of KKT8 and KKT9 (see below). Unfortunately, we were unsuccessful in our attempts at deriving the corresponding KKT12 RNAi cell line, rendering this set of data incomplete. Because these data are not of critical importance for this study, we decided not to invest more time in attempting further transfections.

      Author response image 1.

      The authors also convincingly show that AlphaFold2 predictions of interactions between KKT9:KKT11 and a conserved domain (CD1) in the C-terminal tail of KIN-A are likely correct, with CD1 and a second conserved domain, CD2, identified through sequence analysis, acting synergistically to promote KIN-A kinetochore localisation at metaphase, but not being required for KIN-A to move to the central spindle at anaphase. They then hypothesise that the kinesin motor domain of KIN-A (but not KIN-B which is predicted to be inactive based on non-conservation of residues key for activity) determines its central spindle localisation at anaphase through binding to microtubules. In support of this hypothesis, the authors show that KIN-A, but not KIN-B can bind microtubules in vitro and in vivo. However, ectopically expressed GFP-NLS fusions of full-length KIN-A or KIN-A motor domain did not localise to the central spindle at anaphase. The authors suggest this is due to the GFP fusion disrupting the ATPase activity of the motor domain, but they provide no evidence that this is the case. Instead, they replace endogenous KIN-A with a predicted ATPase-defective mutant (G209A), showing that while this still localises to kinetochores, the kinetochores were frequently misaligned at metaphase, and that it no longer concentrates at the central spindle (with concomitant mis-localisation of AUK1), causing cells to accumulate at anaphase. From these data, the authors conclude that KIN-A ATPase activity is required for chromosome congression to the metaphase plate and its central spindle localisation at anaphase. While potentially very interesting, these data are incomplete in the absence of any experimental data to show that KIN-A possesses ATPase activity or that this activity is abrogated by the G209A mutation, and the conclusions of this section are rather speculative.

      Thank you for this important comment, which relates to a similar point raised by Reviewer 1 (see above). Indeed, ATPase and motor activity of KIN-A remain to be demonstrated biochemically using recombinant proteins, which is beyond the scope of this study. We generated MSAs of KIN-A and KIN-B from different kinetoplastids with human Kinesin-1, human Mklp2 and yeast Klp9, which are now presented in Figure 6A and S6A. These clearly show that key motifs required for ATP or tubulin binding in other kinesins are highly conserved in KIN-A (but not KIN-B). This includes the conserved glycine residue in the Switch II helix (G234 in human Kinesin-1, G210 in T. brucei KIN-A), which forms a hydrogen bond with the γ-phosphate of ATP, and upon mutation has been shown to impair ATPase activity and trap the motor head in a strong microtubule-binding (‘rigor’) state (Rice et al., 1999; Sablin et al., 1996). The prominent rigor phenotype of KIN-AG210A is consistent with KIN-A having ATPase activity. In addition to the data in Fig. 6A and S6A, we made the following changes to the main text:

      ‘We therefore speculated that anaphase translocation of the kinetoplastid CPC to the central spindle may involve the kinesin motor domain of KIN-A. KIN-B is unlikely to be a functional kinesin based on the absence of several well-conserved residues and motifs within the motor domain, which are fully present in KIN-A (Li et al., 2008). These include the P-loop, switch I and switch II motifs, which form the nucleotide binding cleft, and many conserved residues within the α4-L12 elements, which interact with tubulin (Fig. S6A) (Endow et al., 2010). Consistent with this, the motor domain of KIN-B, contrary to KIN-A, failed to localize to the mitotic spindle when expressed ectopically (Fig. S2E) and did not co-sediment with microtubules in our in vitro assay (Fig. S6B).

      Ectopically expressed GFP-KIN-A and -KIN-A2-309 partially localized to the mitotic spindle but failed to concentrate at the midzone during anaphase (Figs. 2, F and G), suggesting that N-terminal tagging of the KIN-A motor domain may interfere with its function. To address whether the ATPase activity of KIN-A is required for central spindle localization of the CPC, we replaced one allele of KIN-A with a C-terminally YFP-tagged G210A ATP hydrolysis-defective rigor mutant (Fig. 6A) (Rice et al., 1999) and used an RNAi construct directed against the 3’UTR of KIN-A to deplete the untagged allele. The rigor mutation did not affect recruitment of KIN-A to kinetochores (Figs. S6, C and D). However, KIN-AG210A-YFP marked kinetochores were misaligned in ~50% of cells arrested in metaphase, suggesting that ATPase activity of KIN-A promotes chromosome congression to the metaphase plate (Figs. S6, E-H).’

      Impact:

      Overall, this work uses a wide range of cutting-edge molecular and structural predictive tools to provide a significant amount of new and detailed molecular data that shed light on the composition of the unusual trypanosome CPC and how it is assembled and targeted to different cellular locations during cell division. Given the fundamental nature of this research, it will be of interest to many parasitology researchers as well as cell biologists more generally, especially those working on aspects of mitosis and cell division, and those interested in the evolution of the CPC.

      We thank the reviewer for his/her feedback and thoughtful and thorough assessment of our study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Why did the authors omit KIN-B from the title?

      We decided to add KIN-B to the title. Please see our response to Reviewer #3 (public review).

      (2) Abstract, line 28, "Furthermore, the kinesin motor activity of KIN-A promotes chromosome alignment in prometaphase and CPC translocation to the central spindle upon anaphase onset." This must be revised - see public review.

      We changed this section of the abstract as follows:

      ‘Furthermore, the ATPase activity of KIN-A promotes chromosome alignment in prometaphase and CPC translocation to the central spindle upon anaphase onset. Thus, KIN-A constitutes a unique ‘two-in-one’ CPC localization module in complex with KIN-B, which directs the CPC to kinetochores (from S phase until metaphase) via its C-terminal tail, and to the central spindle (in anaphase) via its N-terminal kinesin motor domain.’

      (3) Line 87-90. The findings by Li et al., 2008 (KIN-A and KIN-B interacting with Aurora B and epistasis analysis) should be introduced more comprehensively in the Introduction section.

      We added the following sentence in the introduction:

      ‘In addition, two orphan kinesins, KIN-A and KIN-B, have been proposed to transiently associate with Aurora BAUK1 during mitosis (Li et al., 2008; Li, 2012).’

      (4) Figure 1B. The way the Trypanosoma cell cycle is defined should be briefly explained in the main text, rather than just referring to the figure.

      The ‘KN’ annotation of the trypanosome cell cycle is explained in the Figure 1 legend. We have now also added a brief description in the main text:

      ‘We next assessed the localization dynamics of fluorescently tagged KIN-A and KIN-B over the course of the cell cycle (Figs. 1, B-E). T. brucei possesses two DNA-containing organelles, the nucleus (‘N’) and the kinetoplast (‘K’). The kinetoplast is an organelle found uniquely in kinetoplastids, which contains the mitochondrial DNA and replicates and segregates prior to nuclear division. The ‘KN’ configuration serves as a good cell cycle marker (Woodward and Gull, 1990; Siegel et al., 2008).’

      (5) Line 118. Throughout the paper, it is not clear why GFP-NLS fusion was used instead of GFP fusion. Please justify the fusion of NLS.

      NLS refers to a short ‘nuclear localization signal’ (TGRGHKRSREQ) (Marchetti et al., 2000), which ensures that the ectopically expressed construct is imported into the nucleus. When we previously expressed truncations of KKT2 and KKT3 kinetochore proteins, many fragments did not go into the nucleus presumably due to the lack of an NLS, which prevented us from determining which domains are responsible for their kinetochore localization. We have since consistently used this short NLS sequence in our inducible GFP fusions without any complications. We added a sentence in the Materials & Methods section under Trypanosome culture: ‘All constructs for ectopic expression of GFP fusion proteins include a short nuclear localization signal (NLS) (Marchetti et al., 2000).’ To avoid unnecessary confusion, we removed ‘NLS’ from the main text and figures.

      (6) Line 121, "Unexpectedly". It is not clear why this was unexpected.

      To clarify this point, we modified this paragraph in the results section:

      ‘To our surprise, KIN-A-YFP and GFP-KIN-B exhibited a CPC-like localization pattern identical to that of Aurora BAUK1: Both kinesins localized to kinetochores from S phase to metaphase, and then translocated to the central spindle in anaphase (Figs. 1, C-E). Moreover, like Aurora BAUK1, a population of KIN-A and KIN-B localized at the new FAZ tip from late anaphase onwards (Figs. S1, B and C). This was unexpected, because KIN-A and KIN-B were previously reported to localize to the spindle but not to kinetochores or the new FAZ tip (Li et al., 2008). These data suggest that KIN-A and KIN-B are bona fide CPC proteins in trypanosomes, associating with Aurora BAUK1, INCENPCPC1 and CPC2 throughout the cell cycle.’

      (7) Line 127-129. Defining homologs and orthologs is tricky - there are many homologs and paralogs of kinesin-like proteins. The method to define the presence or absence of KIN-A/KIN-B homologs should be described in the Materials and Methods section.

      Due to the difficulty in defining true orthologs for kinesin-like proteins, we took a conservative approach: reciprocal best BLAST hits. We first searched for KIN-A homologs using BLAST in the TriTryp database or using hmmsearch with manually prepared HMM profiles. When the top hit in a given organism retrieved T. brucei KIN-A in a reciprocal BLAST search against the T. brucei proteome, we considered the hit a true ortholog. We modified the Materials and Methods section as below.

      ‘Searches for homologous proteins were done using BLAST in the TriTryp database (Aslett et al., 2010) or using hmmsearch using manually prepared hmm profiles (HMMER version 3.0; Eddy, 1998). The top hit was considered as a true ortholog only if the reciprocal BLAST search returned the query protein in T. brucei.’
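The reciprocal-best-hit criterion described above can be illustrated with a minimal sketch. It assumes the BLAST (or hmmsearch) results have already been parsed into plain dictionaries of (identifier, bit score) pairs; the function names, the data layout and the use of bit score for ranking are illustrative assumptions, not the authors' actual pipeline.

```python
def best_hit(hits):
    """Return the identifier with the highest bit score, or None if there are no hits."""
    if not hits:
        return None
    return max(hits, key=lambda hit: hit[1])[0]


def reciprocal_best_hits(forward, reverse):
    """Keep only query/subject pairs that are each other's top hit.

    forward: query_id -> list of (subject_id, bitscore), query searched against another proteome
    reverse: subject_id -> list of (query_id, bitscore), subject searched back against the query proteome
    (Both layouts are hypothetical; any parsed hit table would do.)
    """
    orthologs = {}
    for query, hits in forward.items():
        top_subject = best_hit(hits)
        if top_subject is None:
            continue
        # Accept the pair only if the reverse search returns the original query first.
        if best_hit(reverse.get(top_subject, [])) == query:
            orthologs[query] = top_subject
    return orthologs
```

In this sketch a candidate whose reverse search returns a different protein first is simply dropped, which mirrors the conservative intent stated in the response.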

      (8) Line 156. For non-experts of Trypanosoma cell biology, it is not clear how the nucleolar localization is defined.

      The nucleolus in T. brucei is discernible as a DAPI-dim region in the nucleus.

      (9) Fig.2G and Fig.S2F. These data imply that the coiled-coil and C-terminal tail domains of KIN-A/KIN-B are important for anaphase spindle midzone enrichment. However, it is odd that this was not mentioned. This reviewer recommends that the authors quantify the midzone localization data of these constructs and discuss the role of the coiled-coil domains.

      One possibility is that KIN-A and KIN-B need to form a complex (via their coiled-coil domains) to localize to the spindle midzone. Another likely possibility, which is discussed in the manuscript, is that N-terminal tagging of KIN-A impairs motor activity. This is supported by the fact that the central spindle localization is also disrupted in full-length GFP-KIN-A. We decided not to provide a quantification for these data due to low sample sizes for some of the constructs (e.g. expression not observed in all cells).

      (10) Line 288-289, "pLDDT scores improved significantly for KIN-A CD1 in complex with KKT9:KKT11 (>80) compared to KIN-A CD1 alone (~20) (Figs. S3, A and B)." I can see that pLDDT score is about 20 at KIN-A CD1 from Figs S3A, but the basis of pLDDT > 80 upon inclusion of KKT9:KKT11 is missing.

      We added the pLDDT and PAE plots for the AF2 prediction of KIN-A700-800 in complex with KKT9:KKT11 in Fig. S5B.

      (11) Fig. 5A. Since there is no supporting biochemical data for KIN-A-KKT9-KKT11 interaction, it is important to assess the stability of AlphaFold-based structural predictions of the KIN-A-KKT9-KKT11 interaction. Are there significant differences among the top 5 prediction results, and do these interactions remain stable after the "simulated annealing" process used in the AlphaFold predictions? Are predicted CD1-interacting regions/amino residues in KKT9 and KKT11 evolutionarily conserved?

      See above. The interaction was predicted in all 5 predictions as shown in Fig. S5B. Conservation of the CD1-interacting regions in KKT9 and KKT11 is shown below:

      Author response image 2.

      KKT9 (residues ~53 – 80 predicted to interact with KIN-A in T. brucei)

      Author response image 3.

      KKT11 (residues 61-85 predicted to interact with KIN-A in T. brucei)

      (12) Line 300, Fig. S5D and E, "failed to localize at kinetochores". From this resolution of the microscopy images, it is not clear if these proteins fail to localize at kinetochores as the KKT and KIN-A310-716 signals overlap. Perhaps, "failed to enrich at kinetochores" is a more appropriate statement.

      We changed this sentence according to the reviewer’s suggestion.

      (13) Line 309 and Fig 5D and F, "predominantly localized to the mitotic spindle". From this image shown in Fig 5D, it is not clear if KIN-A∆CD1-YFP and Aurora B are predominantly localized to the spindle or if they are still localized to centromeres that are misaligned on the spindle. Without microtubule staining, it is also not clear how microtubules are distributed in these cells. Please clarify how the presence or absence of kinetochore/spindle localization was defined.

      As shown in Fig. S5E and S5F, deletion of CD1 clearly impairs kinetochore localization of KIN-A (kinetochores marked by tdTomato-KKT2). Moreover, misalignment of kinetochores, as observed upon expression of the KIN-AG210A rigor mutant, would result in an increase in 2K1N cells and proliferation defects, which is not the case for the KIN-A∆CD1 mutant (Fig. 5H, Fig. S5I). KIN-A∆CD1-YFP appears to localize diffusely along the entire length of the mitotic spindle, whereas we still observe kinetochore-like foci in the rigor mutant. Unfortunately, we do not have suitable antibodies that would allow us to distinguish spindle microtubules from the vast subpellicular microtubule array present in T. brucei and hence need to rely on tagging spindle-associated proteins such as MAP103.

      (14) Fig. 5F, G, S5F. Along the same lines, it would be helpful to show example images for each category - "kinetochores", "kinetochores + spindle", and "spindle".

      As suggested by the reviewer, we have now included example images for each category (‘kinetochores’, ‘kinetochores + spindle’, ‘spindle’) along with a schematic illustration in Fig. 5F.

      (15) Line 332 and Fig. S6A. The experiment may be repeated in the presence of ATP or nonhydrolyzable ATP analogs.

      We thank the reviewer for the suggestion. We envisage such experiments for an in-depth follow-up study.

      (16) Line 342, "motor activity of KIN-A". Until KIN-A is shown to have motor activity, the result based on the rigor mutant does not show that the motor activity of KIN-A promotes chromosome congression. The result suggests that the ATPase activity of KIN-A is important.

      We changed that sentence as suggested by the reviewer.

      (17) Line 419 -. The authors base their discussion on the speculation that KIN-A is a plus-end directed motor. Please justify this speculation.

      Indeed, the notion that KIN-A is a plus-end directed motor remains a hypothesis, which is based on sequence alignments with other plus-end directed motors and the observation that the KIN-A motor domain is involved in translocation of the CPC to the central spindle in anaphase. We have modified the corresponding section in the discussion as follows:

      ‘It remains to be investigated whether KIN-A truly functions as a plus-end directed motor. The role of KIN-B in this context is equally unclear. Since KIN-B does not possess a functional kinesin motor domain, we deem it unlikely that the KIN-A:KIN-B heterodimer moves hand-over-hand along microtubules as do conventional (kinesin-1 family) kinesins. Rather, the KIN-A motor domain may function as a single-headed unit and drive processive plus-end directed motion using a mechanism similar to the kinesin-3 family kinesin KIF1A (Okada and Hirokawa, 1999).’

      (18) Line 422-423, "plus-end directed motion using a mechanism similar to kinesin-3 family kinesins (such as KIF1A)." Please cite a reference supporting this statement.

      See above. We cited Okada and Hirokawa (1999).

      Reviewer #2 (Recommendations For The Authors):

      Please provide a quantification of data shown in Figure 2F-H and described in lines 151-166.

      We decided not to provide a quantification for these data due to low sample sizes for some of the constructs (e.g. expression not observed in all cells).

      It appears as if the paper more or less follows a chronological order of the experiments that were performed before AF multimer enabled the insightful and compelling structural analysis. That is a matter of style, but in some cases, the writing could be updated, shortened, or re-arranged into a more logical order. Concrete examples:

      (i) Line 144: "we did not include CPC2 for further analysis in this study" Although CPC2 features at a prominent and interesting position in the predicted structures of the kinetoplastid CPC, shown in later main figures.

      We attempted RNAi-mediated depletion of CPC2 using two different shRNA constructs. However, we cannot exclude the possibility that the knockdown of CPC2 was less efficient compared with the other CPC subunits. For this reason, we decided to remove all the data on CPC2 from Fig. S2.

      (ii) The work with the KIN-A motor domain only and KIN-A ∆motor domain (Fig 2) begs the question about a more subtle mutation to interfere with the motor domain. Which is ultimately presented in Fig 6. I think that the final paragraph and Figure 6 follow naturally after Figure 2.

      We appreciate the suggestion. However, we would prefer to keep Figure 6 in its current position.

      (iii) The high-confidence structural predictions in Fig 3 and Fig 4 are insightful. The XL-MS descriptions that precede them are not so helpful (Fig 3A and 4G and in the text). To emphasize their status as experimental support for the predicted structures, which is very important, it would be good to discuss the XL-MS after presenting the models.

      As suggested, we have re-arranged the text and/or figures such that the AF2 predictions are discussed first and the CLMS data are brought in afterwards.

      Figure 1A prominently features an arbitrary color code and a lot of protein IDs without a legend. That is not a very convincing start. Figure S1 is more informative, containing annotated protein names and results of the KIN-A and KIN-B IPs. Please improve Figure 1A, for example by presenting a modified version of Figure S1. In all these types of figures, please list both protein names and gene IDs.

      We agree with the reviewer that the IP-MS data in Fig. S1 is more informative and hence decided to swap the heatmaps in Fig. 1A and Fig. S1A. We further annotated the heatmap corresponding to the Aurora BAUK1 IP-MS (now presented in Fig. S1) as suggested by the reviewer.

      The visualization of the structural predictions is not consistent among figures:

      (i) The structure in Fig 4I is important and could be displayed larger. The pLDDT scores, and especially those of the non-displayed models, do not add much information and should not be a main panel. If the authors want to display the pLDDT scores, I recommend a panel (main or supplement) of the structure colored for local prediction confidences, as in Fig 5A.

      (ii) In Figure 5A itself, it is hard to follow the chains in general, and KIN-A in particular, since the structure is pLDDT-coloured. Please present an additional panel colored by chain (consistent with Fig 4I, as mentioned above).

      (iii) The summarizing diagram, currently displayed as Fig 4J, should be placed after Fig 5A and take the discovered KIN-A - KKT9-11 connection into account. Ideally, it also covers the suspected importance of the motor domain and serves as a summarising diagram.

      We thank the reviewer for the constructive comments. For each structure prediction, we now present two images side by side: one coloured by chain and one coloured by pLDDT. We recently re-ran AF2 for the full CPC and also for the KKT7N-KKT8 complex, and got improved predictions. Hence some of the models in Fig. 3/S3 and Fig. 4/S4 have been updated accordingly. For the CLMS plots, we also decided to colour the cross-links according to whether the 30 angstrom distance constraints were fulfilled or not in the AF2 prediction. We also increased the size of the structures shown in Fig. 4. Furthermore, we decided to remove the summarizing diagram from Fig. 4 and instead made a new main Fig. 7, which shows a more detailed schematic that also takes into account the proposed function of the KIN-A motor domain, as suggested by the reviewer, and other points addressed in the Discussion.

      The methods section for the structural predictions lacks essential information. Predictions can only be reproduced if the version of AF2 multimer v2.x is specified and key parameters are mentioned.

      As suggested, we have added the details in the Materials and Methods section as follows.

      ‘Structural predictions of KIN-A/KIN-B, KIN-A310-862/KIN-B317-624, CPC1/CPC2/KIN-A300-599/KIN-B317-624, and KIN-A700-800/KKT9/KKT11 were performed using ColabFold version 1.3.0 (AlphaFold-Multimer version 2), while those of AUK1/CPC1/CPC2/KIN-A1-599/KIN-B, KKT71-261/KKT9/KKT11/KKT8/KKT12, KKT9/KKT11/KKT8/KKT12, and KKT71-261/KKT9/KKT11 were performed using ColabFold version 1.5.3 (AlphaFold-Multimer version 2.3.1) using default settings, accessed via https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.3.0/AlphaFold2.ipynb and https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.5.3/AlphaFold2.ipynb.’

      Line 121, please explain the "Unexpectedly" by including a reference to the work from Li and colleagues. A statement with some details would be useful, as the difference between both studies appears to be crucial for the novelty of this paper. Alternatively, refer to this being covered in the discussion.

      To clarify this point, we modified this paragraph in the results section:

      ‘To our surprise, KIN-A-YFP and GFP-KIN-B exhibited a CPC-like localization pattern identical to that of Aurora BAUK1: Both kinesins localized to kinetochores from S phase to metaphase, and then translocated to the central spindle in anaphase (Figs. 1, C-E). Moreover, like Aurora BAUK1, a population of KIN-A and KIN-B localized at the new FAZ tip from late anaphase onwards (Figs. S1, B and C). This was unexpected, because KIN-A and KIN-B were previously reported to localize to the spindle but not to kinetochores or the new FAZ tip (Li et al., 2008). These data suggest that KIN-A and KIN-B are bona fide CPC proteins in trypanosomes, associating with Aurora BAUK1, INCENPCPC1 and CPC2 throughout the cell cycle.’

      Line 285 refers to "conserved" regions in the C-terminal part of KIN-A, referring to Figure 5. Please expand the MSA in Figure 5B to get an idea about the conservation/variation outside CD1 and CD2.

      We now present the full MSA for KIN-A proteins in kinetoplastids in Fig. S5A.

      Please specify what is meant by Line 367-369 for someone who is not familiar with the work by Komaki et al. 2022. Either clarify in the text or clarify in the text with data to support it.

      We updated the corresponding section in the discussion as follows:

      ‘Komaki et al. recently identified two functionally redundant CPC proteins in Arabidopsis, Borealin Related Interactor 1 and 2 (BORI1 and 2), which engage in a triple helix bundle with INCENP and Borealin using a conserved helical domain but employ an FHA domain instead of a BIR domain to read H3T3ph (Komaki et al., 2022).’

      Data presented in Figure S6A, the microtubule co-sedimentation assay, is not convincing since a substantial amount of KIN-A/B is pelleted in the absence of microtubules. Did the authors spin the proteins in BRB80 before the assay to continue with soluble material and reduce sedimentation in the absence of microtubules? If the authors want to keep the wording in lines 331-332, the MT-binding properties of KIN-A and KIN-B need to be investigated in more detail, for example with a titration and a quantification thereof. Otherwise, they should change the text and replace "confirms" with "is consistent with". In any case, the legend needs to be expanded to include more information.

      To address the point above, we have added the following text in the legend corresponding to Fig. S6:

      ‘Microtubule co-sedimentation assay with 6HIS-KIN-A2-309 (left) and 6HIS-KIN-B2-316 (right). S and P correspond to supernatant and pellet fractions, respectively. Note that both constructs to some extent sedimented even in the absence of microtubules. Hence, lack of microtubule binding for KIN-B may be due to the unstable non-functional protein used in this study.’

      We have also updated the main text in the results section:

      ‘We therefore speculated that anaphase translocation of the kinetoplastid CPC to the central spindle may involve the kinesin motor domain of KIN-A. KIN-B is unlikely to be a functional kinesin based on the absence of several well-conserved residues and motifs within the motor domain, which are fully present in KIN-A (Li et al., 2008). These include the P-loop, switch I and switch II motifs, which form the nucleotide binding cleft, and many conserved residues within the α4-L12 elements, which interact with tubulin (Fig. S6A) (Endow et al., 2010). Consistent with this, the motor domain of KIN-B, contrary to KIN-A, failed to localize to the mitotic spindle when expressed ectopically (Fig. S2E) and did not co-sediment with microtubules in our in vitro assay (Fig. S6B).’

      Details:

      The readability of the pAE plots could be improved by arranging sequences according to their position in the structure. For example in Fig4I, KKT8 could precede KKT12. If it is easy to update this, the authors might want to do so.

      We re-ran the AF2 predictions for the KKT7N – KKT8 complex in Fig. 4/S4 and changed the order according to the reviewer’s suggestion (KKT9:KKT11:KKT8:KKT12).

      The same paper is referred to as Je Van Hooff et al. 2017 and as Van Hooff et al. 2017.

      Thank you for pointing this out. We have corrected the citation.

      Reviewer #3 (Recommendations For The Authors):

      (1) Please state at the end of the introduction/start of the results section that this work was performed in procyclic trypanosomes. Given that the cell cycles of procyclic and bloodstream forms differ, this is important.

      We added this information at the end of the introduction:

      ‘Here, by combining biochemical, structural and cell biological approaches in procyclic form T. brucei, we show that the trypanosome CPC is a pentameric complex comprising Aurora BAUK1, INCENPCPC1, CPC2 and the two orphan kinesins KIN-A and KIN-B.’

      (2) Please define NLS at first use (line 118), and for clarity, explain the rationale for using GFP with an NLS.

      NLS refers to a short ‘nuclear localization signal’ (TGRGHKRSREQ) (Marchetti et al., 2000), which ensures that the ectopically expressed construct is imported into the nucleus. When we previously expressed truncations of KKT2 and KKT3 kinetochore proteins, many fragments did not go into the nucleus presumably due to the lack of an NLS, which prevented us from determining which domains are responsible for their kinetochore localization. We have since consistently used this short NLS sequence in our inducible GFP fusions without any complications. We added a sentence in the Materials & Methods section under Trypanosome culture: ‘All constructs for ectopic expression of GFP fusion proteins include a short nuclear localization signal (NLS) (Marchetti et al., 2000).’ To avoid unnecessary confusion, we removed ‘NLS’ from the main text and figures.

      (3) Lines 148-150 - it would strengthen this claim if KIN-A/B protein levels were assessed by Western blot.

      We now present a Western blot in Fig. S2C, showing that bulk KIN-B levels are clearly reduced upon KIN-A RNAi. The same is true also to some extent for KIN-A levels upon KIN-B RNAi, although this is less obvious, possibly due to the lower efficiency of KIN-B compared to KIN-A RNAi as judged by fluorescence microscopy (quantified in Fig. 2D and 2E).

      (4) Line 253 - the text mentions the removal of both KKT9 and KKT11, which is not consistent with the figure (Fig 4H) - do you mean the removal of either KKT9 or KKT11?

      Yes, we thank the reviewer for pointing out this mistake in the text, which has now been corrected.

      (5) Line 337 - please include a reference for the G209A ATPase-defective rigor mutant - has this been shown to result in KIN-A being inactive previously?

      Please see our answer above in the public review.

      (6) It is not always obvious when fluorescent fusion proteins are being expressed endogenously or ectopically, or when they are being expressed in an RNAi background or not without tracing the cell lines in Table S1 - please ensure this is clearly stated throughout the manuscript.

      We have now made sure that this is clearly stated in the main text as well as in the figure legends.

      (7) Line 410 - 'KIN-A C-terminal tail is stuffed full of conserved CDK1CRK3 sites' - what does 'stuffed full' really mean (this is rather imprecise) and what are the consensus sites - are these CDK1 consensus sites that are assumed to be conserved for CRK3? I'm not aware of consensus sites for CRK3 having been determined, but if they have, this should be referenced.

      We have modified the corresponding section in the discussion as follows:

      ‘In support of this, the KIN-A C-terminal tail harbours many putative CRK3 sites (10 sites matching the minimal S/T-P consensus motif for CDKs) and is also heavily phosphorylated by Aurora BAUK1 in vitro (Ballmer et al. 2024). Finally, we speculate that the interaction of KIN-A motor domain with microtubules, coupled to the force generating ATP hydrolysis and possibly plus-end directed motion, eventually outcompetes the weakened interactions of the CPC with the kinetochore and facilitates the extraction of the CPC from chromosomes onto spindle microtubules during anaphase. Indeed, deletion of the KIN-A motor domain or impairment of its motor function through N-terminal GFP tagging causes the CPC to be trapped at kinetochores in anaphase. Central spindle localization is additionally dependent on the ATPase activity of the KIN-A motor domain as illustrated by the KIN-A rigor mutant.’
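The count of putative sites matching the minimal S/T-P consensus can be reproduced for any sequence with a short scan. The following is a minimal sketch: the function name is hypothetical, and the sequence used in the usage note is an invented toy peptide, not the KIN-A tail.

```python
import re

# Minimal CDK consensus as stated in the response: serine or threonine followed by proline.
MINIMAL_CDK_MOTIF = re.compile(r"[ST]P")


def find_minimal_cdk_sites(sequence):
    """Return 1-based positions of S/T residues that are immediately followed by P."""
    return [match.start() + 1 for match in MINIMAL_CDK_MOTIF.finditer(sequence.upper())]
```

For example, `find_minimal_cdk_sites("MASPKRTPLLSAPT")` reports positions 3 and 7, while the serine at position 11 is skipped because it is followed by alanine. A match to this minimal motif only marks a candidate site; it says nothing about whether the residue is phosphorylated in vivo.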

      (8) Lines 412-416: this proposal is written rather definitively - given no motor activity has been demonstrated for KIN-A, please make clear that this is still just a theory.

      See above.

      (9) Fig 1: KKT2 is not highlighted in Fig 1A - given this has been used for colocalization in Fig 1C-E, was it recovered, and if not, why not? Fig 1B-E: the S phase/1K1N terminology is somewhat misleading. Not all S phase cells will have elongated kinetoplasts - usually an asterisk is used to signify replicated DNA, not kinetoplast shape. If it is to be used here for elongation, then for consistency, N should be used for G2/mitotic cells.

      Fig. 1A (now Fig. S1A) only shows the top 30 hits. KKT2 was indeed recovered with Aurora BAUK1 (see Table S2) and is often used as a kinetochore marker in trypanosomes by our lab and others since the signal of fluorescently tagged KKT2 is relatively bright and KKT2 localizes to centromeres throughout the cell cycle.

      (10) A general comment for all image figures is that these do not have accompanying brightfield images and it is therefore difficult to know where the cell body is, or sometimes which nuclei and kinetoplasts belong to which cell where DNA from more than one cell is within the image. It would be beneficial if brightfield images could be added, or alternatively, the cell outlines were traced onto DAPI or merged images. Also, brightfield images would allow the stage of cytokinesis (pre-furrowing/furrowing/abscission) in anaphase cells to be determined.

      Since this study primarily addresses the recruitment mechanism of the CPC to kinetochores and to the central spindle from S phase to metaphase and in anaphase, respectively, and CPC proteins are not observed outside of the nucleus during these cell cycle stages, we did not present brightfield images in the figures. However, this point is particularly valid for discerning the localization of KIN-A and KIN-B to the new FAZ tip from late anaphase onwards. Hence, we acquired new microscopy data for Fig. S1B and S1C, which now include phase contrast images, and have chosen representative cells in late anaphase and telophase. We hope that the signal of Aurora BAUK1, KIN-A and KIN-B at the anterior end of the new FAZ can now be distinguished more clearly.

      (11) Fig 2A: legend should state that the micrographs show the localisation of the proteins within the nucleus as whole cells are not shown. 2C: can INCENP not be split into 2 lines - the 'IN' looks like 1N at first glance, which is confusing.

      We have applied the suggested change in Fig. 2.

      (12) Fig 3 (and other AF2 figures): Could the lines for satisfied & not satisfied in the key be thicker so they more closely resemble the lines in the figure and are less likely to be confused with the disordered regions of the CPC components?

      We have now made those lines thicker.

      (13) Why were different E value thresholds used in Fig 3 and Fig 4?

      The CLMS data in Fig. 3 and Fig. 4 now both use the same E value threshold of E-3 (previously E-4 was used in Fig. 4). To determine a sensible significance threshold, we included some yeast protein sequences (‘false positives’) in the database used in pLink2 for identification of crosslinked peptides. Note that we recently also re-ran AF2 for the full CPC and for the KKT7N-KKT8 complex and got improved predictions. Hence some of the models in Fig. 3/S3 and Fig. 4/S4 have been updated accordingly. For the CLMS plots, we also decided to colour the cross-links according to whether the 30 angstrom distance constraints were fulfilled or not in the AF2 prediction.

      (14) Fig 4H legend - please give the expected sizes of these recombinant proteins & check the 3rd elution panel (see public review comments).

      See our response above in the public review.

      (15) Fig 4I - please explain what the colours of the PAE plot and the values in the key signify, as well as how the Scored Residue values are arrived at. Please also define the pLDDT in the legend.

      We have cited DeepMind’s 2021 methods paper, in which the outputs of AlphaFold are explained in detail. We also added a short description of the pLDDT and PAE scores and the corresponding colour coding in the legends of Fig. 3 and Fig. 4, respectively.

      From figure 3 legend:

      ‘(B) Cartoon representation showing two orientations of the trypanosome CPC, coloured by protein on the left (Aurora BAUK1: crimson, INCENPCPC1: green, CPC2: cyan, KIN-A: magenta, and KIN-B: yellow) or according to their pLDDT values on the right, assembled from AlphaFold2 predictions shown in Figure S3. The pLDDT score is a per-residue estimate of the confidence in the AlphaFold prediction on a scale from 0 – 100. pLDDT > 70 (blue, cyan) indicates a reasonable accuracy of the model, while pLDDT < 50 (red) indicates a low accuracy and often reflects disordered regions of the protein (Jumper et al., 2021). BS3 crosslinks in (B) were mapped onto the model using PyXlinkViewer (blue = distance constraints satisfied, red = distance constraints violated, Cα-Cα Euclidean distance threshold = 30 Å) (Schiffrin et al., 2020).’
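The confidence bands quoted in this legend can be written as a small helper. This is a sketch only: the function name and category labels are hypothetical, and the label for the 50 to 70 range is an assumption, since the legend defines only the bands above 70 and below 50.

```python
def plddt_category(score):
    """Map a per-residue pLDDT score (0-100) to the bands named in the figure legend."""
    if not 0 <= score <= 100:
        raise ValueError("pLDDT is defined on a scale from 0 to 100")
    if score > 70:
        return "reasonable accuracy"
    if score < 50:
        return "low accuracy (often disordered)"
    # The legend does not name this range; "intermediate" is an assumption.
    return "intermediate"
```

Applied to the values discussed earlier in this response, a score of about 20 for KIN-A CD1 alone falls in the low band, whereas scores above 80 for CD1 in complex with KKT9:KKT11 fall in the reasonable band.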

      From Figure 4 legend:

      ‘(G) AlphaFold2 model of the KKT7 – KKT8 complex, coloured by protein (KKT71-261: green, KKT8: blue, KKT12: pink, KKT9: cyan and KKT11: orange) (left) and by pLDDT (center). BS3 crosslinks in (H) were mapped onto the model using PyXlinkViewer (Schiffrin et al., 2020) (blue = distance constraints satisfied, red = distance constraints violated, Cα-Cα Euclidean distance threshold = 30 Å). Right: Predicted Aligned Error (PAE) plot of model shown on the left (rank_2). The colour indicates AlphaFold’s expected position error (blue = low, red = high) at the residue on the x axis if the predicted and true structures were aligned on the residue on the y axis (Jumper et al., 2021).’
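The satisfied/violated classification of crosslinks used in both legends amounts to a distance check between Cα atoms in the predicted model. The sketch below assumes the Cα coordinates have already been extracted into a dictionary keyed by (protein, residue number); the function name, the data layout and the choice to count a distance of exactly 30 Å as satisfied are illustrative assumptions, not a description of how PyXlinkViewer works internally.

```python
import math

CA_DISTANCE_THRESHOLD = 30.0  # Cα-Cα Euclidean distance threshold in Å, as stated in the legends


def classify_crosslinks(ca_coords, crosslinks, threshold=CA_DISTANCE_THRESHOLD):
    """Label each crosslink as satisfied or violated in a structural model.

    ca_coords: (protein, residue_number) -> (x, y, z) of the Cα atom, in Å
    crosslinks: iterable of ((protein, residue), (protein, residue)) pairs
    Returns a list of (site_a, site_b, distance, satisfied) tuples.
    """
    results = []
    for site_a, site_b in crosslinks:
        distance = math.dist(ca_coords[site_a], ca_coords[site_b])
        # Assumption: a distance equal to the threshold counts as satisfied.
        results.append((site_a, site_b, distance, distance <= threshold))
    return results
```

Crosslinks flagged as violated in such a check do not necessarily contradict the model; they can also reflect flexible regions or alternative conformations that a single static prediction does not capture.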

      (16) Fig 6 legend - Line 730 should say (F) not (C).

      Thank you for pointing out this typo.

      (17) Fig S1A - a key is missing for the colours. Fig S1B/C - cell outlines or a brightfield image are really needed here - see earlier comment. Fig S1D - there doesn't seem to be a method for how this tree was generated.

      See our response above in the public review regarding Fig. S1A and S1B/C. The tree in Fig. S1D is based on Butenko et al. (2020).

      (18) Fig S2: A: how was protein knockdown validated (especially for CPC2 where there was little obvious phenotype)? Fig S2B: the y-axis should read proportion of cells, not percentage. Fig S2E - NLS should be labelled.

      Thank you for pointing out the mistake in the labelling.

      (19) Fig S3: PAE plots should be labelled with protein names, not A-E. Similarly, the pLDDT plots should be labelled as in Fig 4I.

      We have corrected the labelling in Fig. S3.

      (20) Fig S5A-D - cell cycle stage labels are missing from images.

      Thank you for pointing out the missing cell cycle stage labels.

      Addition by editor:

      In line 126 the statement that KIN-A and KIN-B "associate with Aurora-AUK1, INCENP-CPC1 and CPC2 throughout the cell cycle" seems too strong. There is no direct evidence for this. Please re-phrase as "likely associate" or "suggest... that ... may...".

      We have modified that sentence according to the editor’s suggestion.

      References:

      Akiyoshi, B., and K. Gull. 2014. Discovery of Unconventional Kinetochores in Kinetoplastids. Cell. 156. doi:10.1016/j.cell.2014.01.049.

      Butenko, A., F.R. Opperdoes, O. Flegontova, A. Horák, V. Hampl, P. Keeling, R.M.R. Gawryluk, D. Tikhonenkov, P. Flegontov, and J. Lukeš. 2020. Evolution of metabolic capabilities and molecular features of diplonemids, kinetoplastids, and euglenids. BMC Biology. 18:1–28. doi:10.1186/S12915-020-0754-1.

      Cormier, A., D.G. Drubin, and G. Barnes. 2013. Phosphorylation regulates kinase and microtubule binding activities of the budding yeast chromosomal passenger complex in vitro. J Biol Chem. 288:23203–23211. doi:10.1074/JBC.M113.491480.

      Endow, S.A., F.J. Kull, and H. Liu. 2010. Kinesins at a glance. J Cell Sci. 123:3420. doi:10.1242/JCS.064113.

      Fink, S., K. Turnbull, A. Desai, and C.S. Campbell. 2017. An engineered minimal chromosomal passenger complex reveals a role for INCENP/Sli15 spindle association in chromosome biorientation. J Cell Biol. 216:911–923. doi:10.1083/JCB.201609123.

      van der Horst, A., M.J.M. Vromans, K. Bouwman, M.S. van der Waal, M.A. Hadders, and S.M.A. Lens. 2015. Inter-domain Cooperation in INCENP Promotes Aurora B Relocation from Centromeres to Microtubules. Cell Rep. 12:380–387. doi:10.1016/J.CELREP.2015.06.038.

      Ishii, M., and B. Akiyoshi. 2020. Characterization of unconventional kinetochore kinases KKT10/19 in Trypanosoma brucei. J Cell Sci. doi:10.1242/jcs.240978.

      Jeyaprakash, A.A., C. Basquin, U. Jayachandran, and E. Conti. 2011. Structural Basis for the Recognition of Phosphorylated Histone H3 by the Survivin Subunit of the Chromosomal Passenger Complex. Structure. 19:1625–1634. doi:10.1016/J.STR.2011.09.002.

      Jeyaprakash, A.A., U.R. Klein, D. Lindner, J. Ebert, E.A. Nigg, and E. Conti. 2007. Structure of a Survivin–Borealin–INCENP Core Complex Reveals How Chromosomal Passengers Travel Together. Cell. 131. doi:10.1016/j.cell.2007.07.045.

      Jumper, J., R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko, A. Bridgland, C. Meyer, S.A.A. Kohl, A.J. Ballard, A. Cowie, B. Romera-Paredes, S. Nikolov, R. Jain, J. Adler, T. Back, S. Petersen, D. Reiman, E. Clancy, M. Zielinski, M. Steinegger, M. Pacholska, T. Berghammer, S. Bodenstein, D. Silver, O. Vinyals, A.W. Senior, K. Kavukcuoglu, P. Kohli, and D. Hassabis. 2021. Highly accurate protein structure prediction with AlphaFold. Nature. 596:583–589. doi:10.1038/s41586-021-03819-2.

      Kang, J.S., I.M. Cheeseman, G. Kallstrom, S. Velmurugan, G. Barnes, and C.S.M. Chan. 2001. Functional cooperation of Dam1, Ipl1, and the inner centromere protein (INCENP)-related protein Sli15 during chromosome segregation. J Cell Biol. 155:763–774. doi:10.1083/JCB.200105029.

      Klein, U.R., E.A. Nigg, and U. Gruneberg. 2006. Centromere targeting of the chromosomal passenger complex requires a ternary subcomplex of Borealin, Survivin, and the N-terminal domain of INCENP. Mol Biol Cell. 17:2547–2558. doi:10.1091/MBC.E05-12-1133.

      Komaki, S., E.C. Tromer, G. De Jaeger, N. De Winne, M. Heese, and A. Schnittger. 2022. Molecular convergence by differential domain acquisition is a hallmark of chromosomal passenger complex evolution. Proc Natl Acad Sci U S A. 119. doi:10.1073/PNAS.2200108119.

      Li, Z. 2012. Regulation of the Cell Division Cycle in Trypanosoma brucei. Eukaryot Cell. 11:1180. doi:10.1128/EC.00145-12.

      Li, Z., J.H. Lee, F. Chu, A.L. Burlingame, A. Günzl, and C.C. Wang. 2008. Identification of a Novel Chromosomal Passenger Complex and Its Unique Localization during Cytokinesis in Trypanosoma brucei. PLoS One. 3. doi:10.1371/journal.pone.0002354.

      Mackay, A.M., D.M. Eckley, C. Chue, and W.C. Earnshaw. 1993. Molecular analysis of the INCENPs (inner centromere proteins): separate domains are required for association with microtubules during interphase and with the central spindle during anaphase. J Cell Biol. 123:373–385. doi:10.1083/JCB.123.2.373.

      Marchetti, M.A., C. Tschudi, H. Kwon, S.L. Wolin, and E. Ullu. 2000. Import of proteins into the trypanosome nucleus and their distribution at karyokinesis. J Cell Sci. 113 ( Pt 5):899–906. doi:10.1242/JCS.113.5.899.

      Nakajima, Y., A. Cormier, R.G. Tyers, A. Pigula, Y. Peng, D.G. Drubin, and G. Barnes. 2011. Ipl1/Aurora-dependent phosphorylation of Sli15/INCENP regulates CPC-spindle interaction to ensure proper microtubule dynamics. J Cell Biol. 194:137–153. doi:10.1083/JCB.201009137.

      Noujaim, M., S. Bechstedt, M. Wieczorek, and G.J. Brouhard. 2014. Microtubules accelerate the kinase activity of Aurora-B by a reduction in dimensionality. PLoS One. 9. doi:10.1371/JOURNAL.PONE.0086786.

      Okada, Y., and N. Hirokawa. 1999. A processive single-headed motor: Kinesin superfamily protein KIF1A. Science. 283:1152–1157. doi:10.1126/SCIENCE.283.5405.1152.

      Rice, S., A.W. Lin, D. Safer, C.L. Hart, N. Naber, B.O. Carragher, S.M. Cain, E. Pechatnikova, E.M. Wilson-Kubalek, M. Whittaker, E. Pate, R. Cooke, E.W. Taylor, R.A. Milligan, and R.D. Vale. 1999. A structural change in the kinesin motor protein that drives motility. Nature. 402:778–784. doi:10.1038/45483.

      Sablin, E.P., F.J. Kull, R. Cooke, R.D. Vale, and R.J. Fletterick. 1996. Crystal structure of the motor domain of the kinesin-related motor ncd. Nature. 380:555–559. doi:10.1038/380555a0.

      Samejima, K., M. Platani, M. Wolny, H. Ogawa, G. Vargiu, P.J. Knight, M. Peckham, and W.C. Earnshaw. 2015. The Inner Centromere Protein (INCENP) Coil Is a Single α-Helix (SAH) Domain That Binds Directly to Microtubules and Is Important for Chromosome Passenger Complex (CPC) Localization and Function in Mitosis. J Biol Chem. 290:21460–21472. doi:10.1074/JBC.M115.645317.

      Schiffrin, B., S.E. Radford, D.J. Brockwell, and A.N. Calabrese. 2020. PyXlinkViewer: A flexible tool for visualization of protein chemical crosslinking data within the PyMOL molecular graphics system. Protein Sci. 29:1851–1857. doi:10.1002/PRO.3902.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This fundamental study advances our understanding of the cell specific treatment of cone photoreceptor degeneration by Txnip. The evidence supporting the conclusions is convincing with rigorous genetic manipulation of Txnip mutations, however, there are a few areas in which the article may be improved through further analysis and application of the data. The work will be of broad interest to vision researchers, cell biologists and biochemists.

      Reviewer #1 (Public Review):

      Summary:

      This is a follow-up study to the authors' previous eLife report about the roles of an alpha-arrestin called protein thioredoxin interacting protein (Txnip) in cone photoreceptors and in the retinal pigment epithelium. The findings are important because they provide new information about the mechanism of glucose and lactate transport to cone photoreceptors and because they may become the basis for therapies for retinal degenerative diseases.

      Strengths:

      Overall, the study is carefully done and, although the analysis is fairly comprehensive with many different versions of the protein analyzed, it is clearly enough described to follow. Figure 4 greatly facilitated my ability to follow, understand and interpret the study.

      Weaknesses:

      I have just one concern that I would like the authors to address. It is about the text that begins at line 133: "We assayed their ability to clear GLUT1 from the RPE surface (Figure 2A)". Please provide more details about this. From the figure it appears that n = 1 for this experiment, but given how careful the authors are with these types of studies that seems unlikely. How did the authors quantify the ability to clear GLUT1 from the surface? Was it cleared from both the apical and basal surface? (It is hard to resolve the apical and basal surfaces in the images provided). The experiments shown in Fig. 1H and Fig. 1I of PMID 31365873 show how GLUT1 disappears only from the apical surface (under the conditions of that experiment and through the mechanism described in their text). It would be helpful for the authors to discuss their current results in the context of that experiment.

      We repeated all eight AAV-Best1-Txnip alleles for RPE GLUT1 staining with more than three eyes of each condition. We also quantified the GLUT1 intensity on the RPE basal surface. A new Figure 2-figure supplement 1 with these data has been added to this submission. The results and conclusions are similar to those in our initial submission.

      As mentioned in our provisional responses: GLUT1 on the basal surface of the RPE is more easily scored than that on the apical surface. The photoreceptor inner segments and Müller glia microvilli also have GLUT1, and their processes are juxtaposed and/or intertwined with the apical processes of the RPE, making the apical process GLUT1 staining of the RPE much more difficult to score. In some sections where the RPE and the retina separate, we can score the apical process GLUT1 staining of the RPE, but we do not always have this situation in our sections. The current quantification in the new Figure 2-figure supplement 1 thus concerns only the basal staining.

      As a separate issue, Reviewer #1 mentioned the work of another group (Wang et al., 2019, PMID: 31365873), which claimed that, on the apical surface of the RPE, GLUT1 is down-regulated in an RP mouse strain, RhoP23H. We have not consistently observed such a down-regulation of GLUT1 in other RP mouse strains such as rd1, rd10 or Rho-/- (unpublished data; see review Xue and Cepko, 2023, PMID: 37460158). However, as we pointed out above, it is difficult to score GLUT1 staining on the RPE apical surface. It is even more difficult in the degenerating retina where RPE and photoreceptor processes degenerate. For reference, one can see images of degenerating RPE apical processes in Wu et al. 2021 (PMID: 33491671).

      Reviewer #2 (Public Review):

      The hard work of the authors is much appreciated. By overexpressing the α-arrestin Txnip in the RPE, in cones, and in both combined, the authors show a potential gene-agnostic treatment that can be applied to retinitis pigmentosa. Furthermore, since Txnip is related to multiple intracellular signaling pathways, this study is also of value for research into the mechanism of secondary cone dystrophy.

      There are a few areas in which the article may be improved through further analysis and application of the data, as well as some adjustments that should be made to clarify specific points in the article.

      Reviewer #3 (Public Review):

      Summary:

      Xue et al. extended their groundbreaking discovery demonstrating the protective effect of Txnip on cone photoreceptor survival. This was achieved by investigating protection against cone degeneration through the overexpression of five distinct mutated variants of Txnip within the retinal pigment epithelium (RPE). Moreover, the study explored the roles of two proteins, HSP90AB1 and Arrdc4, which share similarities or associations with Txnip. They found that the mechanism of Txnip protection in RPE cells is different from that in cone cells. These discoveries have significant implications for advancing our understanding of the mechanisms underlying Txnip's protection of cone cells.

      Strengths:

      (1) Identify the roles of different Txnip mutations in RPE and their effects on the expression of glucose transporter.

      (2) Dissect the mechanism of Txnip in RPE vs Cone photoreceptors in retinal degeneration models.

      (3) Explore the functions of Arrdc4, a protein similar to Txnip, and HSP90AB1 in cone degeneration.

      Weaknesses:

      (1) Arrdc4 has a deleterious effect on cone survival, but its mechanism is not discussed.

      (2) Inhibition of HSP90 is known to cause retinal degeneration. It is unclear why inhibition enhances the protection of Txnip.

      As mentioned in our provisional responses, little was known about the function of Arrdc4 or HSP90AB1 in cones. We summarize some of the recent discoveries regarding these two proteins in the new Discussion:

      “Arrdc4, the most similar α-arrestin protein to Txnip that also has Arrestin N- and C- domains, accelerated RP cone death when transduced via AAV (Figure 1). This observation suggests that Txnip has unique functions that protect RP cones. Recently, Arrdc4 has been proposed to be critical for liver glucagon signaling, which could be negated by insulin (Dagdeviren et al. 2023). The implication of this potential role in RP cone survival is unclear, but interestingly, the activation of the insulin/mTORC1 pathway is beneficial to RP cone survival (Punzo et al. 2009; Venkatesh et al. 2015).”

      “Little is known about the function of HSP90AB1. Knocking down Hsp90ab1 improved mitochondrial metabolism of skeletal muscle in a diabetic mouse model (Jing et al. 2018). Knocking out HSP90AA1, a paralog of HSP90AB1 which has 14% different amino acids, led to rod death and correlated with PDE6 dysregulation (Munezero et al. 2023). Inhibiting HSP90AA1 with small molecules transiently delayed cone death in human retinal organoids under low glucose conditions (Spirig et al. 2023). However, the exact role of HSP90AA1 in photoreceptors needs to be clarified, and the implications for HSP90AB1 in RP cones are still unclear.”

      In addition, we used AlphaFold Multimer, an AI algorithm based on AlphaFold-2, to explore the possible interaction between TXNIP, PARP1 and HSP90AB1 in the revision. One of the predicted models is shown as the new Figure 5-figure supplement 2. The C-terminus of Txnip is predicted to link HSP90AB1 and PARP1 together in this model.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have just one concern that I would like the authors to address. It is about the text that begins at line 133: "We assayed their ability to clear GLUT1 from the RPE surface (Figure 2A)". Please provide more details about this. From the figure it appears that n = 1 for this experiment, but given how careful the authors are with these types of studies that seems unlikely. How did the authors quantify the ability to clear GLUT1 from the surface? Was it cleared from both the apical and basal surface? (It is hard to resolve the apical and basal surfaces in the images provided). The experiments shown in Fig. 1H and Fig. 1I of PMID 31365873 show how GLUT1 disappears only from the apical surface (under the conditions of that experiment and through the mechanism described in their text). It would be helpful for the authors to discuss their current results in the context of that experiment.

      See our responses to Reviewer #1’s public review section above.

      Also, is the clearance from the RPE plasma membrane homogenous throughout the RPE monolayer?

      In the area of AAV infection, the effects are very homogenous. In the uninfected area, the clearance does not occur, and we consider the uninfected area of the same eye to be an excellent internal control.

      A statistical analysis (as was provided for other experiments in the manuscript) would help to make the surprising conclusion about C.Txnip.C247S more convincing.

      In this revision, we used the Mann-Whitney U test with the Bonferroni correction for GLUT1 intensity quantification. For the cone survival statistics, we used the t-test or ANOVA with Dunnett multiple comparison test. The information has been added to each figure legend.
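      For readers unfamiliar with this combination of tests, the following is a minimal sketch of a two-sided Mann-Whitney U test followed by a Bonferroni correction. It is not the analysis code used for the manuscript (the software is not stated here); the function names and intensity values are hypothetical, and the exact p-value is obtained by enumerating all group assignments, which is practical only for small samples.

```python
from itertools import combinations


def rank_data(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_exact(x, y):
    """Return (U for x, exact two-sided p-value).

    The p-value is the fraction of all possible splits of the pooled ranks
    whose U deviates from its null mean at least as much as the observed U.
    """
    n1, n2 = len(x), len(y)
    ranks = rank_data(list(x) + list(y))
    offset = n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    u_obs = sum(ranks[:n1]) - offset
    dev_obs = abs(u_obs - mean_u)
    extreme = total = 0
    for combo in combinations(range(n1 + n2), n1):
        u = sum(ranks[i] for i in combo) - offset
        total += 1
        if abs(u - mean_u) >= dev_obs - 1e-12:
            extreme += 1
    return u_obs, extreme / total


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical GLUT1 intensities (arbitrary units), for illustration only.
control = [0.92, 1.05, 0.98, 1.10]
alleles = {
    "allele_1": [0.31, 0.40, 0.28, 0.35],
    "allele_2": [0.95, 1.02, 0.89, 1.08],
}

raw = {name: mann_whitney_exact(control, vals)[1] for name, vals in alleles.items()}
adjusted = dict(zip(raw, bonferroni(list(raw.values()))))
for name in raw:
    print(f"{name}: raw p = {raw[name]:.4f}, Bonferroni-adjusted p = {adjusted[name]:.4f}")
```

      Each allele is compared with the same control group, so the number of comparisons used for the correction equals the number of alleles tested.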

      Another improvement I suggest for this figure is to include normal full length Txnip as a positive control to show how completely it removes GLUT1 from the surface.

      Added. See the new Figure 2-figure supplement 1.

      Another point that should be discussed is whether, when Txnip prevents GLUT1 from reaching the surface, all the GLUT1 gets fully degraded within the cell. A brief description of how Txnip influences GLUT1 stability and localization would be helpful.

      We are unable to track the fate of the GLUT1 after it is removed, i.e. we do not see definitive intracellular staining. We do not know if this is due to degradation or a hidden epitope.

      Minor point

      (1) Confusing citation on lines 99-100: "We previously showed that overexpressing the Txnip wt allele in the RPE using an RPE specific promoter, derived from the Best1 gene (Esumi et al. 2009),.." makes it sound like Esumi et al. is the citation for their previous study, which is not correct.

      We have amended this to: "We previously showed (Xue et al. 2021) that overexpressing the Txnip wt allele in the RPE using an RPE-specific promoter, derived from the Best1 gene (Esumi et al., 2009), did not improve RP cone survival."

      Reviewer #2 (Recommendations For The Authors):

      Regarding the manuscript, here are some suggestions that authors can take into consideration for the completeness of the study:

      (1) The text references the relationship between α-arrestin and glucose metabolism in cone cells, but fails to provide an explanation for its specific involvement in glucose metabolism. Consequently, readers may struggle to discern the targeted metabolic pathway.

      We understand this point from the Reviewer, and would love to know more about its mechanism, which is one reason why we undertook the current study. The mechanism(s) by which Txnip affects metabolism remains to be elucidated. To summarize our findings from our previous study, we showed that LDHB, which converts lactate to pyruvate, was required for Txnip-mediated rescue. Addition of the LDHB gene, however, did not boost rescue. We also showed that mitochondrial size and membrane potential were improved, and the Na/K pump function was improved, in Txnip-treated cones. Improved mitochondria were not sufficient, however, as revealed by a PARP-1 KO mouse with improved mitochondria that did not extend cone survival. In addition, using a Txnip mutant that does not remove the glucose transporter, we still saw cone rescue, so this function cannot be required for Txnip-mediated rescue. How does Txnip lead to improved mitochondria and to a reliance on lactate? We do not know.

      (2) Although the author conducted an experiment on Arrdc4 due to its similarity to Txnip, the lack of clarification on why Arrdc4, with a 60% amino acid similarity, did not yield the same effects as Txnip remains unaddressed. Highlighting structural disparities or differences in intracellular signaling pathways could potentially shed light on this incongruity. Subsequently, an additional experiment may be warranted to test the hypothesis regarding the effective component of α-arrestin for cone rescue.

      Additional experiments are needed to learn of the relevant differences between Arrdc4 and Txnip, but are beyond the scope of our work at the present. However, we have added a paragraph on newly published data on the function of Arrdc4 in the new Discussion:

      “Arrdc4, the most similar α-arrestin protein to Txnip that also has Arrestin N- and C- domains, accelerated RP cone death when transduced by AAV (Figure 1). This observation suggests that Txnip has unique functions that protect RP cones. Recently, Arrdc4 has been proposed to be critical for liver glucagon signaling, which could be negated by insulin (Dagdeviren et al. 2023). The implication of this potential role regarding RP cone survival is unclear, but interestingly, the activation of the insulin/mTORC1 pathway is beneficial to RP cone survival (Punzo et al. 2009; Venkatesh et al. 2015).”

      (3) The utilization of distinct mutant Txnip variants to impact RPE, cones, and their combined influence is noted. A comparative table elucidating the impact of cone rescue on these three targets would greatly enhance clarity.

      We presented these data in Figure 4 in a table format.

      Additionally, the text does not definitively establish whether Txnip.C247S.LL351 and 352AA, as well as Txnip.C247S, indeed manifest discrepancies when exclusively affecting RPE.

      We edited a sentence in Results to: “Similar to Best1-wt Txnip (Xue et al., 2021), Best1-Txnip.C247S did not show significant improvement of cone survival, ruling out the C247S mutation alone as promoting the cone survival by Best1-Txnip.C247S.LL351 and 352AA.”

      (4) While the text mentions that Txnip stimulates lactate utilization within cones, it remains unclear whether this effect extends to RPE. If applicable, this trait could potentially contribute to its role in cone rescue.

      We agree with the Reviewer, and hope to address this question in our next study.

      (5) The discussion introduces the notion that one potential mechanism for cone rescue by Txnip.C247S involves facilitating unhindered movement of Thioredoxin for redox processes. To validate this hypothesis and elucidate the mechanics of Txnip's involvement in cone rescue, it may be prudent to conduct further experiments concentrating on the interaction between Txnip and thioredoxin. Alternatively, an experiment aimed at upregulating Thioredoxin expression would be a valuable addition.

      We hope to address this question in the future. However, the effect may be more complicated than our simple hypothesis regarding release of Thioredoxin. More than a dozen proteins were found to differentially interact with Txnip vs. Txnip.C247S (Forred et al. 2016).

      Reviewer #3 (Recommendations For The Authors):

      (1) Glucose transporter 1 is identified as an important mechanism in the protection against cone degeneration. It is unclear why GLUT1 is upregulated in retinal cells although the expression of Txnip mutants is specifically in the RPE in Figure 2.

      This retinal GLUT1 upregulation was not consistently observed in the treated eyes, so we did not comment on it in the text.

      (2) Mutant N. Txnip was mentioned in the discussion that it causes obvious retinal degeneration. The quantification of retinal thickness from Figure 2 will be more rigorous.

      Unlike the robust effects of Best1-N.Txnip on RPE GLUT1 level, this negative effect of Best1-N.Txnip on ONL thickness was not consistent. This result does not undermine the other major conclusions. Therefore, we deleted the related sentence of the original text: “This hypothesis is supported by the observation that N.Txnip led to an obvious thinning of the outer nuclear layer of the wt retina, reflecting a loss of photoreceptors”. We did leave in the related finding as follows:

      “The N-terminal half of Txnip (1-228aa) might exert harmful effects in the RPE, that negate the beneficial effects from the C-terminal half, suggested by the observation that its removal, in the C-terminal 149-397 allele, led to better cone survival when expressed in the RPE (Figure 2). In cones, the C-terminal half, including the C-terminal IDR tail, may cooperate with the N-terminal half, or negate its negative effects, to benefit RP cone survival. However, the C-terminal half is not sufficient for cone rescue when expressed in cones, as the 149-397 allele did not rescue.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Sang et al. propose that a pair of IR60b-expressing pharyngeal neurons in Drosophila uses IR25a, IR76b, and IR60b channels to detect high Na+ and limit its consumption. Some of the key findings that support this thesis are: 1) animals that lacked any one of these channels - or with their IR60b-expressing neurons selectively silenced - showed much reduced rejection of high Na+, but restored rejection when these channels were reintroduced back in the IR60b neurons; 2) animals with TRPV artificially expressed in their IR60b neurons rejected capsaicin-laced food whereas WT did not; 3) IR60b-expressing neurons exhibited increased Ca2+ influx in response to high Na+ and this response disappeared when animals lacked any of the three channels.

      Strengths:

      The experiments were thorough and well designed. The results are compelling and support the main claim. The development and use of the DrosoX two-choice assay allows a more quantitative, automated, and unbiased assessment of ingestion volume and preference.

      Weaknesses:

      There are a few inconsistencies with respect to the exact role by which IR60b neurons limit high salt consumption and the contribution of external (labellar) high-salt sensors in regulating high salt consumption. These weaknesses do not significantly impact the main conclusion, however.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, Sang et al. set out to identify gustatory receptors involved in salt taste sensation in Drosophila melanogaster. In a two-choice assay screen of 30 Ir mutants, they identified that Ir60b is required for avoidance of high salt. In addition, they demonstrate that activation of Ir60b neurons is sufficient for gustatory avoidance using either optogenetics or TRPV1 to specifically activate Ir60b neurons. Then, using tip recordings of labellar gustatory sensory neurons and proboscis extension response behavioral assays in Ir60b mutants, the authors demonstrate that Ir60b is dispensable for labellar taste neuron responses to high salt and the suppression of proboscis extension by high salt. Since external gustatory receptor neurons (GRNs) are not implicated, they look at Poxn mutants, which lack external chemosensory sensilla but have intact pharyngeal GRNs. High salt avoidance was reduced in Poxn mutants but was still greater than Ir60b mutants, suggesting that pharyngeal gustatory sensory neurons alone are sufficient for high salt avoidance. The authors use a new behavioral assay to demonstrate that Ir60b mutants ingest a higher volume of sucrose mixed with high salt than control flies do, suggesting that the action of Ir60b is to limit high salt ingestion. Finally, they identify that Ir60b functions within a single pair of gustatory sensory neurons in the pharynx, and that these neurons respond to high salt but not bitter tastants.

      Strengths:

      A great strength of this paper is that it rigorously corroborates previously published studies that have implicated specific Irs in salt taste sensation. It further introduces a new role for Ir60b in limiting high salt ingestion, demonstrating that Ir60b is necessary and sufficient for high salt avoidance and convincingly tracing the action of Ir60b to a particular subset of gustatory receptor neurons. Overall, the authors have achieved their aim by identifying a new gustatory receptor involved in limiting high salt ingestion. They use rigorous genetic, imaging, and behavioral studies to achieve this aim, often confirming a given conclusion with multiple experimental approaches. They have further done a great service to the field by replicating published studies and corroborating the roles of a number of other Irs in salt taste sensation. An aspect of this study that merits further investigation is how the same gustatory receptor neurons and Ir in the pharynx can be responsible for regulating the ingestion of both appetitive (sugar) and aversive tastants (high salt).

      A previous report published in eLife from John Carlson’s lab (Joseph et al, 2017) showed that the Ir60b GRN in the pharynx responds to sucrose resulting in sucrose repulsion. Thus, stimulation of this pharyngeal GRN results in gustatory avoidance only, not both attraction and avoidance. (lines 205-207)

      Weaknesses:

      There are several weaknesses that, if addressed, could greatly improve this work.

      (1) The authors combine the results and discussion but provide a very limited interpretation of their results. More discussion of the results would help to highlight what this paper contributes, how the authors interpret their results, and areas for future study.

      We agree and have now separated the Results and Discussion, and in so doing have greatly expanded discussion of the results.

      (2) The authors rename previously studied populations of labellar GRNs to arbitrary letters, which makes it difficult to understand the experiments and results in some places. These GRN populations would be better referred to according to the gustatory receptors they are known to express.

      One of the corresponding authors (Craig Montell) introduced this alternative GRN nomenclature in a 2021 review (Montell, C. Drosophila sensory receptors—a set of molecular Swiss Army Knives. Genetics 217, 1-34) (Montell, 2021). We are not fans of referring to different classes of GRNs based on the receptors that they express since it is not obvious which receptors to use. For example, the GRNs that respond to bitter compounds all express multiple GR co-receptors. The same is true for the GRNs that respond to sugars. The former system of referring to GRNs simply as sugar, bitter, salt and water GRNs is also not ideal since the repertoire of chemicals that stimulates each class is complex. For example, the Class A GRNs (formerly sugar GRNs) are also activated by low Na+, glycerol, fatty acids, and acetic acid, while the B GRNs (former bitter GRNs) are also stimulated by high Na+, acids, polyamines, and tryptophan. In addition, there are five classes of GRNs. At first mention of the Class A–E GRNs, we mention the most commonly used former nomenclature of sugar, bitter, salt and water GRNs. In addition, for added clarity, we now also include a mention of one of the receptors that mark each class. (lines 51-59)

      (3) The conclusion that GRNs responsible for high salt aversion may be inhibited by those that function in low salt attraction is not well substantiated. This conclusion seems to come from the fact that overexpression of Ir60b in salt attraction and salt aversion sensory neurons still leads to salt aversion, but there need not be any interaction between these two types of sensory neurons if they act oppositely on downstream circuits.

      We did not make this claim.

      (4) The authors rely heavily on a new Droso-X behavioral apparatus that is not sufficiently described here or in the previous paper the authors cite. This greatly limits the reader's ability to interpret the results.

      We expanded the description of the apparatus in the Droso-X assay section of the Materials and Methods. (lines 588-631)

      Reviewer #3 (Public Review):

      Summary:

      Sang et al. successfully demonstrate that a set of single sensory neurons in the pharynx of Drosophila promotes avoidance of food with high salt concentrations, complementing previous findings on Ir7c neurons with an additional internal sensing mechanism. The experiments are well-conducted and presented, convincingly supporting their important findings and extending the understanding of internal sensing mechanisms. However, a few suggestions could enhance the clarity of the work.

      Strengths:

      The authors convincingly demonstrate the avoidance phenotype using different behavioral assays, thus comprehensively analyzing different aspects of the behavior. The experiments are straightforward and well-contextualized within existing literature.

      Weaknesses:

      Discussion

      While the authors effectively relate their findings to existing literature, expanding the discussion on the surprising role of Ir60b neurons in both sucrose and salt rejection would add depth. Additionally, considering Yang et al. 2021's (https://doi.org/10.1016/j.celrep.2021.109983) result that Ir60b neurons activate feeding-promoting IN1 neurons, the authors should discuss how this aligns with their own findings.

      Yang et al. demonstrated that the activation of Ir60b neurons can trigger the activation of IN1 neurons akin to pharyngeal multimodal (PM) neurons, potentially leading to enhanced feeding (Yang et al, 2021). However, our research reveals a specific pattern of activation for Ir60b neurons. Instead of being generalists, they are specialized for certain sugars, such as sucrose and high salt. Consequently, while Ir60b GRNs activate IN1 neurons, we contend that there are other neurons in the brain responsible for inhibiting feeding. (lines 412-417)

      Line 187: The discussion primarily focuses on taste sensillae outside the labellum, neglecting peg-type sensillae on the inner surface. Clarification on whether these pegs contribute to the described behaviors and if the Poxn mutants described also affect the pegs would strengthen the discussion.

      We added the following to the Discussion section. “We also found that the requirement for Ir60b appears to be different when performing binary liquid capillary assay (DrosoX), versus solid food binary feeding assays. When we employed the DrosoX assay to test mutants that were missing salt aversive GRNs in labellar bristles but still retained functional Ir60b GRNs, the flies behaved the same as wild-type flies (e.g. Figure 3J and 3L). However, using solid food binary assays, Poxn mutants, which are missing labellar taste bristles but retain Ir60b GRNs (LeDue et al, 2015), displayed repulsion to high salt food that was intermediate between control flies and the Ir60b mutant (Figure 2J). Poxn mutants retain taste pegs (LeDue et al., 2015), and these hairless taste organs become exposed to food only when the labial palps open. We suggest that there are high-salt sensitive GRNs associated with taste pegs, which are accessed when the labellum contacts a solid substrate, but not when flies drink from the capillaries used in DrosoX assays. This explanation would also account for the findings that the Ir60b mutant is indifferent to 300 mM NaCl in the DrosoX assay (Figure 3B), but prefers 1 mM sucrose alone over 300 mM NaCl and 5 mM sucrose in the solid food binary assay (Figure 1B).”. (lines 430-444)

      In line 261 the authors state: "We attempted to induce salt activation in the I-type sensilla by ectopically expressing Ir60b, similar to what was observed with Ir56b 8; however, this did not generate a salt receptor (Figures S6A)"

      An obvious explanation would be that these neurons are missing the identified necessary co-receptors Ir76b and Ir25a. The authors should discuss here if the Gr33a neurons they target also express these co-receptors, if yes this would strengthen their conclusion that an additional receptor might be missing.

      We clarified this point in the Discussion section as follows, “An open question is the subunit composition of the pharyngeal high Na+ receptor, and whether the sucrose/glucose and Na+ receptors in the Ir60b GRN are the same or distinct. Our results indicate that the high salt sensor in the Ir60b GRN includes IR25a, IR60b and IR76b since all three IRs are required in the pharynx for sensing high levels of NaCl. I-type sensilla do not elicit a high salt response, and we were unable to induce salt activation in I-type sensilla by ectopically expressing Ir60b, under control of the Gr33a-GAL4. This indicates that IR25a, IR60b and IR76b are insufficient for sensing high Na+. The inability to confer a salt response by ectopic expression of Ir60b was not due to absence of Ir25a and Ir76b in Gr33a GRNs since Gr33a and Gr66a are co-expressed (Moon et al, 2009), and Gr66a GRNs express Ir25a and Ir76b (Li et al, 2023). Thus, the high salt receptor in Ir60b GRNs appears to require an additional subunit. Given that Na+ and sugars are structurally unrelated, we suggest that the Na+ and sucrose/glucose receptors do not include the identical set of subunits, or that they activate a common receptor through disparate sites”. (lines 464-477)

      Methods

      The description of the Droso-X assay seems to be missing some details. Currently, it is not obvious how the two-choice is established. Only one capillary is mentioned, I assume there were two used? Also, the meaning of the variables used in the equation (DrosoX and DrosoXD) are not explained.

      We expanded the description of the apparatus in the Droso-X assay section of the Materials and Methods. (lines 588-631)

      The description of the ex-vivo calcium imaging prep. is unclear in several points:

      (1) It is lacking information on how the stimulus was applied (was it manually washed in? If so how was it removed?).

      We expanded the description of the apparatus in the ex vivo calcium imaging section of the Materials and Methods. (lines 682-716)

      (2) The authors write: "A mild swallow deep well was prepared for sample fixation." I assume they might have wanted to describe a "shallow well"?

      We deleted the word “deep.”. (line 691)

      (3) "...followed by excising a small portion of the labellum in the extended proboscis region to facilitate tastant access to pharyngeal organs." It is not clear to me how one would excise a small portion of the labellum, the labellum depicts the most distal part of the proboscis that carries the sensillae and pegs. Did the authors mean to say that they cut a part of the proboscis?

      Yes. We changed the sentence to “…followed by excising a small portion of the extended proboscis to facilitate tastant access to the pharyngeal organs.”. (lines 693-695)

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In this manuscript, Sang et al. propose that a pair of IR60b-expressing pharyngeal neurons in Drosophila use IR25a, IR76b, and IR60b channels to detect high Na+ and limit its consumption. Some of the key findings that support this thesis are: 1) animals that lacked any one of these channels - or with their IR60b-expressing neurons selectively silenced - showed much reduced rejection of high Na+, but restored rejection when these channels were reintroduced back in the IR60b neurons; 2) animals with TRPV artificially expressed in their IR60b neurons rejected capsaicin-laced food whereas WT did not; 3) IR60b-expressing neurons exhibited increased Ca2+ influx in response to high Na+ and such response went away when animals lacked any of the three channels. In general, I find the collective evidence presented by the authors convincing. But I feel the MS can benefit from having a discussion section and a few simple experiments. Below I listed some inconsistencies I hope the authors can address or at least discuss.

      We have now added a Discussion section, and expanded the discussion.

      (1) The role of IR60b neurons on suppressing PER appeared inconsistent. On the one hand, optogenetic activation of these neurons suppressed PER (Fig 1D), on the other hand, IR60b mutants were as competent to suppress PER in response to high salt as WT (Fig 2G). Are pharyngeal neurons expected to modulate PER? It might be worth including a retinal-free or genotype control to ascertain the PER suppression exhibited by IR60b>CsChrimson is genuine.

      Please note that Figure 2G is now Figure 2H.

      Our interpretation is that activation of aversive GRNs by high salt either in labellar bristles or in the pharynx is sufficient to inhibit repulsion to high salt. Consistent with this conclusion, optogenetic activation of Ir60b GRNs, which are specific to the pharynx, is sufficient to reduce the PER to sucrose containing food (Figure 1D). However, mutation of Ir60b has no impact on the PER to sucrose plus high (300 mM) NaCl since the high-salt activated GRNs in labellar bristles are not impaired by the Ir60b mutation. In contrast, Ir25a and Ir76b are required in both labellar bristles and in the pharynx to reject high salt. As a consequence, mutation of either Ir25a or Ir76b impairs the repulsion to high salt. Thus, there is no inconsistency between the optogenetics and PER results. We clarified this point in the Discussion section. In terms of controls for IR60b>CsChrimson, we show that UAS-CsChrimson alone or UAS-CsChrimson in combination with the Gr5a driver has no impact on the PER (Figure 1D). In addition, we now include a retinal free control (Figure 1D). These findings provide the key genetic controls and are described in the Results section. (lines 167-170)

      (2) The role of labellar high-salt sensors in regulating salt intake appeared inconsistent. On the one hand, they appeared to have a role in limiting high salt consumption because poxn mutants were significantly more receptive to high salt than WT (Fig. 2J). On the other hand, selectively restoring IR76b or IR25a in only the IR60b neurons in these mutants - thus leaving the labellar salt sensors still defective - reverted the flies to behave like WT when given a choice between sucrose vs. sucrose+high salt (Fig 3J, L).

      We now offer an explanation for these seemingly conflicting results in the Discussion section. When we employed the DrosoX assay with mutants that had functional Ir60b GRNs but were missing salt aversive GRNs in labellar bristles, the flies behaved the same as control flies (e.g. Figure 3J and L). However, using solid food binary assays, Poxn mutants, which are missing labellar taste bristles but retain Ir60b GRNs (LeDue et al., 2015), display aversion to high salt food that is intermediate between control and Ir60b mutant flies (Figure 2J). Poxn mutants retain taste pegs (LeDue et al., 2015), which are exposed to food substrates only when the labial palps open. We suggest that the taste pegs harbor high salt sensitive GRNs, and they may be exposed to solid substrates, but not to the liquid in capillary tubes used in the DrosoX assays. This explanation would also account for the findings that the Ir60b mutant is indifferent to 300 mM NaCl in the DrosoX assay (Figure 3B), but prefers 1 mM sucrose alone over 300 mM NaCl and 5 mM sucrose in the solid food binary assay (Figure 1B). (lines 433-444)

      (3) The behavior sensitivity of IR60b mutant to high salt again appeared somewhat inconsistent when assessed in the two different choice assays. IR60b mutant flies were indifferent to 300 mM NaCl when assayed with DrosoX (Fig 3A, B) but were clearly still sensitive to 300 mM NaCl when assayed with "regular" assay - they showed much reduced preference for 5 mM sucrose over 1 mM sucrose when the 5 mM sucrose was adulterated with 300 mM NaCl (Fig 1B).

      The explanation provided above may also account for the findings that the Ir60b mutant is indifferent to 300 mM NaCl in the DrosoX assay (Figure 3B), but not when selecting between 300 mM NaCl and 5 mM sucrose versus 1 mM sucrose in the solid food binary assay (Figure 1B). Alternatively, the different behavioral responses might be due to the variation in sucrose concentrations in each of these two assays, which employed 5 mM sucrose in the solid food binary assay, as opposed to 100 mM sucrose in the DrosoX assay. This disparity in attractive valence between these two concentrations of sucrose might consequently impact feeding amount and preference. This point is now also included in the Discussion section. (lines 441-449)

      (4) Given the IR60b neurons exhibited clear IR60b/IR25a/IR76b-dependent sucrose sensitivity, too, I am curious how the various mutant animals behave when given a choice between 100 mM sorbitol vs. 100 mM sorbitol + 300 mM NaCl, a food choice assay not complicated by the presence of sucrose. Similarly, I am curious if the Ca2+ response of IR60b neurons differs significantly when presented with 100 mM sucrose vs. when presented with 100 mM sucrose + 300 mM NaCl. In principle, the magnitude for the latter should be significantly larger than the former as animals appeared to be capable of discriminating these two choices solely relying on their IR60b neurons.

      To investigate the aversion induced by high salt in the absence of a highly attractive sugar, such as sucrose, we combined 300 mM salt with 100 mM sorbitol, which is a tasteless but nutritive sugar (Burke & Waddell, 2011; Fujita & Tanimura, 2011). Using two-way choice assays, we found that the Ir25a, Ir60b, and Ir76b mutants exhibited substantial reductions in high salt avoidance (Figure 3—figure supplement 2A). In addition, we performed DrosoX assays using 100 mM sorbitol alone, or sorbitol mixed with 300 mM NaCl. Sorbitol alone provoked less feeding than sucrose since it is a tasteless sugar (Figure 3—figure supplement 2B and C). Nevertheless, addition of high salt to the sorbitol reduced food consumption (Figure 3—figure supplement 2B and C). (lines 300-308)

      We also conducted a comparative analysis of the Ca2+ responses within the Ir60b GRN, examining its reaction to various stimuli, including 100 mM sucrose alone, 300 mM NaCl alone, and a combination of 100 mM sucrose and 300 mM NaCl. We found that the Ca2+ responses were significantly higher when we exposed the Ir60b GRN to 300 mM NaCl alone, compared with the response to 100 mM sucrose alone (Figure 4—figure supplement 1D). However, the GCaMP6f responses were not higher when we presented 100 mM sucrose with 300 mM NaCl, compared with the response to 300 mM NaCl alone (Figure 4—figure supplement 1D). (lines 360-367)

      Minor issues

      (1) The labels of sucrose concentration on Figure 2D were flipped.

      This has been corrected.

      (2) The phrasing of the sentence that begins in line 196 (i.e., "This suggests the internal sensor ...") is not as optimal.

      We changed the sentence to, “We found that the aversive behavior to high salt was reduced in the Poxn mutants relative to the control (Figure 2J), consistent with previous studies demonstrating roles for GRNs in labellar bristles in high salt avoidance (Jaeger et al, 2018; McDowell et al, 2022; Zhang et al, 2013).”. (lines 217-219)

      (3) In Line 231, I am not sure why the authors think ectopic expressing IR60b in labellar neurons would allow them to become activated by Na+. It seems highly unlikely to me, especially given IR60b also plays a role in sensing sugar.

      We added the following paragraph to the Discussion addressing this point, “An open question is the subunit composition of the pharyngeal high Na+ receptor, and whether the sucrose/glucose and Na+ receptors in the Ir60b GRN are the same or distinct. Our results indicate that the high salt sensor in the Ir60b GRN includes IR25a, IR60b and IR76b since all three IRs are required in the pharynx for sensing high levels of NaCl. I-type sensilla do not elicit a high salt response, and we were unable to induce salt activation in I-type sensilla by ectopically expressing Ir60b, under control of the Gr33a-GAL4. This indicates that IR25a, IR60b and IR76b are insufficient for sensing high Na+. The inability to confer a salt response by ectopic expression of Ir60b was not due to absence of Ir25a and Ir76b in Gr33a GRNs since Gr33a and Gr66a are co-expressed (Moon et al., 2009), and Gr66a GRNs express Ir25a and Ir76b (Li et al., 2023). Thus, the high salt receptor in Ir60b GRNs appears to require an additional subunit. Given that Na+ and sugars are structurally unrelated, we suggest that the Na+ and sucrose/glucose receptors do not include the identical set of subunits, or that they activate a common receptor through disparate sites.”. (lines 464-477)

      Reviewer #2 (Recommendations For The Authors):

      Line 41, acutely excessive salt ingestion can lead to death, not just health issues

      We now state that, “consumption of excessive salt can contribute to various health issues in mammals, including hypertension, osteoporosis, gastrointestinal cancer, autoimmune diseases, and can lead to death.”. (lines 41-43)

      Line 46, delete the comma after flies

      Done. (line 47)

      Lines 51-56: This description is unnecessarily confusing and does not cite proper sources. Renaming these GRNs arbitrarily can only create confusion, plus this description lacks nuance. If E GRNs are Ir94e positive, this description is out of date. Furthermore, if D GRNs are ppk23 and Gr66a positive then they will respond to both bitter and high salt.

      Papers to consult: https://elifesciences.org/articles/37167 10.1016/j.cell.2023.04.038

      We have now added citations. We prefer the A—E nomenclature, which was introduced in a 2021 Genetics review by one of the authors of this manuscript (Montell) (Montell, 2021) since naming different classes of GRNs on the basis of markers or as sweet, bitter, salt and water GRNs is misleading and an oversimplification. We cite the Genetics 2021 review, and for added clarity include both types of former names (markers and sweet, bitter, salt and water). Class D GRNs are not marked by Gr66a. The eLife reference cited above provided the initial rationale for stating that Class E GRNs are marked by Ir94e and activated by low salt. According to the Taisz et al reference (Cell 2023), the Class E GRNs, which are marked by Ir94e, are also activated by pheromones, which we now mention (Taisz et al, 2023). (lines 51-59)

      Line 62, E GRNs are not required for low salt behaviors

      We do not state that E GRNs are required for low salt behaviors, only that they sense low Na+ levels. (line 58)

      Line 70-81 - Great deal of emphasis on labellar GRNs but then no mention of how pharyngeal GRNs fit into categories A-E

      We devote the following paragraph to pharyngeal GRNs. We do not mention how they fit in with the A—E categories because it is not clear.

      “In addition to the labellum and taste bristles on other external structures, such as the tarsi, fruit flies are endowed with hairless sensilla on the surface of the labellum (taste pegs), and three internal taste organs lining the pharynx, the labral sense organ (LSO), the ventral cibarial sense organ (VCSO), and the dorsal cibarial sense organ (DCSO), which also function in the decision to keep feeding or reject a food (Chen & Dahanukar, 2017, 2020; LeDue et al., 2015; Nayak & Singh, 1983; Stocker, 1994). A pair of GRNs in the LSO express a member of the gustatory receptor family, Gr2a, and knockdown of Gr2a in these GRNs impairs the avoidance to slightly aversive levels of Na+ (Kim et al, 2017). Pharyngeal GRNs also promote the aversion to bitter tastants, Cu2+, L-canavanine, and bacterial lipopolysaccharides (Choi et al, 2016; Joseph et al., 2017; Soldano et al, 2016; Xiao et al, 2022). Other pharyngeal GRNs are stimulated by sugars and contribute to sugar consumption (Chen & Dahanukar, 2017; Chen et al, 2021; LeDue et al., 2015). Remarkably, a pharyngeal GRN in each of the two LSOs functions in the rejection rather than the acceptance of sucrose (Joseph et al., 2017).”. (lines 74-89)

      Line 89, aversive --> aversion

      We changed this part.

      Line 90, gain of capsaicin avoidance suggests they are sufficient for avoidance, not essential for avoidance.

      We changed “essential” to “sufficient.”. (line 100)

      Line 104, what are you recording from here? Labellar or pharyngeal GRNs

      We added “S-type and L-type sensilla” to the sentence. (line 119)

      Line 107, How are A GRNS marked with tdTomato? It is important to mention how you are defining A GRNs.

      We modified the sentence as follows: “Using Ir56b-GAL4 to drive UAS-mCD8::GFP, we also confirmed that the reporter was restricted to a subset of Class A GRNs, which were marked with LexAop-tdTomato expressed under the control of the Gr64f-LexA (Figure 1—figure supplement 1D—F).”. (lines 120-123)

      Line 124, should read "concentrated as sea water."

      We made the change. (line 142)

      Line 125, I am not sure what is meant by "alarm neurons"

      We changed “additional pain or alarm neurons” to “nociceptive neurons.”. (line 144)

      Line 141, Are you defining A GRNs as only labellar GRNs, i.e. the Gr5a-GAL4 pattern with labellar plus a few pharyngeal GRNs? Or are you defining them as Gr64f-GAL4 (i.e. labellar plus many pharyngeal GRNs)?

      We refer to the Class A—E GRNs as labellar GRNs. Therefore, in this instance, we removed the reference to A GRNs and B GRNs, and simply mention the drivers that we used (Gr5a-GAL4 and Gr66a-GAL4) to express UAS-CsChrimson. The modified sentence is, “As controls we drove UAS-CsChrimson under control of either the Gr5a-GAL4 or the Gr66a-GAL4.”. (lines 51-59, 160-161)

      Line 180, labellar hairs--> labellar taste bristles

      We made the change. (line 204)

      Line 190, possess only --> only possess

      We made the change. (line 216)

      Line 202, Should this read increased?

      Yes. We changed “reduced” to “increased.”. (line 225)

      Line 206, The information provided here and in reference 47 was not sufficient for me to understand how the Droso-X system works and whether it has been validated. Better diagrams and much more description are required for the reader to understand this system and assess its validity.

      We now explain that the DrosoX “system consists of a set of five separately housed flies, each of which is exposed to two capillary tubes with different liquid food options. One capillary contained 100 mM sucrose and the other contained 100 mM sucrose mixed with 300 mM NaCl. The volume of food consumed from each capillary is then monitored automatically over the course of 6 hours and recorded on a computer.”. (lines 238-243)

      Line 218-219, It would be helpful to expand on this to explain how the previous paper detected no difference. Is this because the contact time with the food is the same but the rate of ingestion is slower?

      Yes. This is correct. We now clarify this point by stating that, “In a prior study, it was observed that the repulsion to high salt exhibited by the Ir60b mutant was indistinguishable from wild-type (Joseph et al., 2017). Specifically, the flies were presented with a drop of liquid (sucrose plus salt) at the end of a probe, and the Ir60b mutant flies fed on the food for the same period of time as control flies (Joseph et al., 2017). However, this assay did not discern whether or not the volume of the high salt-containing food consumed by the Ir60b mutant flies was reduced relative to control flies. Therefore, to assess the volume of food ingested, we used the DrosoX system, which we recently developed (Figure 3—figure supplement 1A) (Sang et al, 2021). This system consists of a set of five separately housed flies, each of which is exposed to two capillary tubes with different liquid food options. One capillary contained 100 mM sucrose and the other contained 100 mM sucrose mixed with 300 mM NaCl. The volume of food consumed from each capillary was then monitored automatically over the course of 6 hours and recorded on a computer. We found that control flies consumed approximately four times more of the 100 mM sucrose than the sucrose mixed with 300 mM NaCl (Figure 3A). In contrast, the Ir25a, Ir60b, and Ir76b mutants consumed approximately two-fold less of the sucrose plus salt (Figure 3A). Consequently, they ingested similar amounts of the two food options (Figure 3B; ingestion index). Thus, while the Ir60b mutant and control flies spend similar amounts of time in contact with high salt-containing food when it is the only option (Joseph et al., 2017), the mutant consumes considerably less of the high salt food when presented with a sucrose option without salt.”. (lines 226-251)

      Lines 231-235, Is this evidence for this, that Ir60b expression in the Ir25a or Ir76b pattern will induce high salt responses in the labellum? You should elaborate on this to clearly state what you mean rather than implying it. I do not think that overexpression of one Ir is enough evidence for this sweeping conclusion.

      We agree. We eliminated this point. (lines 227-232)

      Lines 261-263, Please elaborate here, how did you target the I-type sensilla and where are these neurons? So they already express Ir76b and Ir25a?

      We now explain in the Results that, “We attempted to induce salt activation in the I-type sensilla by ectopically expressing Ir60b, under control of the Gr33a-GAL4. Gr33a is co-expressed with Gr66a (Moon et al., 2009), which has been shown to be co-expressed with Ir25a and Ir76b (Li et al., 2023). When we performed tip recordings from I7 and I10 sensilla, we did not observe a significant increase in action potentials in response to 300 mM NaCl (Figure 4—figure supplement 1A), indicating that ectopic expression of Ir60b in combination with Ir25a and Ir76b is not sufficient to generate a high salt receptor.”. (lines 324-330)

      Lines 300-303, The discussion needs to be greatly expanded. What is the proposed mechanism by which the same neurons/receptors can inhibit sucrose and high salt feeding? What is the author's interpretation of what this study adds to our understanding of taste aversion?

      We have now added a Discussion section and greatly expanded the discussion.

      Reviewer #3 (Recommendations For The Authors):

      In line 73 there is a typo in "esophagus"

      We changed this part.

      In line 331, the use of a mixture of sucrose and "saponin" seems to be a mistake; "NaCl" is likely intended.

      We made the correction. (lines 546 and 640)

      On several occasions, the authors refer to the pharynx as a taste organ (for example 1st sentence of the abstract). I am not sure this is correct, the actual pharyngeal taste organs are the LSO, DCSO, and VCSO which are located in the pharynx.

      We made the corrections. (lines 24, 90, 92, 93, and 356)

      In line 155 the authors refer to Ir25a and Ir76b as "broadly tuned". I think it is not correct to refer to co-receptors this way, I'd suggest to just call them co-receptors.

      We made the correction. (lines 177-178)

      In line 182, stating "Gr2a is also expressed in the proboscis" is unclear. Clarify whether it refers to sensillae, pharyngeal taste organs, etc.

      We clarified it refers to pharyngeal taste organs. (lines 206-207)

      Line 253: "These findings imply that all three Irs are coexpressed in the pharynx." "The pharynx" is very unspecific, did the authors mean to say "the same neuron"?

      We now clarify by saying “in the Ir60b GRN in the pharynx.”. (line 317)

      Figures & Legends

      I found it confusing that the same color scale is being reused for different panels with different meanings repeatedly and in inconsistent ways. For example in Figure 2, red and blue are being used for Ir25a² mutants, while blue is also being used for Gr64f-Gal4 and S type sensilla. It is also not easily visible nor mentioned in the caption which of the 3 color scales presented belong to which panels.

      We modified the colors in the figures so that they are used in a consistent way. We now also define the colors in the legends.

      In Figure 2 F-I, indicating the stimulus sequence in each panel would enhance clarity. The color scale in Figure 3 could benefit from explicit explanations of different shades in the caption for easier interpretation.

      For example: "The ingestion of (a, dark color) 100 mM sucrose alone and (b, light color) in combination with 300 mM"

      We made the suggested modification.

      In Figure 4a the authors highlight that Ir76b and Ir25a label 2 neurons in the LSO. Did the imaging in 4c also capture the second cell, and if so did it respond to their stimulation?

      No, the focal plane differs, and the signal in Figure 4C is considerably weaker compared to the immunohistochemistry shown in Figure 4A. Notably, the other neuron did not exhibit a response to NaCl.

      In Figure 4f a legend for the color scale is missing, or the color might not be necessary at all. Also, the asterisks seem to be shifted to the right.

      We fixed the shifted asterisks and eliminated the color.

      Figure 4i is mislabeled 4f

      We made the correction.

    1. Author Response

      The following is the authors’ response to the original reviews.

      This study highlights new insights into the mechanism of pheochromocytoma pathogenesis that remains poorly understood. In the context of hereditary syndromes, such as multiple endocrine neoplasia 2 (MEN-2), where RET mutation is the major driver of thyroid, parathyroid, and adrenal pathologies, including pheochromocytoma, this mechanistic dissection of RET and TMEM127 is fundamentally sound. While the significance was deemed important, the strength of the evidence was found to be solid.

      Recognizing the limitations of models available for study of neuroendocrine cancers, and specifically for pheochromocytomas, we have revised and clarified the text of the current manuscript version and provide specific responses to the additional comments provided below, highlighting changes and new data.

      Reviewer #1 (Recommendations For The Authors):

      A current lack of pheochromocytoma cell lines and the use of generated cell lines for mechanistic studies presents a significant challenge that may undermine the inferred value of these findings in mock in vitro systems and question reproducibility in pheochromocytoma. Consideration for 3-dimensional patient-derived pheochromocytoma organoid in vitro and patient-derived organoid xenograft in vivo models will enable confirmation or refutation of the novel findings described by the authors.

      We agree completely with Reviewer 1 that ideally, we should replicate these findings with PCC-derived cells in vitro and in organoids. Despite many attempts, PCC cell lines have proved a major challenge for the field of neuroendocrine cancers. Cell line models are not available and PDOs have proven poorly growing and resistant to manipulations, such as CRISPR KOs or siRNA KD. In studies completed since the submission and review of the present manuscript, and subsequently published elsewhere, we have shown that RET protein is highly expressed in TMEM127-mutant PCC by immunohistochemistry. We also showed that the TMEM127-KO SH-SY5Y cell model does grow more robustly than Mock-KO cells in nude mice and that RET inhibition (Selpercatinib) does lead to tumor regression (Guo et al., 2023), suggesting that our findings may be reproducible in vivo. These findings, and potential caveats of the cell models used have been further discussed in the text.

      Reviewer #2 (Recommendations For The Authors):

      Most notably, all experiments are conducted in an isogenic single-cell line. This exposes the whole story to be potentially confounded by unknown variables.

      In addition, studies would benefit from the adding back of TMEM127, or other methods to modulate endosome and plasma membrane dynamics to mechanistically secure the cause of the findings.

      As suggested by Reviewer 2, we have generated a TMEM127 KO in HEK293, an unrelated cell line which expresses low levels of TMEM127 but does not express RET. Consistent with our findings in SH-SY5Y, we saw increased membrane accumulation of endogenous membrane proteins N-cadherin and transferrin receptor-1 in these cells in the absence of TMEM127. Additionally, re-expression of a wildtype TMEM127 (FLAG-TMEM127) in these cells led to dramatic decreases in membrane localization of these proteins (Supplemental Figure 1D). These data suggest that membrane accumulation is indeed TMEM127 dependent, and that these processes are not directly dependent on RET expression.

      References

      Guo, Q., Z.M. Cheng, H. Gonzalez-Cantu, M. Rotondi, G. Huelgas-Morales, P. Ethiraj, Z. Qiu, J. Lefkowitz, W. Song, B.N. Landry, H. Lopez, C.M. Estrada-Zuniga, S. Goyal, M.A. Khan, T.J. Walker, E. Wang, F. Li, Y. Ding, L.M. Mulligan, R.C.T. Aguiar, and P.L.M. Dahia. 2023. TMEM127 suppresses tumor development by promoting RET ubiquitination, positioning, and degradation. Cell Rep. 42:113070.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript by DeHaro-Arbona et al., the authors wish to understand how a signaling pathway (Notch) is dynamically decoded to elicit a specific transcriptional output. In particular, they investigate the kinetic properties of Notch-responsive nuclear complexes (the DNA binding factor CSL and its co-activator Mastermind (mam) along with several candidate interacting partners). Their experimental model is the polytene chromosome of the Drosophila salivary gland, in which the naturally inactive Notch can be artificially induced through the expression of a constitutively active form of Notch.

      The authors develop a series of CRISPR and transgenic lines enabling the live imaging of these complexes at a specific locus and in various backgrounds (genetic perturbations/drug treatments). This quantitative live imaging data suggests that Notch nuclear complexes form hubs, and the authors characterize their binding dynamics. Interestingly, they elegantly demonstrate that the content of these hubs and their kinetic properties can evolve, even within Notch ON cells. Hence, they propose the existence of distinct hubs, distinguishing an open (CSL), engaged (CSL-Mam), or active (CSL-Mam-Med-PolII) configuration in Notch ON cells and an inactive hub (in Notch OFF having previously been exposed to Notch) state, that would explain the surprising transcriptional memory that the authors observe hours after Notch withdrawal.

      We thank the reviewer for this constructive summary of our work.

      Reviewer #2 (Public Review):

      The manuscript from deHaro-Arbona et al, entitled "Dynamic modes of Notch transcription hubs conferring memory and stochastic activation revealed by live imaging the co-activator Mastermind", uses single molecule microscopy imaging in live tissues to understand the dynamics and molecular determinants of transcription factor recruitment to the E(spl)-C locus in Drosophila salivary gland cells under Notch-ON and -OFF conditions. Previous studies have identified the major players that are involved in transcription regulation in the Notch pathway, as well as the importance of general transcriptional coregulators, such as CBP/P300 and the Mediator CDK module, but the detailed steps and dynamics involved in these processes are poorly defined. The authors present a wealth of single molecule data that provides significant insights into Notch pathway activation, including:

      (1) Activation complexes, containing CSL and Mam, have slower dynamics than the repressor complexes, containing CSL and Hairless.

      (2) Contribution of CSL, NICD, and Mam IDRs to recruitment.

      (3) CSL-Mam slow-diffusing complexes are recruited and form a hub of high protein concentrations around the target locus in Notch-ON conditions.

      (4) Mam recruitment is not dependent on transcription initiation or RNA production.

      (5) CBP/P300 or its associated HAT activity is not required for Mam recruitment.

      (6) Mediator CDK module and CDK8 activity are required for Mam recruitment, and vice-versa, but not CSL recruitment.

      (7) Mam is not required for chromatin accessibility but is dependent on CSL and NICD.

      (8) CSL recruitment and increased chromatin accessibility persist after NICD removal and loss of Mam, which confers a memory state that enables rapid re-activation in response to subsequent Notch activation.

      (9) Differences in the proportions of nuclei with both Pol II and with Mam enrichment, which results in transcription being probabilistic/stochastic. These data demonstrate that the presence of Mam complexes is not sufficient to drive all the steps required for transcription in every Notch-ON nucleus.

      (10) The switch from more stochastic to robust transcription initiation was elicited when ecdysone was added.

      Overall, the manuscript is well written, concise, and clear, and makes significant contributions to the Notch field, which are also important for a general understanding of transcription factor regulation and behavior in the nucleus. I recommend that the authors address my relatively minor criticisms detailed below.

      We thank the reviewer for their thorough and constructive summary of our work. We are glad that they overall found it insightful and interesting. Below we have addressed the points they have raised.

      Page 7, bottom. The authors speculate, "It is possible therefore that, once recruited, Mam can be retained at target loci independently of CSL by interactions with other factors so that it resides for longer." Is it possible that another interpretation of that data is that Mam is a limiting factor?

      As indicated, our comment is a speculation and is based on the observations summarized in the paragraph. We are not entirely sure what the reviewer is proposing as an alternate model. However, if it relates to the relative concentrations of the different factors, this would not account for the differences in trajectory durations. And for most aspects of our analysis, K[off] has the most profound influence on the results. Furthermore, differences persist even when CSL levels are considerably reduced (as in conditions with Hairless RNAi).

      Page 9. The authors write, "A very low level of enrichment was evident for... for the CSL C-terminus..". The recruitment of CSL ct IDR does not appear to be statistically significant or there is no apparent difference (Figure S2C), suggesting the CSL ct IDR does not play a role in enrichment.

      We agree with the comments of the reviewer and have adjusted the text on page 9 accordingly.

      Page 9. The authors write, "Notably, MamnIDR::GFP fusion was present in droplets, suggesting it can self-associate when present in a high local concentration (Figure S2B)." Is this result only valid for Mam nIDR or does full-length Mam also localize into droplets, as has been previously observed for full-length mammalian Maml1 in transfected cells?

      We agree that the observed foci of MamL1 that have been detected in mammalian cells are interesting. We have not tried to replicate those data because the large size of Mam has made it challenging to produce a full-length form in over-expression. We note however that another portion of Mam, MamIDR, does not make droplets when over-expressed despite it containing a large section of the disordered region of the Drosophila Mam. We have now included a comment about the mammalian data in the text (page 9) to put our findings in context.

      Previous studies in mammalian cells suggest that Maml1 is a high-confidence target for phosphorylation by CDK8, see Poss et al 2016 Cell Reports https://doi.org/10.1016/j.celrep.2016.03.030. By sequence comparison, does fly Mam have similar potential phosphorylation sites, and might these be critical for Mam/CDK module recruitment?

      We thank the reviewer for highlighting this point. Indeed, we were very excited when we learnt that MamL1 was found to be a high confidence CDK8 target and we looked hard in the Mam sequence for potential phosphorylation sites. Sadly, there is very little conservation between the fly and the mammalian proteins beyond the helical region that contacts CSL and NICD. Furthermore, there are no identifiable putative CDK8 phosphorylation sites based on conventional motifs. It therefore remains to be established whether or not Mam is a direct target of the CDK8 kinase activity. We have added an explanatory comment in the text (page 11).

      Page 11: The authors write, "The differences in the effects on Mam and CSL imply that the CDK module is specifically involved in retaining Mam in the hub, and that in its absence other CSL complexes "win-out", either because the altered conditions favour them and/or because they are the more abundant." Are the "other" complexes the authors are referring to Hairless-containing complexes? With the reagents the authors have in hand couldn't this be explicitly shown for CSL complexes rather than speculated upon?

      The reviewer is correct that CSL complexes containing Hairless are good candidates to be recruited in these conditions. We have compared the levels of Hairless at E(spl)-C following treatments with Senexin and have not detected a difference. However, it appears that the high proportion of unbound Hairless makes it difficult to detect/quantify the enrichment at E(spl)-C. We have therefore taken a different strategy, which is to measure the recruitment of a mutant form of CSL that is compromised for Hairless binding. Recruitment of the mutant CSL is detected in Notch-ON conditions, but is significantly reduced/absent following Senexin treatment. These data favour the model proposed by the reviewer that in the absence of CDK8 activity, the CSL-Hairless complexes win out. These new data have been added in new Supplementary Figure S3F and S3G (and see text page 11).

      Page 12/13: The authors write, "Based on these results we propose that, after Notch activity decays, the locus remains accessible because when Mam-containing complexes are lost they are replaced by other CSL complexes (e.g. co-repressor complexes)." Again, why not actually test this hypothesis rather than speculate? The dynamics of Hairless complexes following the removal of Notch would be very interesting and build upon previously published results from the Bray lab.

      We thank the reviewer for this comment and we agree it’s possible that the proportion of Hairless complexes increases after Notch withdrawal. However, for the reasons outlined above, it is difficult to quantify changes in Hairless (and our preliminary experiment did not reveal any large-scale effect), and because of the complexity of the genetics we cannot straightforwardly extend the experiment to analyze the behaviour of the mutant CSL as above. Therefore, at present, we cannot say whether the loss of Mam is compensated by an increase in Hairless. We hope in future to investigate the characteristics of the memory in more depth.

      Page 13: The authors write, "As Notch removal leads to a loss of Mam, but not CSL, from the hub, it should recapitulate the effects of MamDN." While the data in Figure 5B seem to support this hypothesis, it's not clear to me that the loss of Mam and MamDN should phenocopy each other, bc in the case of MamDN, NICD would still be present.

      We apologise that this sentence was a bit misleading. We have now rewritten it to improve accuracy (page 13) “As Notch removal leads to a loss of Mam, but not CSL, from the hub, we hypothesised it would recapitulate the effects of MamDN on chromatin accessibility and transcription of targets.”

      The temporal dynamics for Mam recruitment using the temperature- and optogenetic-paradigms are quite different. For example, in the optogenetic time course experiments, the preactivated cells are in the dark for 4 hours, while in the temperature-controlled experiments, there is still considerable enrichment of Mam at 4 hours. For the preactivated optogenetic experiments, how sure are the authors that Mam is completely gone from the locus, and alternatively, can the optogenetic experimental results be replicated in the temperature-controlled assays? My concern is whether the putative "memory" observation is just due to incomplete Mam removal from the previous activation event.

      We appreciate the concerns of the reviewer. However, we are confident that the 4-hour optogenetic inactivation is much more effective than the equivalent time for temperature shifts. The temperature-sensitive experiment involves a longer decay, because not only the protein but also the mRNA has to decay to fully remove NICD activity. The optogenetic experiments involve only protein decay and so are more acute. Furthermore, we have tested (and we show in Figure 5H) that Mam is fully depleted after 4 hours “Off” in the optogenetic experiments.

      In order to further strengthen the evidence in favour of the memory hub, we have extended the time-frame further to show that CSL is retained at the locus even after 24 hours “Notch OFF” in both the temperature and the optogenetic paradigm. We have also measured the effects on transcription after a 24hr OFF period using the optogenetic paradigm and seen that robust transcription is initiated in cells that have experienced a previous activation (preactivated) compared to those that have not (naïve). These new data have been added to new Figure 5 C-F and strongly support the memory model.

      Reviewer #3 (Public Review):

      Summary:

      DeHaro-Arbona and colleagues investigate the in vivo dynamics of Notch-dependent transcriptional activation with a focus on the role of the Mastermind (MAM) transcriptional co-activator. They use GFP and HALO-tagged versions of the CSL DNA-binding protein and MAM to visualize the complex, and Int/ParB to visualize the site of Notch-dependent E(Spl)-C transcription. They make several conclusions. First, MAM accumulates at E(Spl)-C when Notch signaling is active, just like CSL. Second, MAM recruits the CDK module of Mediator but does not initiate chromatin accessibility. Third, after signaling is turned off, MAM leaves the site quickly but CSL and chromatin accessibility are retained. Fourth, RNA pol II recruitment, Mediator recruitment, and active transcription were similar and stochastic. Fifth, ecdysone enhances the probability of transcriptional initiation.

      Strengths:

      The conclusions are well supported by multiple lines of extensive data that are carefully executed and controlled. A major strength is the strategic combination of Drosophila genetics, imaging, and quantitative analyses to conduct compelling and easily interpretable experiments. A second major strength is the focus on MAM to gain insights into the dynamics of transcriptional activation specifically.

      We thank the reviewer for their positive comments about the strengths of our work.

      Weaknesses:

      Weaknesses are minor. There were no p-values reported for data presented in Figure S1D and no indication of how variable measurements were. In addition, the discussion of stochasticity was not integrated optimally with relevant literature.

      We thank the reviewer for noting these points. The statistical tests have now been included for Figure S1D (now Figure S1F). We have expanded the discussion about stochasticity to include more reference to the literature and to make clear the distinction from transcriptional bursting (pages 19-20).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors have an elegant series of manipulations that provide strong evidence for their hypotheses and conclusions. Their exploitation of a unique biological system amenable to imaging in the larval salivary gland is well-considered and well-performed. Most of the conclusions are supported by the data. I only have the concerns below.

      (1) One of the main findings is the composition of Notch nuclear complexes and their interactions within a 'hub'. Yet most of the data showing hubs focus on labeling one protein component (+the locus or transcription), but multi-color imaging is rarely used to show how CSL-Mam, Mam-Med... protein signals coalesce to form a hub. Given the powerful tool developed, it would be important to show these multi-state hubs. Related to this, if the authors expect that hubs are formed independently of transcription or Notch pathway activation, do the authors see clustering at other non-specific loci in the nucleus? If not, can the authors comment on why they think that is the case? If so, do they demonstrate consistent residence time profiles with the tracked E(spl) locus?

      We apologise that it was not evident from the data shown that the proteins co-localize. First, we stress that all the experiments are multicolor and most rely on very powerful methods to measure co-recruitment at a chromosomal locus, something that is very rarely achieved by others studying hubs. Second, we have in all cases confirmed that the proteins do colocalize. We have modified the diagram of our analysis pipeline to make it clearer that this relies on multi-colour imaging, and adjusted all the figure labels to indicate the position of E(spl)-C. We have also added panels to new supplementary Figure S1C with examples of the co-localization between CSL and Mam and a plot confirming their levels of recruitment are correlated across multiple nuclei.

      We would like to clarify that our data show that the hubs do require Notch activation for their establishment. Other regions of enrichment are detected in Notch-ON conditions, but these are less prominent and, with no independent method for identifying them, can’t be compared between nuclei. In SPT experiments, other clusters with consistent residence are detected as reported in our recent paper which expanded on the SPT data (Baloul et al, 2023). We also detect co-localizations and “hubs” in other tissues, but those analyses are ongoing and beyond the scope of this paper.

      (2) The authors convincingly show that Notch hub complexes exhibit a memory. While the data showing rapid hub reformation upon Notch withdrawal are solid and convincing (Figure 5, in particular, F), the claim that this memory fosters rapid transcriptional reactivation is less clear. Yet in order to invoke transcriptional memory, it's necessary to solidify this transcriptional response angle. The authors should consider quantifying the changes in transcription activity (at the TS and not in the cytoplasm as currently shown), as well as the timing of transcriptional reactivation (with the MS2 system or smFISH). Manipulating the duration of the activation and dark recovery periods could help to draw a better correlation between the timing of hub reformation and that of transcriptional response and would also help determine how persistent this phenomenon is.

      We thank the reviewer for these suggestions. We have carried out several new experiments to probe further the persistence of memory and to show the effects on transcription when Notch is inactivated/reactivated. First, we have extended the time period for Notch inactivation by temperature control and show that the CSL hub persists even at 24 hours and that no transcription from the target E(spl)m3 is detected, neither at the transcription start-site nor in the cytoplasm. Second, we have extended the Notch OFF time period to 24 hours using the optogenetic approach and show that transcription is robustly reinitiated in preactivated nuclei when Notch is re-activated with 30 mins light treatment, while little if any E(spl)m3 transcription is detected in naïve nuclei with the same treatment. These new data are included in new Figure 5 C-F (see pages 13-14). Both these new experiments substantiate the model that the nuclei retain transcriptional memory.

      (3) The manuscript ends with the finding that the presence of a Mam hub does not always correlate with transcription. They conclude that transcription is initially stochastic. The authors find this surprising and even state that this could not be observed without their in vivo live imaging approaches. I don't understand why this result is surprising or unexpected, as we now know that transcription is generally a stochastic process and that most (if not all) loci are transcribed in a bursting manner. The fact that E(spl)-C locus is bursty is already obvious from the smFISH data. The fact that active nascent transcription does not correlate with local TF hubs was already observed in early Drosophila embryos (with Zelda hubs and two MS2 reporters, hb-MS2, sna-MS2). If, in spite of the inherent stochasticity of transcription (bursting), the data are surprising for other reasons, the authors should explain it better.

      We apologise that we had not made clear the reasons why the results were unexpected. We have substantially rewritten this section, and the discussion section, to clarify. We have also moderated the language used to better reflect the overall context of our results. We briefly summarise here. As the reviewer correctly states, it is well known that transcription is inherently bursty. Indeed, the MS2 transcription profiles in “ON” nuclei are bursty, which likely reflects the switching of the promoter. However, in other contexts where we have monitored transcription, although it is bursty, it has nevertheless been initiated synchronously in response to Notch in all nuclei in a manner that was fully penetrant. What we observe in our current conditions is that some nuclei never initiate transcription over the time-course of our experiments (2-3 hours), and those that are ON rarely switch off. This implies that there is another rate-limiting step. Supplying a second signal can modulate this so that it occurs with much higher frequency/penetrance. We consider this to be a second tier of regulation above the fundamental transcriptional bursting.

      The fact that Mam is recruited in all nuclei, whether or not they are actively transcribing, was surprising because recruitment of the activation complex has been considered as the limiting step. This is somewhat different from Zelda, which is thought to be permissive and needed at an early step to prime genes for later activation rather than to be the last step needed to fire transcription. We note also that we are not monitoring the position of the hub with respect to the promoter, as in the Zelda experiments (Zelda hubs may still persist, but they are not overlapping with the nascent RNA); we are monitoring the presence or absence of the Mam hub in proximity to a genomic region.

      Minor suggestions:

      (1) The genotypes of the samples should be indicated in the figure legends.

      We thank the reviewer for this suggestion. We have provided a table (new Table S3) where all of the genetic combinations are provided in detail for each figure. We considered that this approach would be preferable because it would be quite cumbersome to have the genotypes in each legend as they would become very long and repetitive.

      (2) While the schematic Fig1A explains how the locus is detected, the presence of ParS/ParB is never indicated in subsequent panels and Figure. I assume that all panels depicting enrichment profiles, use a given radius from the ParS/ParB dot to determine the zero of the x-axis (grey zone). This should be clearly stated in all panels/figure legends concerned.

      We apologise if this was not made explicit. Yes, all panels depicting enrichment profiles use immunofluorescence signal from ParA/ParB recruitment to determine the zero of the x-axis. We have now marked this more clearly in all figures (grey bar, grey shading or labelled 0). All images where the locus is indicated by an arrowhead, by a coloured bar above the intensity plots or by grey shading in the graphs have been captured with dual colour and the signal from ParA/B recruitment used to define its location. This is now clearly stated in the analysis methods and in the legend. We have also modified the diagram in new supplementary Figure S1B, showing our analysis pipeline, to make that more explicit.

      (3) FRAP/SPT experiments: the author should provide more details. How many traces? Are traces showing bleaching removed?

      P7: does the statement ' The residences are likely an underestimation because bleaching and other technical limitations also affect track durations' imply that traces showing bleaching have not been removed from the analysis?

      The authors could justify the choice of the model for fitting FRAP/Spt experiments and be cautious about their interpretation. For example, interpreting a kinetic behavior as a DNA-specific binding event can be accurate, only if backed up with measurements with a mutant version of the DNA binding domain.

      We apologise if some of this information was not evident. The number of trajectories is provided in new Figure S1F, which indicates the number of trajectories analyzed for each condition in Figure 1.

      We have now added also the numbers of trajectories analyzed for the ring experiments.

      The comments on page 7 about bleaching refer to the technical limitations of the SPT approach. However, as bleached particles cannot be distinguished from those that leave the plane of imaging, they have not been filtered or removed. We have not sought to make claims about absolute residence times for that reason. Rather the point is to make a comparison between the different molecules. As the same fluorescent ligand and imaging conditions are used in all the experiments, all the samples are equivalently affected by bleaching. We subdivide trajectories according to their properties and infer that those which are essentially stationary are bound to chromatin, as is common practice in the field. We note that we have previously shown that a DNA binding mutant of CSL does not produce a hub at E(spl)-C in Notch-ON conditions and has a markedly more rapid recovery in FRAP experiments (Gomez-Lamarca et al, 2018) consistent with the slow recovery being related to DNA binding. This point has been added to the text (page 8).

      (4) The authors should quantify their RNAi efficiency for Hairless-RNAi, Med13-RNAi, white-RNAi, yellow-RNAi, CBP-RNAi, and CDK8-RNAi.

      We thank the reviewer for this comment. We have made sure that we are using well validated RNAis in all our experiments and have included the references in Table S2 where they have been used. We have now evaluated the knock-down in the precise conditions used in our experiments by quantitative RT-PCR and added those data, which show efficient knock-down is occurring, to new Supplementary Figure S1D and Figure S3J. We note also that the RNAi experiments are complemented by experiments inhibiting the complexes with specific drugs and that these yield similar results.

      (5) Figure 3 A: could the author show that transcription is indeed inhibited upon triptolide treatment with smFISH (with for example m3 probes)? Why not use alpha-amanitin?

      We thank the reviewer for this suggestion. We had omitted the smFISH data from this experiment in error. These data have now been added to new Supplementary Figure S3A and clearly show that transcription is inhibited following 1 hour exposure to triptolide. Triptolide is a very fast acting and very efficient inhibitor of transcription that acts at a very early step in transcription initiation. In our experience it is much more efficient than alpha-amanitin and is now the inhibitor of choice in many transcription studies.

      (6) Figure 4 typo: panel B should be D and vice versa. Accessibility panels are referred to as Figure 4D, D' in the text but presented as panel B in the Figure.

      We thank the reviewer for noting this mistake, it is now changed in the main text.

      (7) The authors must add their optogenetic manipulation protocol to their methods section.

      The method is described in detail in a recently published paper that reports its design and use. We have now also added a section explaining the paradigm in the methods (Page 31) as requested.

      (8) Figure 3G needs a Y-axis label.

      Our apologies, this has now been added.

      (9) The authors should note why there was a change of control in Figure 3D compared to 3E and G (yellow RNAi vs white RNAi).

      This is a pragmatic choice that relates to the chromosomal site of the RNAis being tested. Controls were chosen according to the chromosome that carries the UAS-RNAi: for the second chromosome this was yellow RNAi and for the third white RNAi. This is explained in the methods.

      (10) Figure 1 would benefit from a diagram describing the genomic structure of the E(spl) locus and the relative position of the labelled locus within it.

      We thank the reviewer for this suggestion and have added a diagram to Supplementary Figure S1A.

      Reviewer #2 (Recommendations For The Authors):

      Minor criticisms and typos:

      Pet peeve: in some of the figure panels they are labeled Notch ON or OFF, but in others they are not, albeit that info is included in the figure legend. For the ease of the reader/reviewer, would it be possible to label all relevant figure panels either Notch ON or OFF for clarity?

      We thank the reviewer for this suggestion and have modified the figures accordingly.

      Page 7, top. "In comparison to their average distribution across the nucleus, both CSL and Mam trajectories were significantly enriched in a region of approximately 0.5 μm around the target locus in Notch-ON conditions, reflecting robust Notch dependent recruitment to this gene complex." Are the authors referring to Figure 1D here?

      Thank you, this figure call-out has been added in the text.

      Page 9. "...reported to interact with p300 and other factors (Figure S2B)." I believe the authors mean Figure S2C and not S2B.

      Thank you, this has been corrected in the text.

      Page 9. There is no Figure S2D.

      Apologies, this was referring to Figure S1D, and is now corrected in the text.

      Page 11: "...were at very reduced levels in nuclei co-expressing MamDN (Figure 4B).." Should be Figure 4CD.

      Thank you, this has been corrected in the text.

      Page 12: "...which was maintained in the presence of MamDN (Figure 4D, D')." Should be Figure 4B.

      Thank you, this has been corrected in the text.

      Reviewer #3 (Recommendations For The Authors):

      In the Results section on Hub, the paragraph starting with "Third, we reasoned . ." the callout to Figure S2D should be Fig S1D.

      Thank you, this has been corrected in the text.

      Figures: The font size in the Figures is so small that most words and numbers cannot be read on a printout. One has to go to the electronic version and increase the size to read it. This reviewer found that inconvenient and often annoying.

      We apologise for this oversight, the font size has now been adjusted on all the graphs etc.

      Figure legends: the legends are terse and in some cases leave explanations to the imagination (e.g. "px" in Figure 2E). It would be useful to go through them and make sure those who are not a Drosophila Notch person and not a transcription biochemist can make sense of them.

      Our apologies for the lack of clarity in the legends. We have gone over them to make them more accessible and less succinct.

    1. Author Response

      We are very pleased to hear the overall positive views and constructive criticisms of eLife Editors and Reviewers on our work. In particular, we appreciate their comments highlighting the value of our new pipeline for high-throughput quantification of fly embryonic movement and the positive views of reviewers and editors that our data on the roles of miR-2b-1 in embryonic movement are well supported.

      Regarding Reviewer 1, we thank them for their positive comments that our work is experimentally sound and well-written, their kind words on the value of our new embryonic movement pipeline, and their overall appreciation of the quality, scope, and significance of our work. In a revised version of the manuscript we will consider discussing and addressing some of the interesting points raised by Rev1.

      Turning to the comments by Rev2, we are grateful to them for their recognition of the novelty of our miRNA findings and appreciation of the utility of our novel quantitative pipeline for assessing embryonic movement. Nonetheless, we politely – but strongly – disagree with their suggestion that the findings are inflated by our language. For example, they criticise our use of the verb ‘control’, yet this is a standard textbook term in molecular biology to describe biological processes regulated by genetic factors: given that miR-2b-1 regulates movement patterns during embryogenesis, to say that miR-2b-1 ‘controls’ embryonic movement in the Drosophila embryo is reasonable and in line with the language used in the field. It is not inflation. In connection to other comments, in a revised manuscript we will propose a different name for the gene here described as Janus to avoid annotation issues at FlyBase due to other, unrelated genes that include this word as part of their names.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This work provides new mechanistic insights into the competitive inhibition in the mammalian P2X7 receptors using structural and functional approaches. The authors solved the structure of panda (pd) P2X7 in the presence of the classical competitive antagonists PPNDS and PPADS. They find that both drugs bind to the orthosteric site employed by the physiological agonist ATP. However, owing to the presence of a single phosphate group, they prevent movements in the flipper domain required for channel opening. The authors performed structure-based mutational analysis together with electrophysiological characterization to understand the subtype-specific binding of these drugs. It is known from previous studies that P2X1 and P2X3 are more sensitive to these drugs as compared to P2X7, hence, the residues adjacent to the ATP binding site in pdP2X7 were mutated to those present in P2X1. They observed that mutations of Q143, I214, and Q248 into lysine (hP2X1) increased the P2X7 sensitivity to PPNDS, whereas in P2X1, mutations of these lysines to alanine reduced sensitivity to PPNDS, suggesting that these key residues contribute to the subunit-specific sensitivity to these drugs. Similar experiments were done in hP2X3 to demonstrate its higher sensitivity to PPNDS. This preprint provides a useful framework for developing subtype-specific drugs for the family of P2X receptor channels, an area that is currently relatively unexplored.

      We appreciate the time and effort Reviewer #1 devoted to this review, and we have addressed the specific comments below.

      (1) Why was the crystallization construct of panda P2X7 used for structural studies instead of rat P2X7 with the cytoplasmic ballast which is a more complete receptor that is closely related to the human receptor? Can the authors provide a justification for this choice?

      We appreciate this comment. We did try to express the rat P2X7 receptor in its full-length form based on a previous report (Cell 2019, PMID: 31587896), but expression of the receptor was unsuccessful for an unknown reason. Instead, we employed a truncated construct of panda P2X7 based on the findings described in another previous report (eLife 2016, PMID: 27935479). This truncated construct also possesses ATP-dependent channel activity (eLife 2016, PMID: 27935479). We understand that the full-length P2X7 construct would be preferable, particularly for addressing the function of the cytoplasmic domain; however, the main focus of this study was on PPNDS/PPADS recognition and the associated structural changes in the ATP binding pocket, which we believe are less likely to be severely affected by truncation of the cytoplasmic domain. In support of this expectation, our mutational analyses are consistent with the structures in this study. Therefore, we believe that the use of the truncation construct in this study is justified.

      (2) Was there a good reason why hP2X1 and hP2X3 currents were recorded in perforated patches, whereas pdP2X7 currents were recorded using the whole-cell configuration? It seems that the extent of rundown is less of a problem with perforated patch recordings. Can the authors comment and perhaps provide a justification? It would also be good to present data for repeated applications of ATP alone using protocols similar to those for testing antagonists so the reader can better appreciate the extent of run down with different recording configurations for the different receptors.

      We thank the reviewer for bringing up this point. The whole-cell configuration is the most commonly used method in patch-clamp experiments; therefore, we used this method to record the current of pdP2X7 (Author response image 1). However, the whole-cell configuration is not suitable for all experiments; for example, the currents of P2X1 and P2X3 recorded by this method show a severe "rundown" effect. The "rundown" effect prevents accurate calculation of the inhibition rate of the antagonist, and to obtain more accurate results, we used perforated patches to record the currents of hP2X1 and hP2X3.

      Author response image 1.

      Representative current traces of pdP2X7, hP2X3, and hP2X1 after repeated applications of ATP. The pdP2X7 currents were recorded using the whole-cell configuration, and the hP2X1 and hP2X3 currents were recorded using perforated patches.

      (3) The data in Fig. S1, panel A shows multiple examples where the currents activated by ATP after removal of the antagonist are considerably smaller than the initial ATP application. Is this due to rundown or incomplete antagonist unbinding? It is interesting that this wasn't observed with hP2X1 and hP2X3 even though they have a higher affinity for the antagonist. Showing examples of rundown without antagonist application would help to distinguish these distinct phenomena and it would be good for the authors to comment on this in the text. It is also curious why a previous study on pdP2X7 did not seem to have problems with rundown (see Karasawa and Kawate. eLife, 2016).

      We thank the reviewer for bringing up this point. We believe that this difference may be the result of incomplete antagonist unbinding. A similar phenomenon has been observed in previous studies of pdP2X7 (eLife 2016, PMID: 27935479). In the previous experiment, the currents activated by ATP after removal of the antagonist A740003 did not return to the initial value upon ATP application, whereas activation by ATP after removal of the antagonist GW791343 immediately restored the initial value upon ATP application (Fig. 1C of eLife 2016, PMID: 27935479). This may be because different inhibitors dissociate differently from pdP2X7. In our experiments, we assumed that PPNDS/PPADS was not completely dissociated from P2X7 even after 20 min of elution. The activation of P2X7 by ATP without antagonists showed no rundown effect (Author response image 1); therefore, we calculated the inhibition rate of the antagonist according to the precontrol.
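      As a minimal sketch of what calculating the inhibition rate "according to the precontrol" could look like, the snippet below assumes that inhibition is defined as the fractional reduction of the ATP-evoked current in the presence of antagonist relative to the ATP-only current recorded beforehand. The function name and the example amplitudes are hypothetical and are not taken from the study.

```python
def inhibition_rate(precontrol_current, antagonist_current):
    """Fractional inhibition relative to the precontrol ATP response.

    Assumption: inhibition = 1 - I_antagonist / I_precontrol, where both
    values are peak current amplitudes measured in the same cell.
    """
    if precontrol_current == 0:
        raise ValueError("precontrol current must be non-zero")
    return 1.0 - antagonist_current / precontrol_current


# Hypothetical peak amplitudes (arbitrary units), not data from the study.
precontrol = 1000.0
with_antagonist = 250.0
print(inhibition_rate(precontrol, with_antagonist))  # 0.75
```

      Because only the precontrol response enters the denominator, incomplete recovery after washout does not affect this estimate, which is one reason such a definition might be preferred when the antagonist dissociates slowly.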

      (4) The written presentation could be improved as there are many instances where the writing lacks clarity and the reader has to guess what the authors wish to communicate.

      To address this comment, we made changes to the text, particularly by following the

      Recommendations for The Authors

      Reviewer #1 (Recommendations For The Authors):

      (1) The way the manuscript is written could be greatly improved. There are many confusing sections where the reader has to guess what the authors wish to convey. For example, on page 9 "In addition, the mutation of Val173 to aspartate, as observed in pdP2X7, significantly decreased the sensitivity to PPNDS (Fig. 6B)." It appears from this sentence that Asp is present in P2X7, which is incorrect, please rephrase. There are many more examples of confusing sentences that need to be carefully edited to improve comprehension.

      To address this comment, we extensively modified the text to avoid this kind of misunderstanding. Please see the manuscript file with the track changes.

      (2) Please use either a 1-letter or 3-letter code for amino acid residues throughout the manuscript to maintain uniformity.

      We made this correction throughout the revised manuscript.

      (3) In Figure 1 on the right side, including the nearby density and side chains for interacting residues of PPNDS and PPADS would give more information and reliability for the density of the drugs.

      We appreciate this comment. The corresponding information is shown in Fig. S7.

      (4) Typo: Figure S1, E, and F panels - please correct the y-axis label to Inhibition.

      We corrected the typo in Fig. S1.

      (5) Please rewrite the legends for Fig. S3 and S5. They are confusing. The figure shows 3D classification using Relion, however, the legend suggests it was done using Cryosparc. Please clarify.

      We apologize for the confusion. Before applying C3 symmetry, all steps including 3D classification were performed in Relion 3.1. With C3 symmetry, we performed further refinement using Cryosparc v4.2.1 by non-uniform refinement. We have corrected the figure legends accordingly.

      (6) For Fig. S3 and S5 increase the resolution and size of representative micrographs, and also please provide scale bars.

      We have corrected Figures S3 and S5 accordingly.

      (7) Please add the 3D classification protocol performed in Relion/Cryosparc in the methods section as well.

      We added the corresponding description to the revised manuscript (Lines 9-14, Page 16).

      (8) In Table S1, under the initial model the authors state 'this study' when they should report the use of 5U1L according to the methods section.

      We corrected Table S1 in accordance with this comment.

      (9) The authors should consider combining the raw data shown in Figure S1 in Figure 6 as it provides stronger support for the conclusions than the bar graphs shown in Figure 6B.

      We appreciate the comment and fully understand the intention of Reviewer #1. Nevertheless, we would like to keep Figure S1, since it was also mentioned earlier together with Figure 1. In addition, if we combine Figure S1 with Figure 6, the result would be too large to present as a single figure.

      (10) In Figure 6A, please provide colored labels for both P2X7 and P2X1 to aid comprehension of the structural models.

      Based on this comment, we corrected the labels in Figure 6.

      (11) In the discussion, the authors write about comparisons with the docking study by Huo et al. JBC, 2018. Can they show the superimposition of their EM model with the previous studies' docking model in a supplementary figure for more clarity?

      We appreciate the constructive comments. However, unfortunately, the docking model in the previous study (JBC 2018, PMID: 29997254) is not available, so it is not possible to show the superimposition.

      Reviewer #2 (Public Review):

      Summary:

      P2X receptors play pivotal roles in physiological processes such as neurotransmission and inflammation, making them promising drug targets. This study, through cryo-EM and functional experiments, reveals the structural basis of the competitive inhibition of the PPNDS and PPADS on mammalian P2X7 receptors. Key findings include the identification of the orthosteric site for these antagonists, the revelation of how PPADS/PPNDS binding impedes channel-activating conformational changes, and the pinpointing of specific residues in P2X1 and P2X3 subtypes that determine their heightened sensitivity to these antagonists. These insights present a comprehensive understanding that could guide the development of improved drugs targeting P2X receptors. This work will be a valuable addition to the field.

      Strengths and weaknesses:

      The combination of structural experiments and mutagenesis analyses offers a deeper understanding of the mechanism. While the inclusion of MD simulation is appreciated, providing more insights from the simulation might further strengthen this already compelling story.

      We appreciate the time and effort Reviewer #2 devoted to this review, and we have addressed the specific comments below.

      Reviewer #2 (Recommendations For The Authors):

      (1) On page 3, the sentence "ATP analogs are the most competitive inhibitors of P2X receptors but are typically unsuitable due to a lack of high specificity in vivo," might need additional context. Could the authors clarify if they are referring to the unsuitability of ATP analogs for medical applications?

      To address this comment, we have rewritten the sentence as follows (Lines 13-16, Page 3):

      ATP analogs are most common among competitive inhibitors for P2X receptors; however, they are generally unsuitable for in vivo applications due to their relatively low specificity, which may result in off-target toxicity. This issue arises because the human body contains numerous ATP-binding proteins.

      (2) Fig. S1. I am curious why, for P2X7, the ATP-only current after removal of PPNDS/PPADS does not recover and become larger than the current in the presence of PPNDS/PPADS? Such behavior was not as pronounced in P2X1. Does that suggest PPNDS/PPADS might remain bound and can not be removed when the P2X7 channel is closed?

      We thank the reviewer for bringing up this point. We believe that this difference may be the result of incomplete antagonist unbinding. A similar phenomenon has been observed in previous studies of pdP2X7 (eLife 2016, PMID: 27935479). In the previous experiment, the currents activated by ATP after removal of the antagonist A740003 did not return to the initial value upon ATP application, whereas activation by ATP after removal of the antagonist GW791343 immediately restored the initial value upon ATP application (Fig. 1C of eLife 2016, PMID: 27935479). We strongly agree with the reviewer that this may be due to the difficulty of dissociating the antagonist from pdP2X7.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] Weaknesses are the absence of correlation between the results from the animal studies and human pancreatic cancers.

      Author response: We appreciate the reviewer’s attention to the importance of human pancreatic cancer studies. In a previous study (D’Amico et al. Genes & Development 2018 doi: 10.1101/gad.311852.118), we evaluated the expression of STAT3 in human pancreatic tissue microarrays and data from the Human Protein Atlas. Mutations in Stat3 are infrequent in human pancreatic cancers; however, there is a trend of decreased STAT3 activity in poorly differentiated carcinomas.

      In the current study, STAT3 and SMAD4 gene signature scores (computed from KO KPC cells) were aligned with human pancreatic ductal adenocarcinoma samples from the TCGA cohort, and statistical analyses supported the selective antagonism of STAT3 and SMAD4 (Fig 4D, Fig 4E).

      The complex process of EMT is difficult to characterize rigorously in human cancers. Mouse models offer an opportunity to study the relationships between cancer phenotypes and genetic alterations.

      Reviewer #2 (Public Review):

      [...] While correlations are strong, the study would benefit from additional cause-and-effect type experiments. It would also be beneficial to better tie together the first and second parts of the paper.

      Author response: We understand the Reviewer’s interest in additional experiments that could further elucidate mechanisms that drive EMT and/or KRAS dependency in relation to STAT3 and TGF-beta antagonism. We previously investigated the development of mutant KRAS knockout tumors (Ischenko et al. Nature Communications 2021 doi:10.1038/s41467-021-21736) and found that loss of KRAS promotes EMT, similar to loss of STAT3. Additional experiments are underway but are outside the scope of the current study.

      The first part of the paper is mechanistic and used KRAS-transformed mouse embryo fibroblasts to perform in vitro studies with foci formation. The cell-based foci formation assay has been shown to best evaluate malignant transformation and oncogenic potential. In the second part we transitioned to epithelial cells and pancreatic ductal adenocarcinomas to combine mechanistic relationships with genetic models.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      I would like to express my appreciation for the authors' dedication to revising the manuscript. It is evident that they have thoughtfully addressed numerous concerns I previously raised, significantly contributing to the overall improvement of the manuscript.

      Response: We appreciate the reviewers’ recognition of our efforts in revising the manuscript.

      My primary concern regarding the authors' framing of their findings within the realm of habitual and goal-directed action control persists. I will try to explain my point of view and perhaps clarify my concerns. While acknowledging the historical tendency to equate procedural learning with habits, I believe a consensus has gradually emerged among scientists, recognizing a meaningful distinction between habits and skills or procedural learning. I think this distinction is crucial for a comprehensive understanding of human action control. While these constructs share similarities, they should not be used interchangeably. Procedural learning and motor skills can manifest either through intentional and planned actions (i.e., goal-directed) or autonomously and involuntarily (habitual responses).

      Response: We would like to clarify that, contrary to the reviewer’s assertion of a scientific consensus on this matter, the discussion surrounding the similarities and differences between habits and skills remains an ongoing and unresolved topic of interest among scientists (Balleine and Dezfouli, 2019; Du and Haith, 2023; Graybiel and Grafton, 2015; Haith and Krakauer, 2018; Hardwick et al., 2019; Kruglanski and Szumowska, 2020; Robbins and Costa, 2017). We absolutely agree with the reviewer that “Procedural learning and motor skills can manifest either through intentional and planned actions (i.e., goal-directed) or autonomously and involuntarily (habitual responses)”. But so do habits. Some researchers also highlight the intentional/goal-directed nature of habits (e.g., Du and Haith, 2023, “Habits are not automatic” (preprint) or Kruglanski and Szumowska, 2020, “Habitual behavior is goal-driven”: “definitions of habits that include goal independence as a foundational attribute of habits are begging the question; they effectively define away, and hence dispose of, the issue of whether habits are goal-driven” (p. 1258)). Therefore, there is no clear consensus concerning the concept of habit.

      While we acknowledge the meaningful distinctions between habits and skills, we also recognize a substantial body of literature supporting the overlap between these concepts (cited in our manuscript), particularly at the neural level. The literature clearly indicates that both habits and skills are mediated by subcortical circuits, with a progressive disengagement of cognitive control hubs in frontal and cingulate cortices as repetition evolves. We do not use these concepts interchangeably. Instead, we simply present evidence supporting the assertion that our trained app sequences meet several criteria for their habitual nature.

      Our choice of Balleine and Dezfouli (2018)'s criteria stemmed from the comprehensive nature of their definitions, which effectively synthesized insights from various researchers (Mazar and Wood, 2018; Verplanken et al., 1998; Wood, 2017, etc). Importantly, their list highlights the positive features of habits that were previously overlooked. However, these authors still included a controversial criterion ("habits as insensitive to changes in their relationship to their individual consequences and the value of those consequences"), even though they acknowledged the problems of using outcome devaluation methods and of relying on a null-effect. According to Kruglanski and Szumowska (2020), this criterion is highly problematic as “If, by definition, habits are goal-independent, then any behavior found to be goal-dependent could not be a habit on sheer logical grounds” (p. 1257). In their definition, “habitual behavior is sensitive to the value of the reward (i.e., the goal) it is expected to mediate and is sensitive to the expectancy of goal attainment (i.e., obtainment of the reward via the behavior)” (p. 1265). In fact, some recent analyses of habitual behavior are not using devaluation or revaluation as a criterion (Du and Haith, 2023). This article, for example, ascertains habits using different criteria and provides supporting evidence for trained action sequences being understood as skills, with both goal-directed and habitual components.

      In the discussion of our manuscript, we explicitly acknowledge that the app sequences can be considered habitual or goal-directed in nature and that this terminology does not alter the fact that our overtrained sequences exhibit clear habitual features.

      Watson et al. (2022) aptly detailed my concerns in the following statements: "Defining habits as fluid and quickly deployed movement sequences overlaps with definitions of skills and procedural learning, which are seen by associative learning theorists as different behaviors and fields of research, distinct from habits."

      "...the risk of calling any fluid behavioral repertoire 'habit' is that clarity on what exactly is under investigation and what associative structure underpins the behavior may be lost." I strongly encourage the authors, at the very least, to consider Watson et al.'s (2022) suggestion: "Clearer terminology as to the type of habit under investigation may be required by researchers to ensure that others can assess at a glance what exactly is under investigation (e.g., devaluation-insensitive habits vs. procedural habits)", and to refine their terminology accordingly (to make this distinction clear). I believe adopting clearer terminology in these respects would enhance the positioning of this work within the relevant knowledge landscape and facilitate future investigations in the field.

      Response: We would like to highlight that we have indeed followed Watson et al.’s (2022) recommendations on focusing on other features/criteria of habits at the expense of the outcome devaluation/contingency degradation paradigm, which has been more controversial in the human literature. Our manuscript clearly aligns with Watson et al.’s (2022) recommendations: “there are many other features of habits that are not captured by the key metrics from outcome devaluation/contingency degradation paradigms such as the speed at which actions are performed and the refined and invariant characteristics of movement sequences (Balleine and Dezfouli, 2019). Attempts are being made to develop novel behavioral tasks that tap into these positive features of habits, and this should be encouraged as should be tasks that are not designed to assess whether that behavior is sensitive to outcome devaluation, but capture the definition of habits through other measures”.

      Regarding the authors' use of Balleine and Dezfouli's (2018) criteria to frame recorded behavior as habitual, as well as to acknowledge the study's limitations, it's important to highlight that while the authors labelled the fourth criterion (which they were not fulfilling) as "resistance to devaluation," Balleine and Dezfouli (2018) define it as "insensitive to changes in their relationship to their individual consequences and the value of those consequences." In my understanding, this definition is potentially aligned with the authors' re-evaluation test, namely, it is conceptually adequate for evaluating the fourth criterion (which is the most accepted in the field and probably the one that differentiates habits from skills). Notably, during this test, participants exhibited goal-directed behavior.

      The authors characterized this test as possibly assessing arbitration between goal-directed and habitual behavior, stating that participants in both groups "demonstrated the ability to arbitrate between prior automatic actions and new goal-directed ones." In my perspective, there is no justification for calling it a test of arbitration. Notably, the authors inferred that participants were habitual before the test based on some criteria, but then transitioned to goal-directed behavior based on a different criterion. While I agree with the authors' comment that: "Whether the initiation of the trained motor sequences in experiment 3 (arbitration) is underpinned by an action-outcome association (or not) has no bearing on whether those sequences were under stimulus-response control after training (experiment 1)." they implicitly assert a shift from habit to goal-directed behavior without providing evidence that relies on the same probed mechanism. Therefore, I think it would be more cautious to refer to this test as solely an outcome revaluation test. Again, the results of this test, if anything, provide evidence that the fourth criterion was tested but not met, suggesting participants have not become habitual (or at least undermines this option).

      Response: In our previously revised manuscript, we duly acknowledged that the conventional (perhaps nowadays considered outdated) goal devaluation criterion was not met, primarily due to constraints in designing the second part of the study. We did cite evidence from another similar study that had used devaluation app-trained action sequences to demonstrate habitual qualities (but the reviewer ignored this).

      The reviewer points out that we did use a manipulation of goal revaluation in one of the follow-up tests conducted (although this was not a conventional goal revaluation test inasmuch as it was conducted in a novel context). In this test, please note that we used 2 manipulations: monetary and physical effort. Although we did show that subjects, including OCD patients, were apparently goal-directed in the monetary reward manipulation, this was not so clear when goal re-evaluation involved the physical effort expended. In this effort manipulation, participants were less goal-oriented and OCD patients preferred to perform the longer, familiar sequence over the shorter, novel one, thus exhibiting significantly greater habitual tendencies, as compared to controls. Hence, we cannot decisively conclude that the action sequence is goal-directed as the reviewer is arguing. In fact, the evidence is equivocal and may reflect both habitual and goal-directed qualities in the performance of this sequence, consistent with recent interpretations of skilled/habitual sequences (Du and Haith, 2023). Relying solely on this partially met criterion to conclude that the app-trained sequences are goal-directed, and therefore not habitual, would be an inaccurate assessment for several reasons: 1) the action sequences did satisfy all other criteria for being habitual; 2) this approach would rest on a problematic foundation for defining habits, as emphasized by Kruglanski & Szumowska (2020); and 3) it would succumb to the pitfall of subscribing to a zero-sum game perspective, as cautioned by various researchers, including the review by Watson et al. (2022) cited by the referee, thus oversimplifying the nuanced nature of human behavior.

      While we have previously complied with the reviewer’s suggestion on relabelling our follow-up test as a “revaluation test” instead of an “arbitration test”, we have now explicitly removed all mentions of the term “arbitration” (which seems to raise concerns) throughout the manuscript. As the reviewer has suggested, we now use a more refined terminology by explicitly referring to the measured behavior as "procedural habits". We have also extensively revised the discussion section of our manuscript to incorporate the reviewer’s viewpoint. We hope that these adjustments enhance the clarity and accuracy of our manuscript, addressing the concerns raised during this review process.

      In essence, this is an ontological and semantic matter that does not alter our findings in any way. Whether the sequences are considered habitual or goal-directed does not change our findings that 1) both groups displayed equivalent procedural learning and automaticity attainment; 2) OCD patients exhibit greater subjective habitual tendencies via self-reported questionnaires; 3) patients who had elevated compulsivity and habitual self-reported tendencies engaged significantly more with the motor habit-training app, practiced more and reported symptom relief at the end of the study; 4) these particular patients also show an augmented inclination to attribute higher intrinsic value to familiar actions, a possible mechanism underlying compulsions.

      Reviewer #2 (Recommendations For The Authors):

      A few more small comments (with reference to the point numbers indicated in the rebuttal):

      (14) I am not entirely sure why the suggested analysis is deemed impractical (i.e., why it cannot be performed by "pretending" participants received the points they should have received according to their performance). This can further support (or undermine) the idea of effect of reward on performance rather than just performance on performance.

      Response: We have now conducted this analysis, generating scores for each trial of practices after day 20, when participants no longer gained points for their performance. This analysis assesses whether participants' trial-wise behavioral changes exhibit a similar pattern following simulated relative increases or decreases in scores, as if they had been receiving points at this stage. Note that this analysis has fewer trials available, around 50% less on average.

      Before presenting our results, we wish to emphasize the importance of distinguishing between the effects of performance on performance and the effects of reward on performance. In response to a reviewer's suggestion, we assessed the former in the first revision of our manuscript. We normalized the movement time variable and evaluated how normalized behavioral changes responded to score increments and decrements. The results from the original analyses were consistent with those from the normalized data.

      Regarding the phase where participants no longer received scores, we believe this phase primarily helps us understand the impact of 'predicted' or 'learned' rewards on performance. Once participants have learned the simple association between faster performance and larger scores, they can be expected to continue exhibiting the reward sensitivity effects described in our main analysis. We consider it is not feasible to assess the effects of performance on performance during the reward removal phase, which occurs after 20 days. Therefore, the following results pertain to how the learned associations between faster movement times and scores persist in influencing behavior, even when explicit scores are no longer displayed on the screen.
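      The trial-wise logic described above can be illustrated with a minimal sketch. It assumes that movement time is normalized by z-scoring within a participant and that the behavioral change following a score change is the difference in normalized movement time between the next trial and the current one; neither detail is specified in the response, and the function names are hypothetical. Scores are taken as given, whether real or simulated.

```python
from statistics import mean, pstdev


def zscore(values):
    """Normalize a sequence to zero mean and unit (population) std."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("cannot normalize a constant sequence")
    return [(v - m) / s for v in values]


def changes_after_score_change(movement_times, scores):
    """Split trial-to-trial changes in normalized movement time by the
    sign of the preceding score change (increase vs. decrease)."""
    mt = zscore(movement_times)
    after_increase, after_decrease = [], []
    for t in range(1, len(mt) - 1):
        delta_r = scores[t] - scores[t - 1]   # score change at trial t
        delta_mt = mt[t + 1] - mt[t]          # behavioral change that follows
        if delta_r > 0:
            after_increase.append(delta_mt)
        elif delta_r < 0:
            after_decrease.append(delta_mt)
    return after_increase, after_decrease


# Hypothetical movement times (s) and scores, for illustration only.
inc, dec = changes_after_score_change([1.0, 2.0, 3.0, 2.0, 1.0], [5, 3, 4, 6, 2])
print(len(inc), len(dec))  # 2 1
```

      Trials with no score change are left out of both groups in this sketch; whether such trials were excluded in the actual analysis is not stated.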

      Results: The main results of the effect of reward on behavioral changes persist, supporting that relative increases or decreases in scores (real or imagined/inferred) modulate behavioral adaptations trial-by-trial in a consistent manner across both cohorts. The direction of the effects of reward is the same as in the main analyses presented in the manuscript: larger mean behavioral changes (smaller std) following ∆R- . First, concerning changes in “normalized” movement time (MT) trial-by-trial, we conducted a 2 x 2 factorial analysis of the centroid of the Gaussian distributions with the same factors Reward, Group and Bin. This analysis demonstrated a significant main effect of Reward (P = 2e-16), but not of Group (P = 0.974) or Bin (P = 0.281). There were no significant interactions between factors. The main Reward effect can be observed in the top panel of the figure below. The same analysis applied to the spread (std) of the Gaussian distributions revealed a significant main effect of Reward (P = 0.000213), with no additional main effects or interactions.

      Author response image 1.

      Next, conducting the same 2 x 2 factorial analyses on the centroid and spread of the Gaussian distributions fitted to the Consistency data, we also obtained a robust significant main effect of Reward. For the centroid variable, we obtained a significant main effect of Reward (P = 0.0109) and Group (P = 0.0294), while Bin and the factor interactions were non-significant. See the top panel of the figure below.

      On the other hand, Reward also modulated significantly the spread of the Gaussian distributions fitted to the Consistency data, P = 0.00498. There were no additional significant main effects or interactions. See the bottom panel in the figure below.

      Note that here the factorial analysis was performed on the logarithmic transformation of the std.

      Author response image 2.
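      For readers unfamiliar with the centroid and spread measures used in the analyses above, the following sketch shows one way they could be obtained. It assumes a maximum-likelihood Gaussian fit, for which the centroid is the sample mean and the spread is the population standard deviation, and it applies the logarithmic transformation to the std as noted above. The actual fitting procedure is not described in the response, and the function name is hypothetical.

```python
import math
from statistics import mean, pstdev


def gaussian_summary(changes):
    """Return (centroid, spread, log_spread) of a Gaussian fitted by
    maximum likelihood to a list of trial-wise behavioral changes."""
    centroid = mean(changes)
    spread = pstdev(changes)
    if spread <= 0:
        raise ValueError("spread must be positive to take its logarithm")
    return centroid, spread, math.log(spread)


# Hypothetical behavioral changes, for illustration only.
print(gaussian_summary([1.0, 3.0]))  # (2.0, 1.0, 0.0)
```

      One such summary would be computed per participant and per level of Reward and Bin, with the centroid and log-spread values then entered into the factorial analysis.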

      (16) I find this result interesting and I think it might be worthwhile to include it in the paper.

      Response: We have now included this result in our revised manuscript (page 28)

      (18) I referred to this sentence: "The app preferred sequence was their preferred putative habitual sequence while the 'any 6' or 'any 3'-move sequences were the goal-seeking sequences." In my understanding, this implies one choice is habitual and another indicates goal-directedness.

      One last small comment:
      In the Discussion it is stated: "Moreover, when faced with a choice between the familiar and a new, less effort-demanding sequence, the OCD group leaned toward the former, likely due to its inherent value. These insights align with the theory of goal-direction/habit imbalance in OCD (Gillan et al., 2016), underscoring the dominance of habits in particular settings where they might hold intrinsic value."

      This could equally be interpreted as goal-directed behavior, so I do not think there is conclusive support for this claim.

      Response: The choice of the familiar/trained sequence, as opposed to the 'any 6' or 'any 3'-move sequences, cannot be explicitly considered goal-directed: firstly, because the app familiar sequences were associated with less monetary reward (in the any-6 condition), and secondly, because participants would clearly need more effort and time to perform them. Even though these were automatic, it would still be much easier and faster to simply tap one finger sequentially 6 times (any-6) or 3 times (any-3). Therefore, the choice for the app-sequence would not be optimal/goal-directed. In this sense, that choice aligns with the current theory of goal-direction/habit imbalance of OCD. We found that OCD patients prefer to perform the trained app sequences in the physical effort manipulation (any-3 condition). While this, on the one hand, cannot be explicitly considered a goal-directed choice, we agree that there is another possible goal involved here, which links to the intrinsic value associated with the familiar sequence. In this sense the action could potentially be considered goal-directed. This highlights the difficulty of this concept of value and agrees with: 1) Hommel and Wiers (2017): “Human behavior is commonly not driven by one but by many overlapping motives . . . and actions are commonly embedded into larger-scale activities with multiple goals defined at different levels. As a consequence, even successful satiation of one goal or motive is unlikely to also eliminate all the others” (p. 942) and 2) Kruglanski & Szumowska (2020)’s account that “habits that may be unwanted from the perspective of an outsider and hence “irrational” or purposeless, may be highly wanted from the perspective of the individual for whom a habit is functional in achieving some goal” (p. 1262) and therefore habits are goal-driven.

      References:

      Balleine BW, Dezfouli A. 2019. Hierarchical Action Control: Adaptive Collaboration Between Actions and Habits. Front Psychol 10:2735. doi:10.3389/fpsyg.2019.02735

      Du Y, Haith A. 2023. Habits are not automatic. doi:10.31234/osf.io/gncsf

      Graybiel AM, Grafton ST. 2015. The Striatum: Where Skills and Habits Meet. Cold Spring Harb Perspect Biol 7:a021691. doi:10.1101/cshperspect.a021691

      Haith AM, Krakauer JW. 2018. The multiple effects of practice: skill, habit and reduced cognitive load. Current Opinion in Behavioral Sciences 20:196–201. doi:10.1016/j.cobeha.2018.01.015

      Hardwick RM, Forrence AD, Krakauer JW, Haith AM. 2019. Time-dependent competition between goal-directed and habitual response preparation. Nat Hum Behav 1–11. doi:10.1038/s41562-019-0725-0

      Hommel B, Wiers RW. 2017. Towards a Unitary Approach to Human Action Control. Trends Cogn Sci 21:940–949. doi:10.1016/j.tics.2017.09.009

      Kruglanski AW, Szumowska E. 2020. Habitual Behavior Is Goal-Driven. Perspect Psychol Sci 15:1256–1271. doi:10.1177/1745691620917676

      Mazar A, Wood W. 2018. Defining Habit in Psychology In: Verplanken B, editor. The Psychology of Habit: Theory, Mechanisms, Change, and Contexts. Cham: Springer International Publishing. pp. 13–29. doi:10.1007/978-3-319-97529-0_2

      Robbins TW, Costa RM. 2017. Habits. Current Biology 27:R1200–R1206. doi:10.1016/j.cub.2017.09.060

      Verplanken B, Aarts H, van Knippenberg A, Moonen A. 1998. Habit versus planned behaviour: a field experiment. Br J Soc Psychol 37(Pt 1):111–128. doi:10.1111/j.2044-8309.1998.tb01160.x

      Watson P, O’Callaghan C, Perkes I, Bradfield L, Turner K. 2022. Making habits measurable beyond what they are not: A focus on associative dual-process models. Neurosci Biobehav Rev 142:104869. doi:10.1016/j.neubiorev.2022.104869

      Wood W. 2017. Habit in Personality and Social Psychology. Pers Soc Psychol Rev 21:389–403. doi:10.1177/1088868317720362

    1. Author Response

      The following is the authors’ response to the original reviews.

      Major comments (Public Reviews)

      Generality of grid cells

      We appreciate the reviewers’ concern regarding the generality of our approach, and in particular for analogies in nonlinear spaces. In that regard, there are at least two potential directions that could be pursued. One is to directly encode nonlinear structures (such as trees, rings, etc.) with grid cells, to which DPP-A could be applied as described in our model. The TEM model [1] suggests that grid cells in the medial entorhinal cortex may form a basis set that captures structural knowledge for such nonlinear spaces, such as social hierarchies and transitive inference when formalized as a connected graph. Another would be to use eigen-decomposition of the successor representation [2], a learnable predictive representation of possible future states that has been shown by Stachenfeld et al. [3] to provide an abstract structured representation of a space that is analogous to the grid cell code. This general-purpose mechanism could be applied to represent analogies in nonlinear spaces [4], for which there may not be a clear factorization in terms of grid cells (i.e., distinct frequencies and multiple phases within each frequency). Since the DPP-A mechanism, as we have described it, requires representations to be factored in this way, it would need to be modified for such purposes. Either of these approaches, if successful, would allow our model to be extended to domains containing nonlinear forms of structure. To the extent that different coding schemes (i.e., basis sets) are needed for different forms of structure, the question of how these are identified and engaged for use in a given setting is clearly an important one that is not addressed by the current work. We imagine that this is likely subserved by monitoring and selection mechanisms proposed to underlie the capacity for selective attention and cognitive control [5], though the specific computational mechanisms that underlie this function remain an important direction for future research. We have added a discussion of these issues in Section 6 of the updated manuscript.
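
      To make the second direction concrete, the following is a minimal illustrative sketch of eigen-decomposing a successor representation on a nonlinear structure (a ring). It is not taken from the manuscript: the ring size, the discount factor `gamma`, the unbiased random-walk policy, and all function names are assumptions chosen for illustration. It uses the standard closed form M = (I - γT)^(-1).

```python
import numpy as np

def ring_transition_matrix(n_states: int) -> np.ndarray:
    """Random-walk transitions on a ring: step left or right with probability 0.5."""
    T = np.zeros((n_states, n_states))
    for s in range(n_states):
        T[s, (s - 1) % n_states] = 0.5
        T[s, (s + 1) % n_states] = 0.5
    return T

def successor_representation(T: np.ndarray, gamma: float) -> np.ndarray:
    """Closed-form SR: M = sum_t gamma^t T^t = (I - gamma * T)^(-1)."""
    return np.linalg.inv(np.eye(T.shape[0]) - gamma * T)

n_states, gamma = 20, 0.9  # hypothetical values for illustration
T = ring_transition_matrix(n_states)
M = successor_representation(T, gamma)

# T is symmetric for this random walk, so M is symmetric and eigh applies.
# The columns of `eigenvectors` are periodic functions over the ring states.
eigenvalues, eigenvectors = np.linalg.eigh(M)
```

      Because the eigenvectors here are periodic over the ring, they can play a role loosely analogous to a multi-frequency basis, although they do not factor into distinct frequencies and phases in the way the grid cell code does.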

      (1) Whittington, J.C., Muller, T.H., Mark, S., Chen, G., Barry, C., Burgess, N. and Behrens, T.E., 2020. The Tolman-Eichenbaum machine: unifying space and relational memory through generalization in the hippocampal formation. Cell, 183(5), pp.1249-1263.

      (2) Dayan, P., 1993. Improving generalization for temporal difference learning: The successor representation. Neural computation, 5(4), pp.613-624.

      (3) Stachenfeld, K.L., Botvinick, M.M. and Gershman, S.J., 2017. The hippocampus as a predictive map. Nature neuroscience, 20(11), pp.1643-1653.

      (4) Frankland, S., Webb, T.W., Petrov, A.A., O'Reilly, R.C. and Cohen, J., 2019. Extracting and Utilizing Abstract, Structured Representations for Analogy. In CogSci (pp. 1766-1772).

      (5) Shenhav, A., Botvinick, M.M. and Cohen, J.D., 2013. The expected value of control: an integrative theory of anterior cingulate cortex function. Neuron, 79(2), pp.217-240.

      Biological plausibility of DPP-A

      We appreciate the reviewers’ interest in the biological plausibility of our model, and in particular the question of whether and how DPP-A might be implemented in a neural network. In that regard, Bozkurt et al. [1] recently proposed a biologically plausible neural network algorithm using a weighted similarity matrix approach to implement a determinant maximization criterion, which is the core idea underlying the objective function we use for DPP-A, suggesting that the DPP-A mechanism we describe may also be biologically plausible. This could be tested experimentally by exposing individuals (e.g., rodents or humans) to a task that requires consistent exposure to a subregion, and evaluating the distribution of activity over the grid cells. Our model predicts that high frequency grid cells should increase their firing rate more than low frequency cells, since the high frequency grid cells maximize the determinant of the covariance matrix of the grid cell embeddings. It is also worth noting that Frankland et al. [2] have suggested that the use of DPPs may also help explain a mutual exclusivity bias observed in human word learning and reasoning. While this is not direct evidence of biological plausibility, it is consistent with the idea that the human brain selects representations for processing that maximize the volume of the representational space, which can be achieved by maximizing the DPP-A objective function defined in Equation 6. We have added a comment to this effect in Section 6 of the updated manuscript.

      (1) Bozkurt, B., Pehlevan, C. and Erdogan, A., 2022. Biologically-plausible determinant maximization neural networks for blind separation of correlated sources. Advances in Neural Information Processing Systems, 35, pp.13704-13717.

      (2) Frankland, S. and Cohen, J., 2020. Determinantal Point Processes for Memory and Structured Inference. In CogSci.

      Simplicity of analogical problem and comparison to other models using this task

      First, we would like to point out that analogical reasoning is a signature feature of human cognition, which supports flexible and efficient adaptation to novel inputs that remains a challenge for most current neural network architectures. While humans can exhibit complex and sophisticated forms of analogical reasoning [1, 2, 3], here we focused on a relatively simple form that was inspired by Rumelhart’s parallelogram model of analogy [4,5], which has been used to explain traditional human verbal analogies (e.g., “king is to what as man is to woman?”). Our model, like that one, seeks to explain analogical reasoning in terms of the computation of simple Euclidean distances (i.e., A - B = C - D, where A, B, C, D are vectors in 2D space). We have now noted this in Section 2.1.1 of the updated manuscript. It is worth noting that, despite the seeming simplicity of this construction, we show that standard neural network architectures (e.g., LSTMs and transformers) struggle to generalize on such tasks without the use of the DPP-A mechanism.
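
      The parallelogram relation A - B = C - D can be illustrated with a short sketch. This shows only the geometric relation that defines a correct answer, not the trained inference module; the points, the candidate answers, and the function names are hypothetical.

```python
import numpy as np

def analogy_score(a, b, c, d):
    """Negative Euclidean distance between the difference vectors (A - B) and (C - D)."""
    return -np.linalg.norm((a - b) - (c - d))

def solve_analogy(a, b, c, candidates):
    """Return the index of the candidate D that best satisfies A - B = C - D."""
    scores = [analogy_score(a, b, c, d) for d in candidates]
    return int(np.argmax(scores))

# Hypothetical 2D integer points for illustration.
A = np.array([10, 20])
B = np.array([13, 25])
C = np.array([40, 7])
candidates = np.array([[43, 12], [37, 2], [43, 7], [40, 12]])

best = solve_analogy(A, B, C, candidates)  # the first candidate completes the parallelogram
```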

      Second, we are not aware of any previous work other than Frankland et al. [6] cited in the first paragraph of Section 2.2.1, that has examined the capacity of neural network architectures to perform even this simple form of analogy. The models in that study were hardcoded to perform analogical reasoning, whereas we trained models to learn to perform analogies. That said, clearly a useful line of future work would be to scale our model further to deal with more complex forms of representation and analogical reasoning tasks [1,2,3]. We have noted this in Section 6 of the updated manuscript.

      (1) Holyoak, K.J., 2012. Analogy and relational reasoning. The Oxford handbook of thinking and reasoning, pp.234-259.

      (2) Webb, T., Fu, S., Bihl, T., Holyoak, K.J. and Lu, H., 2023. Zero-shot visual reasoning through probabilistic analogical mapping. Nature Communications, 14(1), p.5144.

      (3) Lu, H., Ichien, N. and Holyoak, K.J., 2022. Probabilistic analogical mapping with semantic relation networks. Psychological review.

      (4) Rumelhart, D.E. and Abrahamson, A.A., 1973. A model for analogical reasoning. Cognitive Psychology, 5(1), pp.1-28.

      (5) Mikolov, T., Chen, K., Corrado, G. and Dean, J., 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

      (6) Frankland, S., Webb, T.W., Petrov, A.A., O'Reilly, R.C. and Cohen, J., 2019. Extracting and Utilizing Abstract, Structured Representations for Analogy. In CogSci (pp. 1766-1772).

      Clarification of DPP-A attentional modulation

      We would like to clarify several concerns regarding the DPP-A attentional modulation. First, we would like to make it clear that ω is not meant to correspond to synaptic weights, and thank the reviewer for noting the possibility for confusion on this point. It is also distinct from a biasing input, which is often added to the product of the input features and weights. Rather, in our model ω is a vector, and diag(ω) converts it into a matrix with ω on the diagonal and zeros elsewhere. In Equation 6, diag(ω) is matrix multiplied with the covariance matrix V, which results in elementwise multiplication of ω with the column vectors of V, and hence ω acts more like a set of gates. We have noted this in Section 2.2.2 and have changed all instances of “weights (ω)” to “gates (ɡ)” in the updated manuscript. We have also rewritten the definition of Equation 6 and uses of it (as in Algorithm 1) to depict the application of a sigmoid nonlinearity (σ) to the gates, so that the resulting values are always between 0 and 1.
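
      The gating interpretation can be checked numerically. The sketch below is illustrative only: the covariance matrix is computed from random data, and the names `gate_params` and `gates` are assumptions rather than identifiers from the manuscript. It shows that multiplying by diag(·) is the same as scaling each column of V elementwise, and that the sigmoid keeps the gates strictly between 0 and 1.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))        # 50 samples of 4 hypothetical embedding units
V = np.cov(X, rowvar=False)         # 4 x 4 covariance matrix

gate_params = rng.normal(size=4)    # unconstrained parameters (hypothetical)
gates = sigmoid(gate_params)        # gate values in (0, 1)

gated_matmul = np.diag(gates) @ V        # matrix product with the diagonal matrix
gated_elementwise = gates[:, None] * V   # each column of V multiplied elementwise by the gates
```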

      Second, we would like to clarify that we don’t compute the inner product between the gates ɡ and the grid cell embeddings x anywhere in our model. The gates within each frequency were optimized (independent of the task inputs), according to Equation 6, to compute the approximate maximum log determinant of the covariance matrix over the grid cell embeddings individually for each frequency. We then used the grid cell embeddings belonging to the frequency that had the maximum within-frequency log determinant for training the inference module, which always happened to be grid cells within the top three frequencies. Author response image 1 (also added to the Appendix, Section 7.10 of the updated manuscript) shows the approximate maximum log determinant (on the y-axis) for the different frequencies (on the x-axis).

      Author response image 1.

      Approximate maximum log determinant of the covariance matrix over the grid cell embeddings (y-axis) for each frequency (x-axis), obtained after maximizing Equation 6.
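
      The selection step described above (pick the frequency with the largest within-frequency log determinant) can be sketched as follows. This is a simplified illustration under stated assumptions: it uses the plain log determinant of the covariance matrix rather than the gate optimization of Equation 6, the three synthetic embedding sets merely stand in for grid cell embeddings of different frequencies, and the small `ridge` term is added only for numerical stability.

```python
import numpy as np

def covariance_logdet(embeddings, ridge=1e-6):
    """Log determinant of the covariance of `embeddings` (samples x units).

    The ridge term (an assumption of this sketch) keeps the matrix positive definite.
    """
    cov = np.cov(embeddings, rowvar=False)
    cov = cov + ridge * np.eye(cov.shape[0])
    _, logdet = np.linalg.slogdet(cov)
    return float(logdet)

def select_frequency(embeddings_by_frequency):
    """Return the index with the largest log determinant, plus all log determinants."""
    logdets = np.array([covariance_logdet(e) for e in embeddings_by_frequency])
    return int(np.argmax(logdets)), logdets

rng = np.random.default_rng(0)
n_samples, n_units = 2000, 3
shared = rng.normal(size=(n_samples, 1))
embeddings_by_frequency = [
    0.1 * rng.normal(size=(n_samples, n_units)),           # low variance
    shared + 0.1 * rng.normal(size=(n_samples, n_units)),  # high variance, highly correlated
    rng.normal(size=(n_samples, n_units)),                 # high variance, weakly correlated
]
selected, logdets = select_frequency(embeddings_by_frequency)
```

      In this synthetic example the third set is selected, because it combines high variance with low correlation among units.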

      Third, we would like to clarify our interpretation of why DPP-A identified grid cell embeddings corresponding to the highest spatial frequencies, and why this produced the best OOD generalization (i.e., extrapolation on our analogy tasks). It is because those grid cell embeddings exhibited greater variance over the training data than the lower frequency embeddings, while at the same time the correlations among those grid cell embeddings were lower than the correlations among the lower frequency grid cell embeddings. The determinant of the covariance matrix of the grid cell embeddings is maximized when the variances of the grid cell embeddings are high (they are “expressive”) and the correlation among the grid cell embeddings is low (they “cover the representational space”). As a result, the higher frequency grid cell embeddings more efficiently covered the representational space of the training data, allowing them to efficiently capture the same relational structure across training and test distributions which is required for OOD generalization. We have added some clarification to the second paragraph of Section 2.2.2 in the updated manuscript. Furthermore, to illustrate this graphically, Author response image 2 (added to the Appendix, Section 7.10 of the updated manuscript) shows the results after the summation of the multiplication of the grid cell embeddings over the 2D space of 1000x1000 locations, with their corresponding gates for 3 representative frequencies (left, middle and right panels showing results for the lowest, middle and highest grid cell frequencies, respectively, of the 9 used in the model), obtained after maximizing Equation 6 for each grid cell frequency. The color code indicates the responsiveness of the grid cells to different X and Y locations in the input space (lighter color corresponding to greater responsiveness). Note that the dark blue area (denoting regions of least responsiveness to any grid cell) is greatest for the lowest frequency and nearly zero for the highest frequency, illustrating that grid cell embeddings belonging to the highest frequency more efficiently cover the representational space, which allows them to capture the same relational structure across training and test distributions as required for OOD generalization.

      Author response image 2.

      Each panel shows the results after summation of the multiplication of the grid cell embeddings over the 2D space of 1000x1000 locations, with their corresponding gates for a particular frequency, obtained after maximizing Equation 6 for each grid cell frequency. The left, middle, and right panels show results for the lowest, middle, and highest grid cell frequencies, respectively, of the 9 used in the model. Lighter color in each panel corresponds to greater responsiveness of grid cells at that particular location in the 2D space.
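
      The claim that the determinant grows with variance and shrinks with correlation can be seen directly in the two-unit case, where the determinant of the covariance matrix equals σ1² σ2² (1 - ρ²). The following sketch is a worked illustration with hypothetical values; it is not part of the model.

```python
import numpy as np

def covariance_2x2(sd1, sd2, rho):
    """Covariance matrix of two units with standard deviations sd1, sd2 and correlation rho."""
    return np.array([[sd1 ** 2, rho * sd1 * sd2],
                     [rho * sd1 * sd2, sd2 ** 2]])

def det_closed_form(sd1, sd2, rho):
    """Closed-form determinant: sd1^2 * sd2^2 * (1 - rho^2)."""
    return sd1 ** 2 * sd2 ** 2 * (1.0 - rho ** 2)

# Hypothetical values: the determinant rises with variance and falls with correlation.
det_base = np.linalg.det(covariance_2x2(2.0, 3.0, 0.5))
det_more_variance = np.linalg.det(covariance_2x2(4.0, 3.0, 0.5))
det_more_correlation = np.linalg.det(covariance_2x2(2.0, 3.0, 0.9))
```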

      Finally, we would like to clarify how the DPP-A attentional mechanism is different from the attentional mechanism in the transformer module, and why both are needed for strong OOD generalization. Use of the standard self-attention mechanism in transformers over the inputs (i.e., A, B, C, and D for the analogy task) in place of DPP-A would lead to weightings of grid cell embeddings over all frequencies and phases. The objective function for the DPP-A represents an inductive bias that selectively assigns the greatest weight to all grid cell embeddings (i.e., for all phases) of the frequency for which the determinant of the covariance matrix, computed over the training space, is greatest. The transformer inference module then attends over the inputs with the selected grid cell embeddings based on the DPP-A objective. We have added a discussion of this point in Section 6 of the updated manuscript.

      We would like to thank the reviewers for their recommendations. We have tried our best to incorporate them into our updated manuscript. Below we provide a detailed response to each of the recommendations grouped for each reviewer.

      Reviewer #1 (Recommendations for the authors)

      (1) It would be helpful to see some equations for R in the main text.

      We thank the reviewer for this suggestion. We have now added some equations explaining the working of R in Section 2.2.3 of the updated manuscript.

      (2) Typo: p 11 'alongwith' -> 'along with'

      We have changed all instances of ‘alongwith’ to ‘along with’ in the updated manuscript.

      (3) Presumably, this is related to equivariant ML - it would be helpful to comment on this.

      Yes, this is related to equivariant ML, since the properties of equivariance hold for our model. Specifically, the probability distribution after applying softmax remains the same when the transformation (translation or scaling) is applied to the scores for each of the answer choices obtained from the output of the inference module, and when the same transformation is applied to the stimuli for the task and all the answer choices before presenting as input to the inference module to obtain the scores. We have commented on this in Section 2.2.3 of the updated manuscript.

      Reviewer #2 (Recommendations for the authors)

      (1) Page 2 - "Webb et al." temporal context - they should also cite and compare this to work by Marc Howard on generalization based on multi-scale temporal context.

      While we appreciate the important contributions that have been made by Marc Howard and his colleagues to temporal coding and its role in episodic memory and hippocampal function, we would like to clarify that his temporal context model is unrelated to the temporal context normalization developed by Webb et al. (2020) and mentioned on Page 2. The former (Temporal Context Model) is a computational model that proposes a role for temporal coding in the functions of the medial temporal lobe in support of episodic recall and spatial navigation. The latter (temporal context normalization) is a normalization procedure proposed for use in training a neural network, similar to batch normalization [1], in which tensor normalization is applied over the temporal instead of the batch dimension, which is shown to help with OOD generalization. We apologize for any confusion engendered by the similarity of these terms and for our failure to clarify the difference between them, which we have now attempted to do in a footnote on Page 2.

      (1) Ioffe, S. and Szegedy, C., 2015, June. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning (pp. 448-456). PMLR.
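
      The difference between the two normalization procedures comes down to the axis over which statistics are computed. The sketch below is a simplified illustration, assuming inputs of shape (batch, time, features) and omitting the learned gain and shift parameters that such layers normally include; the function names are hypothetical.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize using statistics computed over the batch axis (axis 0)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def temporal_context_norm(x, eps=1e-5):
    """Normalize each sequence using statistics computed over its time axis (axis 1)."""
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 3))   # (batch, time, features), hypothetical sizes

# Shifting every element of a sequence by a constant leaves the
# temporally normalized output unchanged.
normalized = temporal_context_norm(x)
normalized_shifted = temporal_context_norm(x + 100.0)
```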

      (2) page 3 - "known to be implemented in entorhinal" - It's odd that they seem to avoid citing the actual biology papers on grid cells. They should cite more of the grid cell recording papers when they mention the entorhinal cortex (i.e. Hafting et al., 2005; Barry et al., 2007; Stensola et al., 2012; Giocomo et al., 2011; Brandon et al., 2011).

      We have now cited the references mentioned below, on page 3 after the phrase “known to be implemented in entorhinal cortex”.

      (1) Barry, C., Hayman, R., Burgess, N. and Jeffery, K.J., 2007. Experience-dependent rescaling of entorhinal grids. Nature neuroscience, 10(6), pp.682-684.

      (2) Stensola, H., Stensola, T., Solstad, T., Frøland, K., Moser, M.B. and Moser, E.I., 2012. The entorhinal grid map is discretized. Nature, 492(7427), pp.72-78.

      (3) Giocomo, L.M., Hussaini, S.A., Zheng, F., Kandel, E.R., Moser, M.B. and Moser, E.I., 2011. Grid cells use HCN1 channels for spatial scaling. Cell, 147(5), pp.1159-1170.

      (4) Brandon, M.P., Bogaard, A.R., Libby, C.P., Connerney, M.A., Gupta, K. and Hasselmo, M.E., 2011. Reduction of theta rhythm dissociates grid cell spatial periodicity from directional tuning. Science, 332(6029), pp.595-599.

      (3) To enhance the connection to biological systems, they should cite more of the experimental and modeling work on grid cell coding (for example on page 2 where they mention relational coding by grid cells). Currently, they tend to cite studies of grid cell relational representations that are very indirect in their relationship to grid cell recordings (i.e. indirect fMRI measures by Constantinescu et al., 2016 or the very abstract models by Whittington et al., 2020). They should cite more papers on actual neurophysiological recordings of grid cells that suggest relational/metric representations, and they should cite more of the previous modeling papers that have addressed relational representations. This could include work on using grid cell relational coding to guide spatial behavior (e.g. Erdem and Hasselmo, 2014; Bush, Barry, Manson, Burgess, 2015). This could also include other papers on the grid cell code beyond the paper by Wei et al., 2015 - they could also cite work on the efficiency of coding by Sreenivasan and Fiete and by Mathis, Herz, and Stemmler.

      We thank the reviewer for bringing the additional references to our attention. We have cited the references mentioned below on page 2 of the updated manuscript.

      (1) Erdem, U.M. and Hasselmo, M.E., 2014. A biologically inspired hierarchical goal directed navigation model. Journal of Physiology-Paris, 108(1), pp.28-37.

      (2) Sreenivasan, S. and Fiete, I., 2011. Grid cells generate an analog error-correcting code for singularly precise neural computation. Nature neuroscience, 14(10), pp.1330-1337.

      (3) Mathis, A., Herz, A.V. and Stemmler, M., 2012. Optimal population codes for space: grid cells outperform place cells. Neural computation, 24(9), pp.2280-2317.

      (4) Bush, D., Barry, C., Manson, D. and Burgess, N., 2015. Using grid cells for navigation. Neuron, 87(3), pp.507-520.

      (4) Page 3 - "Determinantal Point Processes (DPPs)" - it is rather annoying that DPP is defined after DPP-A is defined. There ought to be a spot where the definition of DPP-A is clearly stated in a single location.

      We agree it makes more sense to define Determinantal Point Process (DPP) before DPP-A. We have now rephrased the sentences accordingly. In the “Abstract”, the sentence now reads “Second, we propose an attentional mechanism that operates over the grid cell code using Determinantal Point Process (DPP), which we call DPP attention (DPP-A) - a transformation that ensures maximum sparseness in the coverage of that space.” We have also modified the second paragraph of the “Introduction”. The modified portion now reads “b) an attentional objective inspired from Determinantal Point Processes (DPPs), which are probabilistic models of repulsion arising in quantum physics [1], to attend to abstract representations that have maximum variance and minimum correlation among them, over the training data. We refer to this as DPP attention or DPP-A.” Due to this change, we removed the last sentence of the fifth paragraph of the “Introduction”.

      (1) Macchi, O., 1975. The coincidence approach to stochastic point processes. Advances in Applied Probability, 7(1), pp.83-122.

      (5) Page 3 - "the inference module R" - there should be some discussion about how this component using LSTM or transformers could relate to the function of actual brain regions interacting with entorhinal cortex. Or if there is no biological connection, they should state that this is not seen as a biological model and that only the grid cell code is considered biological.

      While we agree that the model is not intended to be specific about the implementation of the R module, we assume that, as a standard deep learning component, it is likely to map onto neocortical structures that interact with the entorhinal cortex and, in particular, regions of the prefrontal-posterior parietal network widely believed to be involved in abstract relational processes [1,2,3,4]. In particular, the role of the prefrontal cortex in the encoding and active maintenance of abstract information needed for task performance (such as rules and relations) has often been modeled using gated recurrent networks, such as LSTMs [5,6], and the posterior parietal cortex has long been known to support “maps” that may provide an important substrate for computing complex relations [4]. We have added some discussion about this in Section 2.2.3 of the updated manuscript.

      (1) Waltz, J.A., Knowlton, B.J., Holyoak, K.J., Boone, K.B., Mishkin, F.S., de Menezes Santos, M., Thomas, C.R. and Miller, B.L., 1999. A system for relational reasoning in human prefrontal cortex. Psychological science, 10(2), pp.119-125.

      (2) Christoff, K., Prabhakaran, V., Dorfman, J., Zhao, Z., Kroger, J.K., Holyoak, K.J. and Gabrieli, J.D., 2001. Rostrolateral prefrontal cortex involvement in relational integration during reasoning. Neuroimage, 14(5), pp.1136-1149.

      (3) Knowlton, B.J., Morrison, R.G., Hummel, J.E. and Holyoak, K.J., 2012. A neurocomputational system for relational reasoning. Trends in cognitive sciences, 16(7), pp.373-381.

      (4) Summerfield, C., Luyckx, F. and Sheahan, H., 2020. Structure learning and the posterior parietal cortex. Progress in neurobiology, 184, p.101717.

      (5) Frank, M.J., Loughry, B. and O’Reilly, R.C., 2001. Interactions between frontal cortex and basal ganglia in working memory: a computational model. Cognitive, Affective, & Behavioral Neuroscience, 1, pp.137-160.

      (6) Braver, T.S. and Cohen, J.D., 2000. On the control of control: The role of dopamine in regulating prefrontal function and working memory. Control of cognitive processes: Attention and performance XVIII, (2000).

      (6) Page 4 - "Learned weighting w" - it is somewhat confusing to use "w" as that is commonly used for synaptic weights, whereas I understand this to be an attentional modulation vector with the same dimensionality as the grid cell code. It seems more similar to a neural network bias input than a weight matrix.

      We refer to the first paragraph of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (7) Page 4 - "parameterization of w... by two loss functions over the training set." - I realize that this has been stated here, but to emphasize the significance to a naïve reader, I think they should emphasize that the learning is entirely focused on the initial training space, and there is NO training done in the test spaces. It's very impressive that the parameterization is allowing generalization to translated or scaled spaces without requiring ANY training on the translated or scaled spaces.

      We have added the sentence “Note that learning of parameter occurs only over the training space and is not further modified during testing (i.e. over the test spaces)” to the updated manuscript.

      (8) Page 4 - "The first," - This should be specific - "The first loss function"

      We have changed it to “The first loss function” in the updated manuscript.

      (9) Page 4 - The analogy task seems rather simplistic when first presented (i.e. just a spatial translation to different parts of a space, which has already been shown to work in simulations of spatial behavior such as Erdem and Hasselmo, 2014 or Bush, Barry, Manson, Burgess, 2015). To make the connection to analogy, they might provide a brief mention of how this relates to the analogy space created by word2vec applied to traditional human verbal analogies (i.e. king-man+woman=queen).

      We agree that the analogy task is simple, and recognize that grid cells can be used to navigate to different parts of space over which the test analogies are defined when those are explicitly specified, as shown by Erdem and Hasselmo (2014) and Bush, Barry, Manson, and Burgess (2015). However, for the analogy task, the appropriate set of grid cell embeddings must be identified that capture the same relational structure between training and test analogies to demonstrate strong OOD generalization, and that is achieved by the attentional mechanism DPP-A. As suggested by the reviewer’s comment, our analogy task is inspired by Rumelhart’s parallelogram model of analogy [1,2] (and therefore similar to traditional human verbal analogies) inasmuch as it involves differences (i.e., A - B = C - D, where A, B, C, D are vectors in 2D space). We have now noted this in Section 2.1.1 of the updated manuscript.

      (1) Rumelhart, D.E. and Abrahamson, A.A., 1973. A model for analogical reasoning. Cognitive Psychology, 5(1), pp.1-28.

      (2) Mikolov, T., Chen, K., Corrado, G. and Dean, J., 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

      (10) Page 5 - The variable "KM" is a bit confusing when it first appears. It would be good to re-iterate that K and M are separate points and KM is the vector between these points.

      We apologize for the confusion on this point. KM is meant to refer to an integer value, obtained by multiplying K and M, which is added to both dimensions of A, B, C and D, which are points in ℤ², to translate them to a different region of the space. K is an integer value ranging from 1 to 9 and M is also an integer value denoting the size of the training region, which in our implementation is 100. We have clarified this in Section 2.1.1 of the updated manuscript.
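
      The translation by KM can be illustrated with a small worked example. M = 100 and the range of K (1 to 9) follow the description above; the specific points and the function name `translate_analogy` are hypothetical.

```python
import numpy as np

M = 100  # size of the training region, as stated above

def translate_analogy(points, K, M=M):
    """Add the integer K * M to both dimensions of every point."""
    return points + K * M

# Hypothetical analogy (rows are A, B, C, D) inside the training region.
analogy = np.array([[10, 20], [13, 25], [40, 7], [43, 12]])

# Translate to a different region of the space with K = 3.
shifted = translate_analogy(analogy, K=3)
```

      The translation moves every point by the same offset, so the relation A - B = C - D is preserved in the new region.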

      (11) Page 5 - "two continuous dimensions (Constantinescu et al._)" - this ought to give credit to the original study showing the abstract six-fold rotational symmetry for spatial coding (Doeller, Barry and Burgess).

      We have now cited the original work by Doeller et al. [1] along with Constantinescu et al. (2016) in the updated manuscript after the phrase “two continuous dimensions” on page 5.

      (1) Doeller, C.F., Barry, C. and Burgess, N., 2010. Evidence for grid cells in a human memory network. Nature, 463(7281), pp.657-661.

      (12) Page 6 - Np=100. This is done later, but it would be clearer if they right away stated that Np*Nf=900 in this first presentation.

      We have now added the following sentence after Np=100: “Hence Np*Nf=900, which denotes the number of grid cells.”

      (13) Page 6 - They provide theorem 2.1 on the determinant of the covariance matrix of the grid code, but they ought to cite this the first time this is mentioned.

      We have cited Gillenwater et al. (2012) before mentioning theorem 2.1. The sentence just before that reads “We use the following theorem from Gillenwater et al. (2012) to construct :”

      (14) Page 6 - It would greatly enhance the impact of the paper if they could give neuroscientists some sense of how the maximization of the determinant of the covariance matrix of the grid cell code could be implemented by a biological circuit. OR at least to show an example of the output of this algorithm when it is used as an inner product with the grid cell code. This would require plotting the grid cell code in the spatial domain rather than the 900 element vector.

      We refer to our response above to the topic “Biological plausibility of DPP-A” and second, third, and fourth paragraphs of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contain our responses to this issue.

      (15) Page 6 - "That encode higher spatial frequencies..." This seems intuitive, but it would be nice to give a more intuitive description of how this is related to the determinant of the covariance matrix.

      We refer to the third paragraph of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (16) Page 7 - log of both sides... Nf is number of frequencies... Would be good to mention here that they are referring to equation 6 which is only mentioned later in the paragraph.

      As suggested, we now refer to Equation 6 in the updated manuscript. The sentence now reads “This is achieved by maximizing the determinant of the covariance matrix over the within frequency grid cell embeddings of the training data, and Equation 6 is obtained by applying the log on both sides of Theorem 2.1, and in our case where refers to grid cells of a particular frequency.”

      (17) Page 7 - Equation 6 - They should discuss how this is proposed to be implemented in brain circuits.

      We refer to our response above to the topic “Biological plausibility of DPP-A” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (18) Page 9 - "egeneralize" - presumably this is a typo?

      Yes. We have corrected it to “generalize” in the updated manuscript.

      (19) Page 9 - "biologically plausible encoding scheme" - This is valid for the grid cell code, but they should be clear that this is not valid for other parts of the model, or specify how other parts of the model such as DPP-A could be biologically plausible.

      We refer to our response above to the topic “Biological plausibility of DPP-A” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (20) Page 12 - Figure 7 - comparison to one-hots or smoothed one-hots. The text should indicate whether the smoothed one-hots are similar to place cell coding. This is the most relevant comparison of coding for those knowledgeable about biological coding schemes.

      Yes, smoothed one-hots are similar to place cell coding. We now mention this in Section 5.3 of the updated manuscript.

      (21) Page 12 - They could compare to a broader range of potential biological coding schemes for the overall space. This could include using coding based on the boundary vector cell coding of the space, band cell coding (one dimensional input to grid cells), or egocentric boundary cell coding.

      We appreciate these useful suggestions, which we now mention as potentially valuable directions for future work in the second paragraph of Section 6 of the updated manuscript.

      (22) Page 13 - "transformers are particularly instructive" - They mention this as a useful comparison, but they might discuss further why a much better function is obtained when attention is applied to the system twice (once by DPP-A and then by a transformer in the inference module).

      We refer to the last paragraph of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (23) Page 13 - "Section 5.1 for analogy and Section 5.2 for arithmetic" - it would be clearer if they perhaps also mentioned the specific figures (Figure 4 and Figure 6) presenting the results for the transformer rather than the LSTM.

      We have now rephrased to also refer to the figures in the updated manuscript. The phrase now reads “a transformer (Figure 4 in Section 5.1 for analogy and Figure 6 in Section 5.2 for arithmetic tasks) failed to achieve the same level of OOD generalization as the network that used DPP-A.”

      (24) Page 14 - "statistics of the training data" - The most exciting feature of this paper is that learning during the training space analogies can so effectively generalize to other spaces based on the right attention DPP-A, but this is not really made intuitive. Again, they should illustrate the result of the xT w inner product to demonstrate why this work so effectively!

      We refer to the second, third, and fourth paragraphs of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contain our response to this issue.

      (25) Bibliography - Silver et al., go paper - journal name "nature" should be capitalized. There are other journal titles that should be capitalized. Also, I believe eLife lists family names first.

      We have made the changes to the bibliography of the updated manuscript suggested by the reviewer.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript represents a cleanly designed experiment for assessing biological motion processing in children (mean age = 9) with and without ADHD. The group differences concerning accuracy in global and local motion processing abilities are solid, but the analyses suggesting dissociable relationships between global and local processing and social skills, age, and IQ need further interrogation. The results are useful in terms of understanding ADHD and the ontogenesis of different components of the processing of biological motion.

      We thank the editors for the positive assessment of our manuscript. We have carefully considered the reviewers’ constructive and helpful comments and revised our manuscript accordingly. To address the question about the dissociable relationships between global and local BM processing, we have provided more evidence and additional analyses in this revised version.

      Reviewer #1 (Public Review):

      Summary:

      The paper presents a nice study investigating differences in biological motion perception in participants with ADHD in comparison with controls. Motivated by the idea that there is a relationship between biological motion perception and social capabilities, the authors investigated local and global (holistic) biological motion perception, the group, and several additional behavioral variables that are affected in ADHD (IQ, social responsiveness, and attention/impulsivity). Local as well as global biological motion perception is reduced in ADHD participants. In addition, the study demonstrates a significant correlation between local biological motion perception skills and the social responsiveness score in the ADHD group, but not the controls. A path analysis in the ADHD data suggests that general performance in biological motion perception is influenced mainly by global biological motion perception performance and attentional and perceptual reasoning skills.

      Strengths:

      It is true that there exists not much work on biological motion perception and ADHD. Therefore, the presented study contributes an interesting new result to the biological motion literature and adds potentially also new behavioral markers for this clinical condition. The design of the study is straightforward and technically sound, and the drawn conclusions are supported by the presented results.

      Thank you for your positive assessment of our work.

      Weaknesses:

      Some of the claims about the relationship between genetic factors and ADHD and the components of biological motion processing have to remain speculative at this point because genetic influences were not explicitly tested in this paper.

      We agree that the relationship between genetic factors and BM processing in ADHD needs more investigation. We have modified our statement in the Discussion section as follows:

      “Using the classical twin method, Wang et al. found that the distinction between local and global BM processing may stem from the dissociated genetic bases. The former, to a great degree, seems to be acquired phylogenetically20,21,59,60, while the latter is primarily obtained through individual development19.” (lines 421 - 425)

      Reviewer #2 (Public Review):

      Summary:

      Tian et al. aimed to assess differences in biological motion (BM) perception between children with and without ADHD, as well as relationships to indices of social functioning and possible predictors of BM perception (including demographics, reasoning ability and inattention). In their study, children with ADHD showed poorer performance relative to typically developing children in three tasks measuring local, global, and general BM perception. The authors further observed that across the whole sample, performance in all three BM tasks was negatively correlated with scores on the social responsiveness scale (SRS), whereas within groups a significant relationship to SRS scores was only observed in the ADHD group and for the local BM task. Local and global BM perception showed a dissociation in that global BM processing was predicted by age, while local BM perception was not. Finally, general (local & global combined) BM processing was predicted by age and global BM processing, while reasoning ability mediated the effect of inattention on BM processing.

      Strengths:

      Overall, the manuscript is presented in a relatively clear fashion and methods and materials are presented with sufficient detail so the study could be reproduced by independent researchers. The study uses an innovative, albeit not novel, paradigm to investigate two independent processes underlying BM perception. The results are novel and have the potential to have wide-reaching impact on multiple fields.

      We appreciate your positive assessment of our work.

      Weaknesses:

      Except for the main analysis, it is unclear what the authors' specific predictions are regarding the three different tasks they employ. The three BM tasks are used to probe different processes underlying BM perception, but it is difficult to gather from the introduction why these three specific tasks were chosen and what predictions the authors have about the performance of the ADHD group in these tasks. Relatedly, the authors do not report whether (and if so, how) they corrected for multiple comparisons in their analyses. As the number of tests one should control for depends on the theoretical predictions (http://daniellakens.blogspot.com/2016/02/why-you-dont-need-to-adjust-you-alpha.html), both are necessary for the reader to assess the statistical validity of the results and any inferences drawn from them. The same is the case for the secondary analyses exploring relationships between the 3 individual BM tasks and social function measured by the social responsivity scale (SRS).

      We appreciate these constructive suggestions. In response, we have included a detailed description in the Introduction section explaining why we employed three different tasks and our predictions about the performance in ADHD:

      “Despite initial indications, a comprehensive investigation into BM perception in ADHD is warranted. We proposed that it is essential to deconstruct BM processing into its multiple components and motion features, since treating them as a single entity may lead to misleading or inconsistent findings31. To address this issue, we employed a carefully designed behavioral paradigm used in our previous study19, making slight adjustments to adapt for children. This paradigm comprises three tasks. Task 1 (BM-local) aimed to assess the ability to process local BM cues. Scrambled BM sequences were displayed and participants could use local BM cues to judge the facing direction of the scrambled walker. Task 2 (BM-global) tested the ability to process the global configuration cues of the BM walker. Local cues were uninformative, and participants used global BM cues to determine the presence of an intact walker. Task 3 (BM-general) tested the ability to process general BM cues (local + global cues). The stimulus sequences consisted of an intact walker and a mask containing similar target local cues, so participants could use general BM cues (local + global cues) to judge the facing direction of the walker.” (lines 116 - 130)

      “In Experiment 1, we examined three specific BM perception abilities in children with ADHD. As mentioned earlier, children with ADHD also show impaired social interaction, which implies atypical social cognition. Therefore, we speculated that children with ADHD performed worse in the three tasks compared to TD children.” (lines 131 - 134)

      Additionally, we have reported the p values corrected for multiple comparisons (false discovery rate, FDR) in the revised manuscript wherever it was necessary to adjust the alpha (lines 310 - 316; Table 2). The pattern of the results remained unchanged.
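
      For readers unfamiliar with FDR adjustment, a minimal sketch follows. It assumes the Benjamini-Hochberg procedure, which is one common FDR method; the response does not state which procedure was used, and the p-values below are hypothetical, not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values from four tests.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

      Each adjusted value can then be compared against the nominal alpha (for example 0.05) in place of the raw p-value.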

      In relation to my prior point, the authors could provide more clarity on how the conclusions drawn from the results relate to their predictions. For example, it is unclear what specific conclusions the authors draw based on their findings that ADHD show performance differences in all three BM perception tasks, but only local BM is related to social function within this group. Here, the claim is made that their results support a specific hypothesis, but it is unclear to me what hypothesis they are actually referring to (see line 343 & following). This lack of clarity is aggravated by the fact that throughout the rest of the discussion, in particular when discussing other findings to support their own conclusions, the authors often make no distinction between the two processes of interest. Lastly, some of the authors' conclusions related to their findings on local vs global BM processing are not logically following from the evidence: For instance, the authors conclude that their data supports the idea that social atypicalities are likely to reduce with age in ADHD individuals. However, according to their own account, local BM perception - the only measure that was related to social function in their study - is understood to be age invariant (and was indeed not predicted by age in the present study).

      Thank you for pointing out this issue. We have carefully revised the Discussion section about our findings to clarify these points:

      “Our study contributes several promising findings concerning atypical biological motion perception in ADHD. Specifically, we observe the atypical local and global BM perception in children with ADHD. Notably, a potential dissociation between the processing of local and global BM information is identified. The ability to process local BM cues appears to be linked to the traits of social interaction among children with ADHD. In contrast, global BM processing exhibits an age-related development. Additionally, general BM perception may be affected by factors including attention.” (lines 387 - 393)

      We have provided a detailed discussion on the two processes of interest to clarify their potential differences and the possible reasons behind the difference of the divergent developmental trajectories between local and global BM processing:

      “BM perception is considered a multi-level phenomenon56-58. At least in part, processing information of local BM and global BM appears to involve different genetic and neural mechanisms16,19. Using the classical twin method, Wang et al. found that the distinction between local and global BM processing may stem from the dissociated genetic bases. The former, to a great degree, seems to be acquired phylogenetically20,21,59,60, while the latter is primarily obtained through individual development19. The sensitivity to local rather than global BM cues seems to emerge early in life. Visually inexperienced chicks exhibit a spontaneous preference for the BM stimuli of hen, even when the configuration was scrambled20. The same finding was reported in newborns. On the contrary, the ability to process global BM cues rather than local BM cues may be influenced by attention28,29 and shaped by experience24,56.” (lines 419 - 430)

      “We found that the ability to process global and general BM cues improved significantly with age in both TD and ADHD groups, which imply the processing module for global BM cues tends to be mature with development. In the ADHD group, the improvement in processing general and global BM cues is greater than that in processing local BM cues, while no difference was found in TD group. This may be due to the relatively higher baseline abilities of BM perception in TD children, resulting in a relatively milder improvement. These findings also suggest a dissociation between the development of local and global BM processing. There seems to be an acquisition of ability to process global BM cues, akin to the potential age-related improvements observed in certain aspects of social cognition deficits among individuals with ADHD5, whereas local BM may be considered an intrinsic trait19.” (lines 438 - 449)

      In addition, we have rephrased some inaccurate statements in the revised manuscript. Although some studies have found that part of the social dysfunction in individuals with ADHD reduces with age, another part might be stable and due to atypical local BM perception. One reason for the reduction is that some factors related to social dysfunction, such as hyperactivity symptoms, improve with age.

      Results reported are incomplete, making it hard for the reader to comprehensively interpret the findings and assess whether the conclusions drawn are valid. Whenever the authors report negative results (p-values > 0.05), the relevant statistics are not reported, and the data not plotted. In addition, summary statistics (group means) are missing for the main analysis.

      Thanks for your comments. We have provided the complete statistical results in the revised manuscript (lines 309 - 316) and supplementary material, which encompass relevant statistics and plots of negative results (Figure 4, Figure S2 and S3), in accordance with our research questions. And we have also included summary statistics in the Results section (lines 287 - 293).

      Some of the conclusions/statements in the article are too strong and should be rephrased to indicate hypotheses and speculations rather than facts. For example, in lines 97-99 the authors state that the finding of poor BM performance in TD children in a prior study 'indicated inferior applicability' or 'inapplicable experimental design'. While this is one possibility, a perhaps more plausible interpretation could be that TD children show 'poor' performance due to outstanding maturation of the underlying (global) BM processes (as the authors suggest themselves that BM perception can improve with age). There are several other examples where statements are too strong or misleading, which need attention.

      We thank you for pointing out the issue. We have toned down and rephrased the strong statements and made the necessary revisions.

      “Another study found that children with ADHD performed worse in BM detection with moderate ratios of noise34. This may be due to the fact that BM stimuli with noise dots will increase the difficulty of identification, which highlights the difference in processing BM between the two groups33,35.” (lines 111 - 115)

      Reviewer #3 (Public Review):

      Summary:

      The authors presented point light displays of human walkers to children (mean = 9 years) with and without ADHD to compare their biological motion perception abilities and relate them to IQ, social responsiveness scale (SRS) scores and age. They report that children with ADHD were worse at all three biological motion tasks, but that those loading more heavily on local processing related to social interaction skills and global processing to age. The important and solid findings are informative for understanding this complex condition, as well as biological motion processing mechanisms in general. However, I am unsure that these differences between local and global skills are truly supported by the data and suggest some further analyses.

      Strengths:

      The authors present clear differences between the ADHD and TD children in biological motion processing, and this question has not received as much attention as equivalent processing capabilities in autism. They use a task that appears well controlled. They raise some interesting mechanistic possibilities for differences in local and global motion processing, which are distinctions worth exploring. The group differences will therefore be of interest to those studying ADHD, as well as other developmental conditions, and those examining biological motion processing mechanisms in general.

      We appreciate your positive feedback. In the revised manuscript, we have added more analyses to support the differences between local and global motion processing. Please refer to our response to point #3 that you mention below.

      Weaknesses:

      I am unsure that the data are strong enough to support claims about differences between global and local processing wrt social communication skills and age. The mechanistic possibilities for why these abilities may dissociate in such a way are interesting, but do not seem so plausible to me. I am also concerned about gender, and possible autism, confounds when examining the effect of ADHD. Specifics:

      Gender confound. There are proportionally more boys in the ADHD than TD group. The authors appear to attempt to overcome this issue by including gender as a covariate. I am unsure if this addresses the problem. The vast majority of participants in the ADHD group are male, and gender is categorically, not continuously, defined. I'm pretty sure this violates the assumptions of ANCOVA.

      We appreciate your comments. We concur with you that although we observed a clear difference between local and global BM processing in ADHD, the evidence is to some extent preliminary. The mechanistic possibilities for why these abilities may dissociate have been discussed in the revised manuscript. Please refer to the response to reviewer 2’s point #2. To further examine whether gender played a role in the observed results, we used a statistical matching technique to obtain a sub-dataset. The pattern of results remained with the more balanced dataset (see Supplementary Information part 1). Following your suggestion, we have also presented the results without gender as a covariate in the main text and separated the data of boys and girls on the plots (see Figure 1 and Figure S1). There were indeed no signs of a gender effect.

      Autism. Autism and ADHD are highly comorbid. The authors state that the TD children did not have an autism or ADHD diagnosis, but they do not state that the ADHD children did not have an autism diagnosis. Given the nature of the claims, this seems crucial information for the reader.

      Thanks for your suggestion. We have confirmed that none of the children with ADHD in our study were diagnosed with autism. We used a semi-structured interview instrument (K-SADSPL-C) to confirm that every recruited child had ADHD but not ASD. The exclusion criteria for both groups were mentioned in the Materials and methods section:

      “Exclusion criteria for both groups were: (a) neurological diseases; (b) other neurodevelopmental disorders (e.g., ASD, Mental retardation, and tic disorders), affective disorders and schizophrenia…” (lines 158 - 162)

      Conclusions. The authors state frequently that it was the local BM task that related to social communication skills (SRS) and not the global tasks. However, the results section shows a correlation between SRS and all three tasks. The only difference is that when looking specifically within the ADHD group, the correlation is only significant for the local task. I think that if the authors wish to make strong claims here they must show inferential stats supporting (1) a difference between ADHD and TD SRS-Task 1 correlations, and (2) a difference in those differences for Task 2 and 3 relative to Task 1. I think they should also show a scatterplot of this correlation, with separate lines of best fit for the two groups, for Tasks 2 and 3 as well. I.e. Figure 4 should have 3 panels. I would recommend the same type of approach for age. Currently, they have small samples for correlations, and are reading much of theoretical significance between some correlations passing significance threshold and others not. It would be incredibly interesting if the social skills (as measured by SRS) only relate to local BM abilities, and age only to global, but I think the data are not so clear with the current information. I would be surprised if all BM abilities did not improve with age. Even if there is some genetic starter kit (and that this differs according to particular BM component), most abilities improve with learning/experience/age.

      Thank you for this recommendation. We have added more statistics to test differences between the correlations: a difference between ADHD and TD in SRS-Task 1 correlations (see the first paragraph of Supplementary Information part 2), a difference in SRS-response accuracy correlations for Task 2 and 3 relative to Task 1 (see the second paragraph of Supplementary Information part 2), and a difference in age-response accuracy correlations for Task 2 and 3 relative to Task 1 in the ADHD group (see Supplementary Information part 3). Additionally, we have included scatterplots for SRS-Task1, SRS-Task2, SRS-Task3 (with separate lines of best fit for the two groups in each, see Figure 4), SRS-ADHD, SRS-TD, age-ADHD and age-TD (with separate lines of best fit for the three tasks in each, see Figure S2 and S3) to make a clear demonstration. Detailed results have been presented in the revised manuscript and Supplementary Information. We expect that these further analyses will strengthen our conclusions.
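
      One standard way to test whether a correlation differs between two independent groups (such as ADHD and TD) is Fisher's r-to-z test, sketched below. The response does not say which test was used, so this is an assumption, and the correlation values and sample sizes are hypothetical. Comparisons across tasks within one group involve dependent correlations and would need a different test.

```python
import math

def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for two correlations from independent samples.

    Returns the z statistic and the two-sided p-value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)      # Fisher transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided normal p-value
    return z, p

# Hypothetical example: r = 0.50 in one group, r = 0.00 in the other, 50 children each.
z_stat, p_value = compare_independent_correlations(0.50, 50, 0.00, 50)
```

      With small samples the standard error is large, so two correlations can fall on opposite sides of the significance threshold without differing significantly from each other.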

      Theoretical assumptions. The authors make some sweeping statements about local vs global biological motion processing that need to be toned down. They assume that local processing is specifically genetic whereas global processing is a product of experience. The fact their global, but not local, task performance improves with age would tend to suggest there could be some difference here, but the existing literature does not allow for this certainty. The chick studies showing a neonatal preference are controversial and confounded - I cannot remember the specifics but I think there is an upper vs lower visual field complexity difference here.

      Thank you for pointing out this issue. We have toned down and rephrased our claims about the difference between local and global BM processing according to your suggestion:

      “These findings suggest that local and global mechanisms might play different roles in BM perception, though the exact mechanisms underlying the distinction remain unclear. Exploring the two components of BM perception will enhance our understanding of the difference between local and global BM processing, shedding light on the psychological processes involved in atypical BM perception.” (lines 87 - 92)

      Reviewer #1 (Recommendations For The Authors):

      I have only a number of minor points that should be addressed prior to publication:

      L. 95ff: What is meant by 'inapplicability of experimental designs' ? This paragraph is somewhat unclear.

      In the revised manuscript, we have clarified this point (lines 111 - 115).

      L. 146: The groups were not perfectly balanced for sex. Would results change fundamentally in a more balanced design, or can arguments be given that gender does not play a role, like it seems to be the case for some functions in biological motion perception (e.g. Pavlova et al. 2015; Tsang et al 2018). One could provide a justification that this disbalance does not matter or test for subsampled balanced data sets maybe.

      This point is similar to the point #1 from reviewer 3, and we have addressed this issue in our response above.

      L. 216 f.: In this paragraph it does not become very clear that the mask for the global task consisted of scrambles generated from walkers walking in the same direction. The mask for the local task then should consist of a balanced mask that contains the same amount of local motion cues indicating right and leftwards motion. Was this the case? (Not so clear from this paragraph.)

      Regarding the local task, introducing a mask would make the task too difficult for children. Therefore, in the local task, we displayed only a scrambled walker without a mask, which was more suitable for children. We have clarified this point in the corresponding paragraph (lines 232 - 241).

      L. 224 ff.: Here it would be helpful to see the 5 different 'facing' directions of the walkers. What does this exactly mean? Do they move on oblique paths that are not exactly orthogonal to the viewing directions, and how much did these facing directions differ?

      Out of the five walkers we used, two faced straight left or right, orthogonal to the viewing directions. Two walked with their bodies oriented 45 degrees from the observer, to the left or right. The last one walked towards the observer. We have included a video (Video 4) to demonstrate the 5 facing directions.

      L. 232: How was the number of 5 practicing trials determined/justified?

      As mentioned in main text, global BM processing is susceptible to learning. Therefore, too many practicing trials would increase BM visual experience and influence the results. We determined the number of training trials to be 5 based on the results of the pilot experiment. During this phase, we observed that nearly all children were able to understand the task requirements well after completing 5 practicing trials.

      L 239: Apparently no non-parametric statistics was applied. Maybe it would be good to mention in the Statistics section briefly why this was justified.

      We appreciate your suggestion and have cited two references in the Statistics section (Fagerland et al. 2012, Rochon et al. 2012). Fagerland et al. mentioned that the t-test becomes more robust as the sample size increases. According to the central limit theorem, when the sample size is greater than 30, the sampling distribution of the mean can be safely assumed to be normal (http://www2.psychology.uiowa.edu/faculty/mordkoff/GradStats/part%201/I.07%20normal.pdf). In fact, we also ran non-parametric statistics for our data and found the results to be robust.

      L 290: 'FIQ' this abbreviation should be defined.

      ‘FIQ’ stands for the full-scale intellectual quotient, which was defined in the Materials and methods section:

      “Scores of the four broad areas constitute the full-scale intellectual quotient (FIQ).”

      L. 290 ff.: The model 'BM-local = age + gender etc ' is pretty sloppy notation. I think what is meant is that a GLM was used in which the predictors (gender etc.) are multiplied by appropriate beta_i values. This formula should be corrected, or one just says that a GLM was run with the predictors gender ....

      The same criticism applies to these other models that follow.

      We thank you for pointing this out. We have modified all formulas accordingly in the revised manuscript (see part3 of the Results section).
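
      For illustration only, a linear model of the kind the reviewer describes could be written as below. The full predictor list is not given in this exchange, so the trailing terms are left elided, and the symbols ($\beta_k$ for coefficients, $\varepsilon_i$ for the residual of participant $i$) are assumptions rather than the authors' notation.

```latex
\[
\text{BM-local}_i = \beta_0 + \beta_1\,\text{age}_i + \beta_2\,\text{gender}_i + \dots + \varepsilon_i
\]
```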

      All these models assume linearity of the combination of the predictors. Was this assumption verified?

      We referred to previous studies of BM perception in children, which found that the main predictor variables, including IQ (Rutherford et al., 2012; Jones et al., 2011) and age (Annaz et al., 2010; van et al., 2016), have a linear relation with BM processing ability.

      L. 296ff.: For model (b) it looks like general BM performance is strongly driven by the predictor global BM performance in the group of patients. Does the same observation also apply to the normals?

      The same phenomenon was not observed in TD children. We have briefly discussed this point in the Discussion section of the revised manuscript (lines 449 - 459).

      Reviewer #2 (Recommendations For The Authors):

      (1) Please add public access to the data repository so data availability can be assessed.

      The data of the study will be available at https://osf.io/37p5s/.

      (2) Although overall, the language was clear and understandable, there are a few parts where language might confuse a reader and lead to misconceptions. For instance, line 52: Did the authors mean to refer to 'emotions and intentions' instead of 'emotions and purposes'? See also examples where rephrasing may help to reflect a statement is speculation rather than fact.

      Thanks for the comments. We have carefully checked the full text and rephrased the confusing statements.

      (3) Line 83/84: Autism is not a 'mental disorder' - please change to something like 'developmental disability'. Authors are encouraged to adapt their language according to terms preferred by the community (e.g., see Fig. 5 in this article:

      https://onlinelibrary.wiley.com/doi/10.1002/aur.2864)

      Suggestion well taken. We have changed the wording accordingly:

      “In recent years, BM perception has received significant attention in studies of mental disorders (e.g., schizophrenia30) and developmental disabilities, particularly in ASD, characterized by deficits in social communication and social interaction31,32.” (lines 93 - 95)

      (4) Please report how the sample size for the study was determined.

      In the Materials and methods section (lines 168 - 173), we explained how the sample size was determined.

      (5) Line 94: It would be helpful to have a brief description of what neurophysiological differences have been observed upon BM perception in children with ADHD.

      Thanks for the comment. We have added a brief description of neurophysiological findings in children with ADHD (lines 108 - 111).

      (6) Line 106/107 and 108/109: please add references.

      We have revised this part, and the relevant findings and references are now included in the revised manuscript (lines 77, 132 - 133).

      (7) Line 292: Please add what order the factors were entered into each regression model.

      Regarding this issue, we used SPSS 26 for the main analysis. SPSS utilizes the Type III sum of squares (default) to evaluate models. Regardless of the order in the GLM, we will obtain the same result. For more information, please refer to the documentation of SPSS 26 (https://www.ibm.com/docs/en/spss-statistics/26.0.0?topic=features-glm-univariate-analysis).

      Reviewer #3 (Recommendations For The Authors)

      (1) Task specifics. It is key to understanding the findings, as well as the dissociation between tasks, that the precise nature of the stimuli is clear. I think there is room for improvement in description here. Task 1 is described as involving relocating dots within the range of the intact walker. Of course, PLWs are created by presenting dots at the joints, so relocation can involve either moving to another place on the body, or random movement within the 2D spatial array (which likely involves moving it off the body). Which was done? It is said that Ps must indicate the motion direction, but what was the display of the walker? Sagittal? Task 2 requires detecting whether there is an intact walker amongst scrambled walkers. Were all walkers completely overlaid? Task 3 requires detecting the left v right facing of an intact walker at different orientations, presented amongst noise. So Task 3 requires determining facing direction and Task 1 walking direction. Are these tasks the same but described differently? Or can walkers ever walk backwards? Wrt this point, I also think it would help the reader if example videos were uploaded.

      We appreciate your bringing this to our attention. With regard to Task 1, your second speculation is correct. We scrambled the original dots and randomly presented them within the 2D spatial array (which likely involved moving them off the body). As a result, the global configuration of the 13 dots was completely disrupted while the motion trajectory of each individual dot was preserved. This led to the display of scrambled dots on the monitor (which does not resemble a human). In practice, these local BM cues contain information about motion direction. In Task 2, the target walkers were completely overlaid by a mask that is approximately 1.44 times the size of the intact walker. The task requirements of Task 1 and Task 3 are the same, namely judging the motion (walking) direction. The difference is that Task 1 displayed a scrambled walker while Task 3 displayed an intact walker within a mask. We have clarified these points and improved our descriptions in the Procedure section, and we have created example videos for each task, which we believe will help readers understand each task.

      (2) Gender confound (see above). I think that the authors should present the results without gender as a covariate. Can they separate boys and girls on the plots with different coloured individual datapoints, such that readers can see whether it's actually a gender effect driving the supposed ADHD effect? And show that there are no signs of a gender effect in their TD group?

      This point is similar to the point #1 you mentioned. Please refer to our response to that point above.

      (3) Autism possible confound (see above). I think the authors must report whether any of the ADHD group had an autism diagnosis.

      Please refer to our response to point #2 that you mentioned.

      (4) Conclusions concerning differences between the local and global tasks wrt SRS and age (see above). I believe the authors should add stats demonstrating differences between the correlations to support such claims, as well as demonstrating appropriate scatterplots for SRS-Task 1, SRS-Task 2, SRS-Task 3 and age-Task 1, age-Task2 and age-Task 3 (with separate lines of best fit for the two groups in each).

      Please refer to our response to point #3 that you mentioned.

      (5) Theoretical assumptions (see above). I would suggest rephrasing all claims here to outline that these discussed mechanistic differences between local and global BM processing are only possibilities and not known on the basis of existing data.

      Please refer to our response to point #4 that you mentioned.

    1. Author Response

      The following is the authors’ response to the original reviews.

      General response:

      We thank the reviewers for their thorough evaluation of our manuscript. Working on the raised concerns has improved the manuscript greatly. Specifically, the recommendations to clarify the adopted assumptions in the study strengthened the motivation for the study; further, following up some of the reviewers’ concerns with additional analyses validated our chosen measures and strengthened the compatibility of the findings with the predictions of the dynamic attending framework. Below, you will find our detailed point-by-point responses, along with information on specific revisions.

      The reviewers pointed out that study assumptions were unclear, some of the measures we chose were not well motivated, and the findings were not well enough explained considering possible alternatives. As suggested, we reformulated the introduction, explained the common assumptions of entrainment models that we adopted in the study, and further clarified how our chosen measures for the properties of the internal oscillators relate to these assumptions.

      We realized that the initial emphasis on the compatibility of the current findings with predictions of entrainment models might have led to the wrong impression that the current study aimed to test whether auditory rhythmic processing is governed by timekeeper or oscillatory mechanisms. However, testing these theoretical models to explain human behavior necessitates specific paradigms designed to compare the contrasting predictions of the models. A number of studies do so by manipulating regularity in a stimulus sequence or expectancy of stimulus onsets, or assessing the perceived timing of targets that follow a stimulus rhythm. Such paradigms allow testing the prediction that an oscillator, underlying perceptual timing, would entrain to a regular but not an irregular sequence. This would further lead to stronger expectancies at the peak of the oscillation, where 'attentional energy' is the highest. These studies report 'rhythmic facilitation', where targets that align with the peaks of the oscillation are better detected than those that do not (see Henry and Herrmann (2014) and Haegens and Zion Golumbic (2018) for reviews). Additionally, unexpected endings of standard intervals, preceded by a regular entraining sequence, lead to a biased estimation of subsequent comparison intervals, due to the contrast between the attentional oscillator's phase and a deviating stimulus onset (Barnes & Jones, 2000; Large & Jones, 1999; McAuley & Jones, 2003). Even a sequence rate that is a multiple of the to-be-judged standard and comparison intervals gives rise to rhythmic facilitation (McAuley & Jones, 2003), and the expectancy of a stimulus onset modulates duration judgments. These findings are not compatible with predictions of timekeeper models, as time intervals in these models are represented arbitrarily and are not affected by expectancy violations.

      In the current study, we adopted an entrainment approach to timing, rather than testing predictions of competing models. This choice was motivated by several aspects of entrainment models that align better with the aims of the current study. First, our focus was on understanding perception and production of rhythms, for which perception is better explained by entrainment models than by timekeeper models, which excel at explaining perception of isolated time intervals (McAuley, 2010). Moreover, we wanted to leverage the fact that entrainment models elegantly include parameters that can explain different aspects of timing abilities, and these parameters can be estimated in an individualized manner. For instance, the flexibility property of oscillators can be linked to the ability to adapt to changes in external context, while timekeeper or Bayesian timing approaches lack a specific mechanism to quantify temporal adaptation across perceptual and motor domains. Finally, that entrainment is observed across theoretical, behavioral, and neural levels renders entrainment models useful in explaining and generalizing behavior across different domains. Nevertheless, some results showed partial compatibility with predictions of the timekeeper models, such as the modulation of 'best-performance rates' by the temporal context, observed in Experiment 1’s random-order sessions, where stimulus rates maximally differed across consecutive trials. However, given that the mean, standard deviation, and range of stimulus rates were identical across sessions, and that timekeeper models assume no temporal adaptation in duration perception, timekeeper models would predict similar results across these sessions. Instead, we found significant accuracy differences, biased duration judgments, and harmonic relationships between the best-performance rates. We elaborate more on these results with respect to their compatibility with the contrasting models of human temporal perception in the revised discussion.

      Responses to specific comments:

      (1.1) At times, I found it challenging to evaluate the scientific merit of this study from what was provided in the introduction and methods. It is not clear what the experiment assumes, what it evaluates, and which competing accounts or predictions are at play. While some of these questions are answered, clear ordering and argumentative flow is lacking. With that said, I found the Abstract and General Discussion much clearer, and I would recommend reformulating the early part of the manuscript based on the structure of those segments.

      Second, in my reading, it is not clear to what extent the study assumes versus demonstrates the entrainment of internal oscillators. I find the writing somewhat ambiguous on this count: on the one hand, an entrainment approach is assumed a priori to design the experiment ("an entrainment approach is adopted") yet a primary result of the study is that entrainment is how we perceive and produce rhythms ("Overall, the findings support the hypothesis that an oscillatory system with a stable preferred rate underlies perception and production of rhythm..."). While one could design an experiment assuming X and find evidence for X, this requires testing competing accounts with competing hypotheses -- and this was not done.

      We appreciate the reviewer’s concerns and suggestion to clarify the assumptions of the study and how the current findings relate to the predictions of competing accounts. To address these concerns:

      • We added the assumptions of the entrainment models that we adopted in the Introduction section and reformulated the motivation to choose them accordingly.

      • We clarified in the Introduction that the study’s aim was not to test the entrainment models against alternative theories of rhythm perception.

      • We added a paragraph in the General Discussion to further distinguish predictions from the competing accounts. Here we discussed the compatibility of the findings with predictions of both entrainment and timekeeper models.

      • We rephrased reasoning in the Abstract, Introduction, and General Discussion to further clarify the aims of the study, and how the findings support the hypotheses of the current study versus those of the dynamic attending theory.

      (1.2) In my view, more evidence is required to bolster the findings as entrainment-based regardless of whether that is an assumption or a result. Indeed, while the effect of previous trials on the behaviour of the current trial is compatible with entrainment hypotheses, it may well be compatible with competing accounts as well. And that would call into question the interpretation of results as uncovering the properties of oscillating systems and age-related differences in such systems. Thus, I believe more evidence is needed to bolster the entrainment hypothesis.

      For example, a key prediction of the entrainment model -- which assumes internal oscillators as the mechanism of action -- is that behaviour in the SMT and PTT tasks follows the principles of Arnold's Tongue. Specifically, tapping and listening performance should worsen systematically as a function of the distance between the presented and preferred rate. On a participant-by-participant basis, does performance scale monotonically with the distance between the presented and preferred rate? Some of the analyses hint at this question, such as the effect of 𝚫IOI on accuracy, but a recontextualization, further analyses, or additional visualizations would be helpful to demonstrate evidence of a tongue-like pattern in the behavioural data. Presumably, non-oscillating models do not follow a tongue-like pattern, but again, it would be very instructive to explicitly discuss that.

      We thank the reviewer for the excellent suggestion of assessing 'Arnold's tongue' principles in timing performance. We agree that testing whether timing performance forms a pattern compatible with an Arnold tongue would further support our assumption that the findings related to preferred rate stem from an entrainment-based mechanism. We refer instead to the ‘entrainment region’ (McAuley et al., 2006), which corresponds to a slice of the Arnold tongue at a fixed stimulus intensity that entrains the internal oscillator. In both representations of oscillator behavior across a range of stimulus rates, performance should systematically increase as the difference between the stimulus rate and the oscillator's preferred rate, namely 'detuning', decreases. In response to the reviewer’s comment, we ran further analyses to test this key prediction of entrainment models. We assessed performance at stimulus rates that were faster and slower than an individual's preferred rate estimates from Experiment 1. To do so, we ran logistic regression models on aggregated datasets from all participants and sessions, where normalized IOI, in trials where the stimulus rate was faster than the preferred rate estimate and in those where it was slower, predicted accuracy. Stimulus IOIs were normalized within each direction (faster- versus slower-than-preferred rate) using z-score transformation, and the direction was coded as categorical in the model. We reasoned that a positive slope for conditions with stimulus rates faster than the preferred rate, and a negative slope for conditions with slower rates, should indicate a systematic accuracy increase toward the preferred rate estimate. This is exactly what we found. The results revealed a significant main effect of IOI and a significant interaction between IOI and direction, indicating that accuracy increased towards the preferred rate at fast rates and decreased as the stimulus rate diverged from the preferred rate at slow rates. We added these results to the respective subsections of Experiment 1 Methods and Results, added a plot showing the slices of the regression surfaces to Figure 2B, and elaborated on the results in Experiment 1 Discussion. As the number of trials in Experiment 2 was much lower (N = 81), we only ran these additional analyses in Experiment 1.
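      The normalization step described in this response can be sketched as follows. This is a minimal illustration, not the analysis code used for the manuscript: the function name, the use of the population standard deviation, and the choice to skip trials at exactly the preferred IOI are all assumptions. A shorter IOI than the preferred one is treated as a faster-than-preferred rate.

```python
from statistics import mean, pstdev


def zscore_within_direction(iois, preferred_ioi):
    """Z-score stimulus IOIs separately for faster- and slower-than-preferred trials.

    Returns a list of (ioi, direction, z) tuples. Trials whose IOI equals the
    preferred IOI are skipped (an assumption made for this sketch).
    """
    groups = {"faster": [], "slower": []}
    for ioi in iois:
        if ioi == preferred_ioi:
            continue
        # A shorter inter-onset interval means a faster stimulus rate.
        direction = "faster" if ioi < preferred_ioi else "slower"
        groups[direction].append(ioi)

    rows = []
    for direction, values in groups.items():
        if len(values) < 2:
            continue  # a z-score is undefined for fewer than two values
        centre = mean(values)
        spread = pstdev(values)  # population SD; an assumption of this sketch
        if spread == 0:
            continue
        for value in values:
            rows.append((value, direction, (value - centre) / spread))
    return rows


if __name__ == "__main__":
    for row in zscore_within_direction([200, 300, 400, 700, 800, 900], 550):
        print(row)
```

      The resulting z-scored IOI and the categorical direction would then enter a logistic regression, with their interaction, as predictors of accuracy. Under the reasoning above, the expected signature is a positive IOI slope in the "faster" group and a negative one in the "slower" group.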

      (1.3) Fourth, harmonic structure in behaviour across tasks is a creative and useful metric for bolstering the entrainment hypothesis specifically because internal oscillators should display a preference across their own harmonics. However, I have some doubts that the analyses as currently implemented indicate such a relationship. Specifically, the main analysis to this end involves summing the residuals of the data closest to y=x, y=2*x and y=x/2 lines and evaluating whether this sum is significantly lower than for shuffled data. Out of these three dimensions, y=x does not comprise a harmonic, and this is an issue because it could by itself drive the difference of summed residuals with the shuffled data. I am uncertain whether rerunning the same analysis with the x=y dimension excluded constitutes a simple resolution because presumably there are baseline differences in the empirical and shuffled data that do not have to do with harmonics that would leak into the analysis. To address this, a simulation with ground truths could be helpful to justify analyses, or a different analysis that evaluates harmonic structure could be thought of.

      We thank the reviewer for pointing out the weakness of the permutation test we developed to assess the harmonic relationship between Experiment 1’s preferred rate estimates. Datapoints that fall on the y=x line indeed do not represent harmonic relationships. Rather, they indicate one-to-one correspondence between the axes, which is a stronger indicator of compatibility between the estimates. Speaking to the reviewer’s point, however, standard correlation analyses were not significant, whereas a significant correlation would have been expected if the permutation results were being driven by the y=x relationship. This was the reason we developed the permutation test, so that integer-ratio datapoints could also contribute.

      Based on the reviewer’s comment, we ran additional analyses to assess the harmonic relationships between the estimates. The first analysis involved a circular approach. We first normalized each participant’s estimates by rescaling the slower estimate with respect to the faster one by division, and converted the values to radians, since a pair of values with an integer-ratio relationship should correspond to the same phase on a unit circle. Then, we assessed whether the resulting distribution of normalized values differed from a uniform distribution, using Rayleigh’s test, which was significant (p = .004). The circular mean of the distribution was 44 (SD = 53) degrees (M = 0.764, SD = 0.932 radians), indicating that the slower estimates were slightly slower than the faster estimate or its multiples. As this distribution was skewed toward positive values due to the normalization procedure, we did not compare it against zero angle. Instead, we ran a second test, which was a modular approach. We first calculated how much the slower estimate deviated proportionally from the faster estimate or its multiples (i.e., subharmonics) by normalizing the estimates from both sessions by the faster estimate. The outcome measure was the modulus of the slower, relative to the faster estimate, divided by the faster estimate. Then, we ran a permutation test, shuffling the linear-order session estimates over 1000 iterations and taking the median percent deviation values for each iteration. The test statistic was significant (p = .004), indicating that the harmonic relationships we observed in the estimates were not due to chance or dependent on the assessment method. We added these details of additional analyses to assess harmonic relationships between the Experiment 1 preferred rate estimates in the Supplementary Information.
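      The two analyses described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code used for the manuscript: estimates are assumed to be IOIs in milliseconds (so the smaller value is the faster estimate), the Rayleigh p-value uses a common closed-form approximation, and the one-sided permutation p-value (proportion of shuffled medians at or below the observed median) is our guess at the test statistic. All function names are hypothetical.

```python
import cmath
import math
import random
from statistics import median


def harmonic_phase(est_a, est_b):
    """Map a pair of estimates to a phase; integer ratios map to phase 0."""
    fast, slow = min(est_a, est_b), max(est_a, est_b)
    return 2 * math.pi * ((slow / fast) % 1.0)


def rayleigh_test(phases):
    """Rayleigh test for non-uniformity; returns (z statistic, approximate p)."""
    n = len(phases)
    resultant = abs(sum(cmath.exp(1j * p) for p in phases))  # length of summed unit vectors
    z = resultant ** 2 / n
    p = math.exp(math.sqrt(1 + 4 * n + 4 * (n ** 2 - resultant ** 2)) - (1 + 2 * n))
    return z, p


def proportional_deviation(est_a, est_b):
    """Modulus of the slower estimate relative to the faster one, divided by the faster one."""
    fast, slow = min(est_a, est_b), max(est_a, est_b)
    return (slow % fast) / fast


def permutation_p(random_order, linear_order, n_iter=1000, seed=0):
    """Shuffle linear-order estimates across participants and compare median deviations."""
    rng = random.Random(seed)
    observed = median(proportional_deviation(a, b)
                      for a, b in zip(random_order, linear_order))
    shuffled = list(linear_order)
    at_or_below = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        permuted = median(proportional_deviation(a, b)
                          for a, b in zip(random_order, shuffled))
        if permuted <= observed:
            at_or_below += 1
    return observed, (at_or_below + 1) / (n_iter + 1)
```

      Note that the phase and the proportional deviation carry the same information in this sketch (the phase is the deviation scaled by 2π); the two tests differ in the null distribution they use, a uniform circle versus shuffled participant pairings.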

      (2.1) The current study is presented in the framework of the ongoing debate of oscillator vs. timekeeper mechanisms underlying perceptual and motor timing, and authors claim that the observed results support the former mechanism. In this line, every obtained result is related by the authors to a specific ambiguous (i.e., not clearly related to a biophysical parameter) feature of an internal oscillator. As pointed out by an essay on the topic (Doelling & Assaneo, 2021), claiming that a pattern of results is compatible with an "oscillator" could be misleading, since some features typically used to validate or refute such mechanisms are not well grounded on real biophysical models. Relatedly, a recent study (Doelling et al., 2022) shows that two quantitatively different computational algorithms (i.e., absolute vs relative timing) can be explained by the same biophysical model. This demonstrates that what could be interpreted as a timekeeper, or an oscillator can represent the same biophysical model working under different conditions. For this reason, if authors would like to argue for a given mechanism underlying their observations, they should include a specific biophysical model, and test its predictions against the observed behavior. For example, it's not clear why authors interpret the observation of the trial's response being modulated by the rate of the previous one, as an oscillator-like mechanism underlying behavior. As shown in (Doelling & Assaneo, 2021) a simple oscillator returns to its natural frequency as soon as the stimulus disappears, which will not predict the long-lasting effect of the previous trial. Furthermore, a timekeeper-like mechanism with a long enough integration window is compatible with this observation.

      Still, authors can choose to disregard this suggestion, and not testing a specific model, but if so, they should restrict this paper to a descriptive study of the timing phenomena.

      We thank the reviewer for their valuable suggestion to include a biophysical model to further demonstrate the compatibility of the current findings with certain predictions of the model. While we acknowledge the potential benefits of implementing a biophysical model to understand the relationships between model parameters and observed behavior, this goes beyond the scope of the current study.

      We note that we have employed a modeling approach in a subsequent study to further explore how the properties and the resulting behavior of an oscillator map onto the patterns of human behavior we observed in the current study (Kaya & Henry, 2024, February 5). In that study, we fitted a canonical oscillator model, and several variants thereof, separately to datasets obtained from random-order and linear-order sessions of Experiment 1 of the current submission. The base model, adapted from McAuley and Jones (2003), assumed sustained oscillations within the trials of the experiment, and complete decay towards the preferred rate between the trials. We introduced a gradual decay parameter (Author response image 1A) that weighted between the oscillator's concurrent period value at the time of decay and its initial period (i.e., preferred rate). This parameter was implemented only within trials, between the standard stimulus sequence and comparison interval in Variant 1, between consecutive trials in Variant 2, and at both temporal locations in Variant 3. Model comparisons (Author response image 1B) showed that Variant 3 was the best-fitting model for both random- and linear-order datasets. Crucially, estimates for within- and between-trial decay parameters, obtained from Variant 3, were positively correlated, suggesting that oscillators gradually decayed towards their preferred rate at similar timescales after cessation of a stimulus.

      Author response image 1.

      (A) Illustration of the model fitted to Experiment 1 datasets and (B) model comparison results. In each trial, the model is initialized with a phase (ɸ) and period (P) value. A At the offset of each stimulus interval i, the model updates its phase (pink arrows) and period (blue arrows) depending on the temporal contrast (C) between the model state and stimulus onset and phase and period correction weights, Wɸ and Wp. Wdecaywithin updates the model period as a weighted average between the period calculated for the 5th interval, P5, and model’s preferred rate, P0. C, calculated at the offset of the comparison interval. Wdecaybetween parameter initializes the model period at the beginning of a new trial as a weighted average between the last period from the previous trial and P0. The base model’s assumptions are marked by asterisks, namely sustained oscillation during the silence (i=5), and complete decay between trials. B Left: The normalized probability of each model having the minimum BIC value across all models and across participants. Right: AICc, calculated from each model’s fit to participants’ single-session datasets. In both panels, random-order and linear-order sessions were marked in green and blue, respectively. B denotes the base model, and V1, V2 and V3 denote variants 1, 2 and 3, respectively.
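      The decay step described in the caption, a weighted average between the oscillator's current period and its preferred period, can be sketched as below. This is a minimal illustration under assumptions: the direction of the weight (0 = no decay, 1 = complete decay to the preferred period) and all names are ours, and the phase and period correction steps of the model are omitted.

```python
def decay_period(current_period, preferred_period, w_decay):
    """Weighted average of the current and preferred period.

    w_decay = 0 keeps the current period (sustained oscillation);
    w_decay = 1 returns fully to the preferred period (complete decay).
    The direction of the weight is an assumption of this sketch.
    """
    if not 0.0 <= w_decay <= 1.0:
        raise ValueError("w_decay must lie in [0, 1]")
    return (1.0 - w_decay) * current_period + w_decay * preferred_period


# Which decay weights are free parameters in each model, as described in the text:
# (within-trial decay is free, between-trial decay is free)
VARIANTS = {
    "B": (False, False),   # base: sustained within trials, complete decay between trials
    "V1": (True, False),   # gradual decay within trials only
    "V2": (False, True),   # gradual decay between trials only
    "V3": (True, True),    # gradual decay at both locations
}


def effective_weights(variant, w_within, w_between):
    """Return the (within, between) decay weights a given variant would apply."""
    within_free, between_free = VARIANTS[variant]
    return (w_within if within_free else 0.0,
            w_between if between_free else 1.0)


if __name__ == "__main__":
    # Period after the 5th interval is 500 ms, preferred period is 600 ms.
    w_in, w_bt = effective_weights("V3", 0.25, 0.5)
    print(decay_period(500.0, 600.0, w_in))  # period carried into the comparison interval
    print(decay_period(500.0, 600.0, w_bt))  # period at the start of the next trial
```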

      Although our behavioral results and modeling thereof must necessarily be interpreted as reflecting the mechanics of an attentional, but not a neural, oscillator, these findings might shed light on the controversy in neuroscience research regarding the timeline of entrainment decay. While multiple studies show that neural oscillations can continue at the entrained rate for a number of cycles following entrainment (Bouwer et al., 2023; Helfrich et al., 2017; Lakatos et al., 2013; van Bree et al., 2021), different modeling approaches reveal mixed results on this phenomenon. Whereas Doelling and Assaneo (2021) show that a Stuart-Landau oscillator returns immediately to its preferred rate after synchronizing to an external stimulus, simulations of other oscillator types suggest gradual decay toward the preferred rate (Large, 1994; McAuley, 1995; Obleser et al., 2017) or self-sustained oscillation at the external stimulus rate (Nachstedt et al., 2017).

      While the Doelling & Assaneo study (2021) provides insights on entrainment and behavior of the Stuart-Landau oscillator under certain conditions, the internal oscillators hypothesized by the dynamic attending theory might have different forms, and therefore may not adhere to the behavior of a specific implementation of an oscillator model. Moreover, that a phase-coupled oscillator does not show gradual decay does not imply that models with period tracking behave in the same way. Adaptive frequency oscillators, for instance, are able to sustain the oscillation after the stimulus ceases (Nachstedt et al., 2017). Alongside models that use Hebbian learning (Roman et al., 2023), the main implementations of the dynamic attending theory have parameters for period tracking and decay towards the preferred rate (Large, 1994; McAuley, 1995). In fact, the u-shaped pattern of duration discrimination sensitivity across a range of stimulus rates (Drake & Botte, 1993) is better explained by a decaying than a non-decaying oscillator (McAuley, 1995). To conclude, the literature suggests that the emergence of decay versus sustain behavior of the oscillators and the timeline of decay depend on the particular model used as well as its parameters, and it therefore does not offer a one-size-fits-all solution.

      Reviewer #2 (Recommendations For The Authors):

      • Are the range, SD and mean of the random-order and linear-order sessions different? If so, why?

      Information regarding the SD and mean of the random-order and linear-order sessions was added to Experiment 1 Methods section.

      “While the mean (M = 599 ms), standard deviation (SD = 231 ms) and range (200, 998 ms) of the presented stimulus IOIs were identical between the sessions, the way IOI changed from trial to trial was different.” (p. 5)

      • Perhaps the title could mention the age-related flexibility effect you demonstrate, which is an important contribution that without inclusion in the title could be missed in literature searches.

      We have changed the title to include age-related changes in oscillator flexibility. Thanks for the great suggestion.

      • Is the statistical analysis in Figure 4A between subjects? Shouldn't the analyses be within subjects?

      We have now better specified that the statistical analyses of Experiment 2’s preferred rate estimates were across the tasks, in Figure 4 caption.

      "Vertical lines above the box plots represent within-participants pairwise comparisons." (p. 17)

      • It says participants' hearing thresholds were measured using standard puretone audiometry. What threshold warranted participant exclusion and how many participants were excluded on the basis of hearing skills?

      We have now clarified that hearing threshold was not an exclusion criterion.

      "Participants were not excluded based on hearing threshold." (p. 11)

      • "Tapping rates from 'fastest' and 'slowest' FMT trials showed no difference between pre- and postsession measurements, and were additionally correlated across repeated measurements" - could you point to the statistics for this comparison?

      Table 2 includes the results from both experiments’ analyses on unpaced tapping. (p. 10)

      “The results of the pairwise comparisons between tapping rates from all unpaced tapping tasks across measurements are provided in Table 2.” (p. 15)

      • How was the loudness (dB) of the woodblock stimuli determined on a participant-by-participant basis? Please ignore if I missed this.

      Participants were allowed to set the volume to a comfortable level.

      "Participants then set the sound volume to a level that they found comfortable for completing the task." (p. 4)

      • Please spell out IOI, DEV, and other terms in full the first time they are mentioned in the manuscript.

      We added the descriptions of abbreviations before their initial mention.

      "In each experimental session, 400 unique trials of this task were presented, each consisting of a combination of the three main independent variables: the inter-onset interval, IOI; amount of deviation of the comparison interval from the standard, DEV, and the amount of change in stimulus IOI between consecutive trials, 𝚫IOI. We explain each of these variables in detail in the next paragraphs." (p. 4)

      • Small point: In Fig 1 sub-text, random order and linear order are explained in reverse order from how they are presented in the figure.

      We fixed the incompatibility between the Figure 1 content and caption.

      • Small point: I found the elaborate technical explanation of windowing methods, including alternatives that were not used, unnecessary.

      We moved the details of the smoothing analysis to the Supplementary Information.

      • With regard to the smoothing explanation, what is an "element"? Is this a sample? If so, what was the sampling rate?

      We reworded ‘element’ as ‘sample’. In the smoothing analyses, the sampling rate was the size of the convolution window, which was set to 26 for random-order and 48 for linear-order sessions.
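      To make the role of the window size concrete, smoothing by convolution can be sketched as a simple moving average. This is an assumption-laden illustration: the response does not state the shape of the window, so a flat (boxcar) window is used here, and the function name is hypothetical.

```python
def moving_average(values, window):
    """Smooth a sequence with a flat window of `window` samples.

    Only positions where the window fits entirely inside the data are returned,
    so the output has len(values) - window + 1 entries.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    smoothed = [running / window]
    for i in range(window, len(values)):
        # Slide the window by one sample: add the new value, drop the oldest.
        running += values[i] - values[i - window]
        smoothed.append(running / window)
    return smoothed


if __name__ == "__main__":
    # Toy accuracy series (1 = correct, 0 = incorrect), window of 4 samples.
    print(moving_average([1, 0, 1, 1, 0, 1, 1, 1], 4))
```

      With the window sizes reported above (26 and 48 samples), each smoothed value would summarize that many consecutive samples.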

      • Spelling/language error: "The pared-down", "close each other", "always small (+4 ms), than".

      We fixed the spelling errors.

      Reviewer #3 (Recommendations For The Authors):

      • My main concern is the one detailed as a weakness in the public review. In that direction, if authors decide to keep the mechanistic interpretation of the outcomes (which I believe is a valuable one) here I suggest a couple of models that they can try to adapt to explain the pattern of results:

      a. Roman, Iran R., et al. "Hebbian learning with elasticity explains how the spontaneous motor tempo affects music performance synchronization." PLOS Computational Biology 19.6 (2023): e1011154.

      b. Bose, Amitabha, Áine Byrne, and John Rinzel. "A neuromechanistic model for rhythmic beat generation." PLoS Computational Biology 15.5 (2019): e1006450.

      c. Egger, Seth W., Nhat M. Le, and Mehrdad Jazayeri. "A neural circuit model for human sensorimotor timing." Nature Communications 11.1 (2020): 3933.

      d. Doelling, K. B., Arnal, L. H., & Assaneo, M. F. (2022). Adaptive oscillators provide a hard-coded Bayesian mechanism for rhythmic inference. bioRxiv, 2022-06

      Thanks for the suggestion! Please refer to our response (2.1.) above. To summarize, although we considered a full, well-fleshed-out modeling approach to be beyond the scope of the current work, we are excited about and actively working on exactly this. Our modeling take is available as a preprint (Kaya & Henry, 2024, February 5).

      • Since the authors were concerned with the preferred rate they circumscribed the analysis to extract the IOI with better performance. Would it be plausible to explore how is the functional form between accuracy and IOI? This could shed some light on the underlying mechanism.

      Unfortunately, we were unsure about what the reviewer meant by the functional form between accuracy and IOI. We interpret it to mean a function that takes IOI as input and outputs an accuracy value. In that case, while we agree that estimating this function might indeed shed light on the underlying mechanisms, this type of analysis is beyond the scope of the current study. Instead, we refer the reviewer and reader to our modeling study (please see our response (2.1.) above) that includes a model which takes the stimulus conditions, including IOI, and model parameters for preferred rate, phase and period correction, and within- and between-trial decay, and outputs predicted accuracy for each trial. We believe that such a modeling approach, as compared to a simple function, gives more insight into the relationship between oscillator properties and duration perception.

      • Is the effect caused by the dIOI modulated by the distance to the preferred frequency?

      We thank the reviewer for the recommendation. We measured flexibility by the oscillator's ability to adapt to on-line changes in the temporal context (i.e., effect of 𝚫IOI on accuracy), rather than by quantifying the range of rates with improved accuracy. Nevertheless, we acknowledge that distance to the preferred rate should decrease accuracy, as this is a key prediction of entrainment models. In fact, testing this prediction was also recommended by the other reviewer, in response to which we ran additional analyses. These analyses assessed the relationship between accuracy and detuning. Specifically, we assessed accuracy at stimulus rates that were faster and slower than an individual's preferred rate estimates from Experiment 1. We ran logistic regression models on aggregated datasets from all participants and sessions, where accuracy was predicted by z-scored IOI, from trials where the stimulus rate was faster than the preferred rate estimate, and in those where it was slower. The model had a significant main effect of IOI and an interaction between IOI and direction (i.e., whether stimulus rate was faster or slower than the preferred rate estimate), indicating that accuracy increased towards the preferred rate at fast rates and decreased as the stimulus rate diverged from the preferred rate at slow rates. We added information regarding this analysis to the respective subsections of Experiment 1 Methods and Results, added a plot showing the slices of the regression surfaces to Figure 2B and elaborated on the results in Experiment 1 Discussion. As the number of trials in Experiment 2 was insufficient, we only ran these additional analyses in Experiment 1. We agree that a range-based measure of oscillator flexibility would also index the oscillators’ adaptive abilities. However, the current paradigms were designed for assessment of temporal adaptation. Thus, comparison of the two approaches to measuring oscillator flexibility, which can be addressed in future studies, is beyond the scope of the current study.
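      The structure of the detuning analysis (accuracy predicted by z-scored IOI, direction, and their interaction) can be illustrated with a small sketch. Everything numeric here is an assumption made for illustration: the preferred IOI, the IOI values, the coefficient values, and the 0/1 coding of direction are hypothetical, and the sketch only evaluates a logistic model rather than fitting one.

```python
import math


def zscore(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]


def predicted_accuracy(z_ioi, faster, b0, b_ioi, b_dir, b_inter):
    """P(correct) from a logistic model with an IOI x direction interaction.

    `faster` is True when the stimulus rate is faster than the preferred rate
    (i.e., the IOI is shorter than the preferred IOI).
    """
    direction = 1.0 if faster else 0.0
    eta = b0 + b_ioi * z_ioi + b_dir * direction + b_inter * z_ioi * direction
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical values, chosen only to reproduce the qualitative pattern.
preferred_ioi = 500                          # ms, assumed preferred rate
iois = [200, 300, 400, 500, 600, 700, 800]   # ms
z = zscore(iois)
coefs = dict(b0=1.5, b_ioi=-0.5, b_dir=0.0, b_inter=1.0)
accuracy = [
    predicted_accuracy(zi, ioi < preferred_ioi, **coefs)
    for zi, ioi in zip(z, iois)
]
```

      With these assumed coefficients the IOI slope is positive on the fast side and negative on the slow side, so predicted accuracy peaks at the preferred rate, which is the pattern an interaction between IOI and direction can express.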

      • Did the authors explore if the "motor component" (the difference between the motor and perceptual rates) is modulated by the participants age?

      In response to the reviewer’s comment, we correlated the difference between the motor and perceptual rates with age; the correlation was nonsignificant.

      • Please describe better the slider and the keypress tasks. For example, what are the instructions given to the participant on each task, and how they differ from each other?

      We added the Experiment 2 instructions in Appendix A.

      • Typos: The caption in figure one reads 2 ms, while I believe it should say 200. Page 4 mentions that there are 400 trials and page 5 says 407.

      We fixed the typos.

      References

      Barnes, R., & Jones, M. R. (2000). Expectancy, attention, and time. Cogn Psychol, 41(3), 254-311. https://doi.org/10.1006/cogp.2000.0738

      Bouwer, F. L., Fahrenfort, J. J., Millard, S. K., Kloosterman, N. A., & Slagter, H. A. (2023). A Silent Disco: Differential Effects of Beat-based and Pattern-based Temporal Expectations on Persistent Entrainment of Low-frequency Neural Oscillations. J Cogn Neurosci, 35(6), 990-1020. https://doi.org/10.1162/jocn_a_01985

      Doelling, K. B., Arnal, L. H., & Assaneo, M. F. (2022). Adaptive oscillators provide a hard-coded Bayesian mechanism for rhythmic inference. bioRxiv. https://doi.org/10.1101/2022.06.18.496664

      Doelling, K. B., & Assaneo, M. F. (2021). Neural oscillations are a start toward understanding brain activity rather than the end. PLoS Biol, 19(5), e3001234. https://doi.org/10.1371/journal.pbio.3001234

      Drake, C., & Botte, M. C. (1993). Tempo sensitivity in auditory sequences: evidence for a multiple-look model. Percept Psychophys, 54(3), 277-286. https://doi.org/10.3758/bf03205262

      Haegens, S., & Zion Golumbic, E. (2018). Rhythmic facilitation of sensory processing: A critical review. Neurosci Biobehav Rev, 86, 150-165. https://doi.org/10.1016/j.neubiorev.2017.12.002

      Helfrich, R. F., Huang, M., Wilson, G., & Knight, R. T. (2017). Prefrontal cortex modulates posterior alpha oscillations during top-down guided visual perception. Proc Natl Acad Sci U S A, 114(35), 9457-9462. https://doi.org/10.1073/pnas.1705965114

      Henry, M. J., & Herrmann, B. (2014). Low-Frequency Neural Oscillations Support Dynamic Attending in Temporal Context. Timing & Time Perception, 2(1), 62-86. https://doi.org/10.1163/22134468-00002011

      Kaya, E., & Henry, M. J. (2024, February 5). Modeling rhythm perception and temporal adaptation: top-down influences on a gradually decaying oscillator. https://doi.org/10.31234/osf.io/q9uvr

      Lakatos, P., Musacchia, G., O'Connel, M. N., Falchier, A. Y., Javitt, D. C., & Schroeder, C. E. (2013). The spectrotemporal filter mechanism of auditory selective attention. Neuron, 77(4), 750-761. https://doi.org/10.1016/j.neuron.2012.11.034

      Large, E. W. (1994). Dynamic representation of musical structure. The Ohio State University.

      Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying events. Psychological Review, 106(1), 119-159. https://doi.org/10.1037/0033-295x.106.1.119

      McAuley, J. D. (1995). Perception of time as phase: Toward an adaptive-oscillator model of rhythmic pattern processing. Indiana University Bloomington.

      McAuley, J. D. (2010). Tempo and Rhythm. In Music Perception (pp. 165-199). https://doi.org/10.1007/978-1-4419-6114-3_6

      McAuley, J. D., & Jones, M. R. (2003). Modeling effects of rhythmic context on perceived duration: a comparison of interval and entrainment approaches to short-interval timing. J Exp Psychol Hum Percept Perform, 29(6), 1102-1125. https://doi.org/10.1037/0096-1523.29.6.1102

      McAuley, J. D., Jones, M. R., Holub, S., Johnston, H. M., & Miller, N. S. (2006). The time of our lives: life span development of timing and event tracking. J Exp Psychol Gen, 135(3), 348-367. https://doi.org/10.1037/0096-3445.135.3.348

      Nachstedt, T., Tetzlaff, C., & Manoonpong, P. (2017). Fast Dynamical Coupling Enhances Frequency Adaptation of Oscillators for Robotic Locomotion Control. Front Neurorobot, 11, 14. https://doi.org/10.3389/fnbot.2017.00014

      Obleser, J., Henry, M. J., & Lakatos, P. (2017). What do we talk about when we talk about rhythm? PLoS Biol, 15(9), e2002794. https://doi.org/10.1371/journal.pbio.2002794

      Roman, I. R., Roman, A. S., Kim, J. C., & Large, E. W. (2023). Hebbian learning with elasticity explains how the spontaneous motor tempo affects music performance synchronization. PLoS Comput Biol, 19(6), e1011154. https://doi.org/10.1371/journal.pcbi.1011154

      van Bree, S., Sohoglu, E., Davis, M. H., & Zoefel, B. (2021). Sustained neural rhythms reveal endogenous oscillations supporting speech perception. PLoS Biol, 19(2), e3001142. https://doi.org/10.1371/journal.pbio.3001142

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I only have a few minor suggestions:

      Abstract: I really liked the conclusion (that IM and VWM are two temporal extremes of the same process) as articulated in lines 557--563. (It is always satisfying when the distinction between two things that seem fundamentally different vanishes). If something like this but shorter could be included in the Abstract, it would highlight the novel aspects of the results a little more, I think.

      Thank you for this comment. We have added the following to the abstract:

      “A key conclusion is that differences in capacity classically thought to distinguish IM and VWM are in fact contingent upon a single resource-limited WM store.”

      L 216: There's an orphan parenthesis in "(justifying the use".

      Fixed.

      L 273: "One surprising result was the observed set size effect in the 0 ms delay condition". In this paragraph, it might be a good idea to remind the reader of the difference between the simultaneous and zero-delay conditions. If I got it right, the results differ between these conditions because it takes some amount of processing time to interpret the cue and free the resources associated with the irrelevant stimuli. Recalling that fact would make this paragraph easier to digest.

      That is correct. However, at this point in the text, we have not yet fitted the DyNR model to the data. Therefore, we believe that introducing cue processing and resource reallocation as concepts that differentiate between those two conditions would disrupt the flow of this paragraph. We address these points soon after, in a paragraph starting on line 341.

      Figures 3, 5: The labels at the bottom of each column in A would be more clear if placed at the top of each column instead. That way, the x-axis for the plots in A could be labeled appropriately, as "Error in orientation estimate" or something to that effect.

      We edited both figures, now Figure 4 and Figure 6, as suggested.

      L 379: It should be "(see Eq 6)", I believe.

      That is correct, line 379 (currently line 391) should read ‘Eq 6’. Fixed.

      L 379--385: I was a bit mystified as to why the scaled diffusion rate produced a worse fit than a constant rate. I imagine the scaled version was set to something like

      sigma^2_diff_scaled = sigma^2_base + K*(N-1)

      where N is the set size and sigma^2_base and K are parameters. If this model produced a similar fit as with a constant diffusion rate, the AIC would penalize it because of the extra parameter. But why would the fit be worse (i.e., not match the pattern of variability)? Shouldn't the fitter just find that the K=0 solution is the best? Not a big deal; the Nelder-Mead solutions can wobble when that many parameters are involved, but if there's a simple explanation it might be worth commenting on.

      The scaled diffusion was implemented by extending Eq 6 in the following way:

      $$\sigma^2(t) = (t - t_{\text{offset}}) \cdot \dot{\sigma}^2_{\text{diff}} \cdot N$$

      where N is set size. The scaling was therefore not associated with a free parameter that could go to 0 if set size did not affect the diffusion rate; rather, variability necessarily increased with set size. We now clarify this in the text:

      “The second variant was identical to the proposed model, except that we replaced the constant diffusion rate with a set size scaled diffusion rate by multiplying the right side of Eq 6 by N.”
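      The difference between the constant and the set-size-scaled diffusion variants can be made concrete with a short sketch. It assumes that Eq 6 has the form (t - t_offset) multiplied by the diffusion rate, as implied by the scaled equation given in this response; the function name, the parameter names, and the clamping of variance to zero before stimulus offset are assumptions made for illustration.

```python
def diffusion_variance(t, t_offset, rate, set_size=1, scaled=False):
    """Accumulated diffusion variance at time t.

    Constant variant: (t - t_offset) * rate
    Scaled variant:   (t - t_offset) * rate * set_size

    Assumption for this sketch: no diffusion accumulates before t_offset.
    """
    elapsed = max(0.0, t - t_offset)
    variance = elapsed * rate
    if scaled:
        # No free parameter controls the scaling, so it cannot be "switched
        # off" by the fitting procedure when set_size > 1.
        variance *= set_size
    return variance


# Hypothetical values: 1 s after offset, diffusion rate 0.5 per second.
constant = diffusion_variance(1.0, 0.0, 0.5, set_size=4)
scaled = diffusion_variance(1.0, 0.0, 0.5, set_size=4, scaled=True)
```

      Because the scaled variant has the same number of parameters as the constant one, a worse fit reflects the forced increase in variability with set size rather than an AIC penalty for extra parameters.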

      Figure 4 is not mentioned in the main text. Maybe the end of L 398 would be a good place to point to it. The paragraph at L 443-455 would also benefit from a couple of references to it.

      Thank you for this suggestion. Figure 4 (now Figure 5) was previously mentioned on line 449 (previously line 437), but now we have included it on line 410 (previously line 398), within the paragraph spanning lines 455-467 (previously 443-455), and also on line 136 where we first discuss masking effects.

      L 500: Figure S7 is mentioned before Figures S5 and S6. Quite trivial, I know....

      Thank you for this comment. There was no specific reason for Figure S7 to appear after S5 & S6, so we simply swapped their order to be consistent with how they are referred to in the manuscript (i.e., S7 became S5, S5 became S6, and S6 became S7).

      Reviewer #2 (Recommendations For The Authors):

      (1) One potential weakness is that the model assumes sensory information is veridical. However, this isn't likely the case. Acknowledging noise in sensory representations could affect the model interpretation in a couple of different ways. First, neurophysiological recordings have shown normalization affects sensory representations, even when a stimulus is still present on the screen. The DyNR model partially addresses this concern because reports are drawn from working memory, which is normalized. However, if sensory representations were also normalized, then it may improve the model variant where subjects draw directly from sensory representations (an alternative model that is currently described but discarded).

      Thank you for this suggestion. We can consider two potential mechanisms through which divisive normalization might be incorporated into sensory processing within the DyNR model.

      The first possibility involves assuming that normalization is pre-attentive. In this scenario, the sensory activity of each object would be rescaled at the lowest level of sensory processing, occurring before the allocation of attentional or VWM resources. One strong prediction of such an implementation is that recall error in the simultaneous cue condition (Experiment 1) should vary with set size. However, this prediction is inconsistent with the observed data, which failed to show a significant difference between set sizes and were more closely aligned with the hypothesis of no difference (F(2,18) = 1.26, p = .3, η² = .04, BF₁₀ = 0.47). On that basis, we anticipate that introducing normalization as a pre-attentive mechanism would impair the model fit.

      An alternative scenario is to consider normalization as post-attentive. In the simultaneous cueing condition, only one item is attended (i.e., the cued one), regardless of the displayed set size. Here, we would expect normalized activity for a single item, regardless of the number of presented objects, which would then be integrated into VWM. This expanded DyNR model with post-attentive normalization would make exactly the same predictions as the proposed DyNR for recall fidelity, so distinguishing between these models would not be possible based on working memory experiments.

      To acknowledge the possibility that sensory signals could undergo divisive normalization and to motivate future research, we have added the following to our manuscript:

      “As well as being implicated in higher cognitive processes including VWM (Buschman et al, 2011; Sprague et al., 2014), divisive normalization has been shown to be widespread in basic sensory processing (Bonin et al., 2005; Busse et al., 2009; Ni et al., 2017). The DyNR model presently incorporates the former but not the latter type of normalization. While the data observed in our experiments do not provide evidence for normalization of sensory signals (note comparable recall errors across set size in the simultaneous cue condition of Experiment 1), this may be because sensory suppressive effects are localized and our stimuli were relatively widely separated in the visual field: future research could explore the consequences of sensory normalization for recall from VWM using, e.g., centre-surround stimuli (Bloem et al., 2018).”

      Bloem, I. M., Watanabe, Y. L., Kibbe, M. M., & Ling, S. (2018). Visual Memories Bypass Normalization. Psychological Science, 29(5), 845–856. https://doi.org/10.1177/0956797617747091

      Bonin, V., Mante, V., & Carandini, M. (2005). The Suppressive Field of Neurons in Lateral Geniculate Nucleus. The Journal of Neuroscience, 25(47), 10844–10856. https://doi.org/10.1523/JNEUROSCI.3562-05.2005

      Buschman, T. J., Siegel, M., Roy, J. E., & Miller, E. K. (2011). Neural substrates of cognitive capacity limitations. Proceedings of the National Academy of Sciences, 108(27), 11252–11255. https://doi.org/10.1073/pnas.1104666108

      Busse, L., Wade, A. R., & Carandini, M. (2009). Representation of Concurrent Stimuli by Population Activity in Visual Cortex. Neuron, 64(6), 931–942. https://doi.org/10.1016/j.neuron.2009.11.004

      Ni, A. M., & Maunsell, J. H. R. (2017). Spatially tuned normalization explains attention modulation variance within neurons. Journal of Neurophysiology, 118(3), 1903–1913. https://doi.org/10.1152/jn.00218.2017

      Sprague, T. C., Ester, E. F., & Serences, J. T. (2014). Reconstructions of Information in Visual Spatial Working Memory Degrade with Memory Load. Current Biology, 24(18), 2174–2180. https://doi.org/10.1016/j.cub.2014.07.066

      Second, visual adaptation predicts sensory information should decrease over time. This would predict that for long stimulus presentation times, the error would increase. Indeed, this seems to be reflected in Figure 5B. This effect is not captured by the DyNR model.

      Indeed, neural responses in the visual cortex have been observed to quickly adapt during stimulus presentation, showing reduced responses to prolonged stimuli after an initial transient (Groen et al., 2022; Sawamura et al., 2006; Zhou et al., 2019). This adaptation typically manifests as 1) reduced activity towards the end of stimulus presentation and 2) a faster decay towards baseline activity after stimulus offset.

      In the DyNR model, we use an idealized solution in which we convolve the presented visual signal with a response function (i.e., temporal filter). At the longest presentation durations, in DyNR, the sensory signal plateaus and remains stable until stimulus offset. Because our psychophysical data does not allow us to identify the exact neural coding scheme that underlies the sensory signal, we tend to favour this simple implementation, which is broadly consistent with some previous attempts to model temporal dynamics in sensory responses (e.g., Carandini and Heeger, 1994). However, we agree with the reviewer that some adaptation of the sensory signal with prolonged presentation would also be consistent with our data.
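      The idea of convolving the presented signal with a temporal filter, and the resulting plateau at long presentation durations, can be sketched as below. This is an illustration under stated assumptions, not the DyNR implementation: the exponential filter shape, the 50 ms time constant, the 1 ms time step, and the function name are all hypothetical choices.

```python
import math


def sensory_response(duration_ms, total_ms, tau_ms=50.0, dt_ms=1.0):
    """Convolve a boxcar stimulus with a normalized exponential filter.

    The exponential filter and its time constant are assumptions made for
    this sketch; the response function used in DyNR is not specified here.
    """
    n = int(total_ms / dt_ms)
    # Stimulus is 1 while present, 0 after offset.
    stimulus = [1.0 if i * dt_ms < duration_ms else 0.0 for i in range(n)]
    # Truncate the filter at 10 time constants and normalize it to sum to 1.
    kernel_len = int(10 * tau_ms / dt_ms)
    kernel = [math.exp(-k * dt_ms / tau_ms) for k in range(kernel_len)]
    norm = sum(kernel)
    kernel = [w / norm for w in kernel]
    response = []
    for i in range(n):
        acc = 0.0
        # Causal convolution: only current and past stimulus samples contribute.
        for k in range(min(i + 1, kernel_len)):
            acc += kernel[k] * stimulus[i - k]
        response.append(acc)
    return response


long_presentation = sensory_response(duration_ms=500, total_ms=1000)
short_presentation = sensory_response(duration_ms=20, total_ms=1000)
```

      In this sketch a long presentation drives the response to a stable plateau that then decays after offset, whereas a brief presentation ends before the response has reached that plateau. An adaptation account would instead predict a decline during prolonged stimulation, which this simple filter does not produce.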

      We have added the following to the manuscript:

      “In Experiment 2, the longest presentation duration shows an upward trend in error at set sizes 4 and 10. While this falls within the range of measurement error, it is also possible that this is a meaningful pattern arising from visual adaptation of the sensory signal, whereby neural populations reduce their activity after prolonged stimulation. This would mean less residual sensory signal would be available after the cue to supplement VWM activity, predicting a decline in fidelity at higher set sizes. Visual adaptation has previously been successfully accounted for by a type of delayed normalization model in which the sensory signal undergoes a series of linear and nonlinear transformations (Zhou et al., 2019). Such a model could in future be incorporated into DyNR and validated against psychophysical and neural data.”

      Carandini, M., & Heeger, D. J. (1994). Summation and division by neurons in primate visual cortex. Science, 264(5163), 1333–1336. https://doi.org/10.1126/science.8191289

      Groen, I. I. A., Piantoni, G., Montenegro, S., Flinker, A., Devore, S., Devinsky, O., Doyle, W., Dugan, P., Friedman, D., Ramsey, N. F., Petridou, N., & Winawer, J. (2022). Temporal Dynamics of Neural Responses in Human Visual Cortex. The Journal of Neuroscience, 42(40), 7562–7580. https://doi.org/10.1523/JNEUROSCI.1812-21.2022

      Sawamura, H., Orban, G. A., & Vogels, R. (2006). Selectivity of Neuronal Adaptation Does Not Match Response Selectivity: A Single-Cell Study of the fMRI Adaptation Paradigm. Neuron, 49(2), 307–318. https://doi.org/10.1016/j.neuron.2005.11.028

      Zhou, J., Benson, N. C., Kay, K., & Winawer, J. (2019). Predicting neuronal dynamics with a delayed gain control model. PLOS Computational Biology, 15(11), e1007484. https://doi.org/10.1371/journal.pcbi.1007484

      (2) A second potential weakness is that, in Experiment 1, the authors briefly change the sensory stimulus at the end of the delay (a 'phase shift', Fig. 6A). I believe this is intended to act as a mask. However, I would expect that, in the DyNR model, this should be modeled as a new sensory input (in Experiment 2, 50 ms is plenty of time for the subjects to process the stimuli). One might expect this change to disrupt sensory and memory representations in a very characteristic manner. This seems to make a strong testable hypothesis. Did the authors find evidence for interference from the phase shift?

      The phase shift was implemented with the intention of reducing retinal after-effects, essentially acting as a mask for retinal information only; crucially the orientation of the stimulus is unchanged by the phase shift, so from the perspective of the DyNR model, it transmits the same orientation information to working memory as the original stimulus.

      If our objective were to model sensory input at the level of individual neurons and their receptive fields, we would indeed need to treat this phase shift as a novel input. Nevertheless, for DyNR, conceived as an idealization of a biological system for encoding orientation information, we can safely assume that visual areas in biological organisms have a sufficient number of phase-sensitive simple cells and phase-indifferent complex cells to maintain the continuity of input to VWM.

      When comparing conditions with and without the phase shift of stimuli (Fig S1B), we found performance to be comparable in the perceptual condition (simultaneous presentation) and with the longest delay (1 second), suggesting that the phase shift did not change the visibility or encoding of information into VWM. In contrast, we found strong evidence that observers had access to an additional source of information over intermediate delays when the phase shift was not used. This was evident through enhanced recall performance from 0 ms to 400 ms delay. Based on this, we concluded that the additional source of information available in the absence of a phase shift was accessible immediately following stimulus offset and had a brief duration, aligning with the theoretical concept of retinal afterimages.

      (3) It seems odd that the mask does not interrupt sensory processing in Experiment 2. Isn't this the intended purpose of the mask? Should readers interpret this as all masks not being effective in disrupting sensory processing/iconic memory? Or is this specific to the mask used in the experiment?

      Visual masks are often described as instantly and completely halting the visual processing of information that preceded the mask. We also anticipated the mask would entirely terminate sensory processing, but our data indicate the effect was not complete (as indicated by model variants in Experiment 2). Nevertheless, we believe we achieved our intended goal with this experiment – we observed a clear modulation of response errors with changing stimulus duration, indicating that the post-stimulus information that survived masking did not compromise the manipulation of stimulus duration. Moreover, the DyNR model successfully accounted for the portion of signal that survived the mask.

      We can identify two possible reasons why masking was incomplete. First, it is possible that the continuous report measure used in our experiments is more sensitive than the discrete measures (e.g., forced-choice methods) commonly employed in experiments that found masks to be 100% effective. Second, despite using a flickering white noise mask at full contrast, it may not have been the most effective mask; for instance, a mask consisting of many randomly oriented Gabor patches matched in spatial frequency to the stimuli could prove more effective. We decided against such a mask because we were concerned that it could potentially act as a new input to orientation-sensitive neurons, rather than just wiping out any residual sensory activity.

      (4) I apologize if I missed it, but the authors did not compare the DyNR model to a model without decaying sensory information for Experiment 1.

      We tested two DyNR variants in which the diffusion process was solely responsible for memory fidelity dynamics. These models assumed that the sensory signal terminates abruptly with stimulus offset, and the VWM signal encoding the stimuli was equal to the limit imposed by normalization, independent of the delay duration.

      As variants of this model failed to account for the observed response errors both quantitatively (see 'Fixed neural signal' under Model variants) and qualitatively (Figure S3), we decided not to test any more restrictive variants, such as the one without sensory decay and diffusion.

      (5) In the current model, selection is considered to be absolute (all or none). However, this need not be the case (previous work argues for graded selection). Could a model where memories are only partially selected, in a manner that is mediated by load, explain the load effects seen in behavior?

      Thank you for this point. If attentional selection was partial, it would affect the observers’ efficiency in discarding uncued objects to release allocated resources and encode additional information about the cued item. We and others have previously examined whether humans can efficiently update their VWM when previous items become obsolete. For example, Taylor et al. (2023) showed that observers could efficiently remove uncued items from VWM and reallocate the released resources to new visual information. These findings align with results from other studies (e.g., Ecker, Oberauer, & Lewandowsky, 2014; Kessler & Meiran, 2006; Williams et al., 2013).

      Based on these findings, we feel justified in assuming that observers in our current task were capable of fully removing all uncued objects, allowing them to continue the encoding process for the cued orientation that was already partially stored in VWM, such that the attainable limit on representational precision for the cued item equals the maximum precision of VWM.

      Partial removal could in principle be modelled in the DyNR model by introducing an additional plateau parameter specifying a maximum attainable precision after the cue. Our concern would be that such a plateau parameter would trade off with the parameter associated with Hick’s law (i.e., cue interpretation time). The former would control the amount of information that can be encoded into VWM, while the latter regulates the amount of sensory information available for encoding. We are wary of adding additional parameters, and hence flexibility, to the model where we do not have the data to sufficiently constrain them.

      Ecker, U. K. H., Oberauer, K., & Lewandowsky, S. (2014b). Working memory updating involves item-specific removal. Journal of Memory and Language, 74, 1–15. https://doi.org/10.1016/j.jml.2014.03.006

      Kessler, Y., & Meiran, N. (2006). All updateable objects in working memory are updated whenever any of them are modified: Evidence from the memory updating paradigm. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 570–585. https://doi.org/10.1037/0278-7393.32.3.570

      Taylor, R., Tomić, I., Aagten-Murphy, D., & Bays, P. M. (2023). Working memory is updated by reallocation of resources from obsolete to new items. Attention, Perception, & Psychophysics, 85(5), 1437–1451. https://doi.org/10.3758/s13414-022-02584-2

      Williams, M., & Woodman, G. F. (2012). Directed forgetting and directed remembering in visual working memory. Journal of Experimental Psychology. Learning, Memory, and Cognition, 38(5), 1206–1220. https://doi.org/10.1037/a0027389

      (6) Previous work, both from the authors and others, has shown that memories are biased as if they are acted on by attractive/repulsive forces. For example, the memory of an oriented bar is biased away from horizontal and vertical and biased towards diagonals. This is not accounted for in the current model. In particular, this could be one mechanism to generate a non-uniform drift rate over time. As noted in the paper, a non-uniform drift rate could capture many of the behavioral effects reported.

      The reviewer is correct that the model does not currently include stimulus-specific effects, although our work on that topic provides a clear template for incorporating them in future (e.g. Taylor & Bays, 2018). Specifically on the question of generating a non-uniform drift, we have another project that currently looks at this exact question (cited in our manuscript as Tomic, Girones, Lengyel, and Bays; in prep.). By examining various datasets with varying memory delays, including the Additional Dataset 1 reported in the Supplementary Information, we found that stimulus-specific effects on orientation recall remain constant with retention time. Specifically, although there is a clear increase in overall error over time, estimation biases remain constant in direction and amplitude, indicating that the bias does not manifest in drift rates (see also Rademaker et al., 2018; Figure S1).

      Taylor, R., & Bays, P. M. (2018). Efficient coding in visual working memory accounts for stimulus-specific variations in recall. The Journal of Neuroscience, 1018–18. https://doi.org/10.1523/JNEUROSCI.1018-18.2018

      Rademaker, R. L., Park, Y. E., Sack, A. T., & Tong, F. (2018). Evidence of gradual loss of precision for simple features and complex objects in visual working memory. Journal of Experimental Psychology: Human Perception and Performance. https://doi.org/10.1037/xhp0000491

      (7) Finally, the authors use AIC to compare many different model variants to the DyNR model. The delta-AICs are high (>10), indicating a strong preference for the DyNR model over the variants. However, the overall quality of fit to the data is not clear. What proportion of the variance in data was the model able to explain? In particular, I think it would be helpful for the reader if the authors reported the variance explained on withheld data (trials, conditions, or subjects).

      Thank you for this comment.

      Below we report the estimates of r², representing the goodness of fit between observed data (i.e., RMSE) and the DyNR model predictions.

      In Experiment 1, the r² values between observations and predictions were computed across delays for each set size, yielding the following estimates: r² = 0.60 (set size 1); r² = 0.87 (set size 4); r² = 0.95 (set size 10). Note that lower explained variance for set size 1 arises from both data and model predictions having near-constant precision.

      In Experiment 2, we calculated r² between observations and predictions across presentation durations, separately for each set size, resulting in the following estimates: r² = 0.88 (set size 1); r² = 0.71 (set size 4); r² = 0.70 (set size 10). Note that in this case the decreasing percentage of explained variance with set size is a consequence of having less variability in both data and model predictions with larger set sizes.
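      The point about near-constant precision can be illustrated with a minimal sketch. It assumes that r² here denotes the squared Pearson correlation between observed and predicted RMSE (the response does not state which definition was used); the function name and all numbers below are hypothetical and are not the study's data.

```python
import math


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values.

    Assumption: the response does not specify which r2 definition was used.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        return math.nan  # undefined when either series is perfectly flat
    return cov ** 2 / (var_obs * var_pred)


# Hypothetical RMSE values (degrees) across four cue delays, NOT the study's data.
steep_obs = [12.0, 18.0, 25.0, 31.0]    # error grows strongly with delay
steep_pred = [13.0, 17.0, 26.0, 30.0]
flat_obs = [10.2, 9.9, 10.1, 10.0]      # error is nearly constant across delays
flat_pred = [10.1, 10.0, 10.0, 10.1]

print(r_squared(steep_obs, steep_pred))  # high: both series share a large trend
print(r_squared(flat_obs, flat_pred))    # low, although every prediction is within 0.1 deg
```

      In the flat case the absolute prediction errors are tiny, yet r² is low because there is almost no variance left to explain, which is the pattern described for set size 1.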

      While these estimates suggest that the DyNR model effectively fits the psychophysical data, a more rigorous validation approach would involve cross-validation checks across all conditions with a withheld portion of trials. Regrettably, due to the large number of conditions in each experiment, we could only collect 50 trials per condition. We are sceptical that fitting the model to even fewer trials, as necessary for cross-validation, would provide a reliable assessment of model performance.

      Minor: It isn't clear to me why the behavioral tasks are shown in Figure 6. They are important for understanding the results and are discussed earlier in the manuscript (before Figure 3). This just required flipping back and forth to understand the task before I could interpret the results.

      Thank you for this comment. We have now moved the behavioural task figure to appear early in the manuscript (as Figure 3).

      Reviewer #3 (Recommendations For The Authors):

      (1) Dynamics of sensory signals during perception

      I believe that the modeled sensory signal is a reasonable simplification and different ways to model the decay function are discussed. I would like to ask the authors to discuss the implications of slightly more complex initial sensory transients such as the ones shown in Teeuwen (2021). Specifically for short exposure times, this might be particularly relevant for the model fits as some of the alternative models diverge from the data for short exposures. In addition, the role of feedforward (initial transient?) and feedback signaling (subsequent "plateau" activity) could be discussed. The first one might relate more strongly to sensory signals whereas the latter relates more to top-down attention/recurrent processing/VWM.

      Particularly, this latter response might also be sensitive to the number of items present on the screen which leads to a related question pertaining to the limitations of attention during perception. Some work suggests that perception is similarly limited in the amount of information that can be represented concurrently (Tsubomi, 2013). Could the authors discuss the implications of this hypothesis? What happens if maximum sensory amplitude is set as a free parameter in the model?

      Tsubomi, H., Fukuda, K., Watanabe, K., & Vogel, E. K. (2013). Neural limits to representing objects still within view. Journal of Neuroscience, 33(19), 8257-8263.

      Thank you for this question. Below, we unpack it and answer it point by point.

      While we agree our model of the sensory response is justified as an idealization of the biological reality, we also recognise that recent electrophysiological recordings have illuminated intricacies of neuronal responses within the striate cortex, a critical neural region associated with sensory memory (Teeuwen et al, 2021). Notably, these recordings reveal a more nuanced pattern where neurons exhibit an initial burst of activity succeeded by a lower plateau in firing rate, and stimulus offset elicits a second small burst in the response of some neurons, followed by a gradual decrease in activity after the stimulus disappears (Teeuwen et al, 2021).

      In general, asynchronous bursts of activity in individual neurons will tend to average out in the population making little difference to predictions of the DyNR model. Synchronized bursts at stimulus onset could affect predictions for the shortest presentations in Exp 2, however the model appears to capture the data very well without including them. We would be wary of incorporating these phenomena into the model without more clarity on their universality (e.g., how stimulus-dependent they are), their significance at the population level (as opposed to individual neurons), and most importantly, their prominence in visual areas outside striate cortex. Specifically, while Teeuwen et al. (2021) described activity in V1, our model does not make strong assumptions about which visual areas are the source of the sensory input to WM. Based on these uncertainties we believe the idealized sensory response is justified for use in our model.

      Next, thank you for the comment on feedforward and feedback signals. We have added the following to our manuscript:

      “Following onset of a stimulus, the visual signal ascends through visual areas via a cascade of feedforward connections. This feedforward sweep conveys sensory information that persists during stimulus presentation and briefly after it disappears (Lamme et al., 1998). Simultaneously, reciprocal feedback connections carry higher-order information back towards antecedent cortical areas (Lamme and Roelfsema, 2000). In our psychophysical task, feedback connections likely play a critical role in orienting attention towards the cued item, facilitating the extraction of persisting sensory signals, and potentially signalling continuous information on the available resources for VWM encoding. While our computational study does not address the nature of these feedforward and feedback signals, a challenge for future research is to describe the relative contributions of these signals in mediating transmission of information between sensory and working memory (Semedo et al., 2022).”

      Lamme, V. A., Supèr, H., & Spekreijse, H. (1998). Feedforward, horizontal, and feedback processing in the visual cortex. Current Opinion in Neurobiology, 8(4), 529–535. https://doi.org/10.1016/S0959-4388(98)80042-1

      Lamme, V. A. F., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feedforward and recurrent processing. Trends in Neurosciences, 23(11), 571–579. https://doi.org/10.1016/S0166-2236(00)01657-X

      Semedo, J. D., Jasper, A. I., Zandvakili, A., Krishna, A., Aschner, A., Machens, C. K., Kohn, A., & Yu, B. M. (2022). Feedforward and feedback interactions between visual cortical areas use different population activity patterns. Nature Communications, 13(1), 1099. https://doi.org/10.1038/s41467-022-28552-w

      Finally, both you and Reviewer 2 raised a similar interesting question regarding capacity limitations of attention during perception. Such a limitation could be modelled by freely estimating sensory amplitude and applying divisive normalization to that signal, similar to how VWM is constrained. We can consider two potential mechanisms through which divisive normalization might be incorporated into sensory processing within the DyNR model.

      The first possibility involves assuming that normalization is pre-attentive. In this scenario, the sensory activity of each object would be rescaled at the lowest level of sensory processing, occurring before the allocation of attentional or VWM resources. One strong prediction of such an implementation is that recall error in the simultaneous cue condition (Experiment 1) should vary with set size. However, this prediction is inconsistent with the observed data, which failed to show a significant difference between set sizes, and is more closely aligned with the hypothesis of no-difference (F(2,18) = 1.26, p = .3, η² = .04, BF₁₀ = 0.47). On that basis, we anticipate that introducing normalization as a pre-attentive mechanism would impair the model fit.

      An alternative scenario is to consider normalization as post-attentive. In the simultaneous cueing condition, only one item is attended (i.e., the cued one), regardless of the displayed set size. Here, we would expect normalized activity for a single item, regardless of the number of presented objects, which would then be integrated into VWM. This expanded DyNR model with post-attentive normalization would make exactly the same predictions as the proposed DyNR for recall fidelity, so distinguishing between these models would not be possible based on working memory experiments.
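      The contrast between the two scenarios can be sketched in a few lines. This is a toy illustration only: it assumes equal sensory amplitudes for all items and a simple "divide by the summed activity of the pool" rule, which is not the DyNR implementation; all function names are hypothetical.

```python
def normalize(amplitudes, gain=1.0):
    """Divisively normalize a pool of signals so they sum to `gain`."""
    total = sum(amplitudes)
    return [gain * a / total for a in amplitudes]


def cued_signal_pre_attentive(set_size, amplitude=1.0):
    """All displayed items compete in the normalization pool."""
    pool = [amplitude] * set_size
    return normalize(pool)[0]  # signal left for the cued item


def cued_signal_post_attentive(set_size, amplitude=1.0):
    """Only attended items compete; with a simultaneous cue that is one item,
    so `set_size` deliberately has no effect."""
    pool = [amplitude]
    return normalize(pool)[0]


for n in (1, 4, 10):
    print(n, cued_signal_pre_attentive(n), cued_signal_post_attentive(n))
```

      Under these assumptions the pre-attentive variant predicts a weaker signal for the cued item as set size grows, whereas the post-attentive variant predicts the same signal at every set size in the simultaneous cue condition.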

      To acknowledge the possibility that sensory signals could undergo divisive normalization and to motivate future research, we have added the following to our manuscript:

      “As well as being implicated in higher cognitive processes including VWM (Buschman et al, 2011; Sprague et al., 2014), divisive normalization has been shown to be widespread in basic sensory processing (Bonin et al., 2005; Busse et al., 2009; Ni et al., 2017). The DyNR model presently incorporates the former but not the latter type of normalization. While the data observed in our experiments do not provide evidence for normalization of sensory signals (note comparable recall errors across set size in the simultaneous cue condition of Experiment 1), this may be because sensory suppressive effects are localized and our stimuli were relatively widely separated in the visual field: future research could explore the consequences of sensory normalization for recall from VWM using, e.g., centre-surround stimuli (Bloem et al., 2018).”

      Bloem, I. M., Watanabe, Y. L., Kibbe, M. M., & Ling, S. (2018). Visual Memories Bypass Normalization. Psychological Science, 29(5), 845–856. https://doi.org/10.1177/0956797617747091

      Bonin, V., Mante, V., & Carandini, M. (2005). The Suppressive Field of Neurons in Lateral Geniculate Nucleus. The Journal of Neuroscience, 25(47), 10844–10856. https://doi.org/10.1523/JNEUROSCI.3562-05.2005

      Buschman, T. J., Siegel, M., Roy, J. E., & Miller, E. K. (2011). Neural substrates of cognitive capacity limitations. Proceedings of the National Academy of Sciences, 108(27), 11252–11255. https://doi.org/10.1073/pnas.1104666108

      Busse, L., Wade, A. R., & Carandini, M. (2009). Representation of Concurrent Stimuli by Population Activity in Visual Cortex. Neuron, 64(6), 931–942. https://doi.org/10.1016/j.neuron.2009.11.004

      Ni, A. M., & Maunsell, J. H. R. (2017). Spatially tuned normalization explains attention modulation variance within neurons. Journal of Neurophysiology, 118(3), 1903–1913. https://doi.org/10.1152/jn.00218.2017

      Sprague, T. C., Ester, E. F., & Serences, J. T. (2014). Reconstructions of Information in Visual Spatial Working Memory Degrade with Memory Load. Current Biology, 24(18), 2174–2180. https://doi.org/10.1016/j.cub.2014.07.066

      (2) Effectivity of retro-cues at long delays

      Can the authors discuss how cues presented at long delays (>1000 ms) can still lead to increased memory fidelity when sensory signals are likely to have decayed? A list of experimental work demonstrating this can be found in Souza & Oberauer (2016).

      Souza, A. S., & Oberauer, K. (2016). In search of the focus of attention in working memory: 13 years of the retro-cue effect. Attention, Perception, & Psychophysics, 78, 1839-1860.

      The increased memory fidelity observed with longer delays between memory array offset and cue does not result from integrating available sensory signals into VWM because the sensory signal would have completely decayed by that time. Instead, research so far has indicated several alternative mechanisms that could lead to higher recall precision for cued items, and we can briefly summarize some of them, which are also reviewed in more detail in Souza and Oberauer (2016).

      One possibility is that, after a highly predictive retro-cue indicates the to-be-tested item, uncued items can simply be removed from VWM. This could result in decreased interference for the cued item, and consequently higher recall precision. Secondly, the retro-cue could indicate which item can be selectively attended to, thereby differentially strengthening it in memory. Furthermore, the retro-cue could allow evidence to accumulate for the target item ahead of decision-making, and this could increase the probability that the correct information will be selected for response. Finally, the retro-cued stimulus could be insulated from interference by subsequent visual input, while the uncued stimuli may remain prone to such interference.

      A neural account of this retro-cue effect based on the original neural resource model has been proposed in Bays & Taylor, Cog Psych, 2018. However, as we did not use a retro-cue design in the present experiments, we have decided not to elaborate on this in the manuscript.

      (3) Swap errors

      I am somewhat surprised by the empirically observed and predicted pattern of swap errors displayed in Figure S2. For set size 10, swap probability does not consistently increase with the duration of the retention interval, although this was predicted by the authors' model. At long intervals, swap probability is significantly higher for large compared to small set sizes, which also seems to contrast with the idea of shared, limited VWM resources. Can the authors provide some insight into why the model fails to reproduce part of the behavioral pattern for swap errors? The sentence in line 602 might also need some reconsideration in this regard.

      Determining the ground truth for swap errors poses a challenge. The prevailing approach has been to employ a simpler model that estimates swap errors, such as a three-component mixture model, and use those estimates as a proxy for ground truth. However, this method is not without its shortcomings. For example, the variability of swap frequency estimates tends to increase with variability in the report feature dimension (here, orientation). This is due to the increasing overlap of response probability distributions for swap and non-swap responses. Consequently, the discrepancy between any two methods of swap estimation is most noticeable when there is substantial variability in orientation reports (e.g., 10 items and long delay or short exposure).
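      The overlap argument can be made concrete with a minimal sketch of a three-component mixture, assuming one common formulation: a von Mises component centred on the target, von Mises components centred on the non-targets, and a uniform guessing component. Angles are in radians on the full circle (orientations would first be mapped onto it), and all parameter values and function names are hypothetical, not fitted estimates from the study.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term = 0.0, 1.0
    half_sq = (x / 2.0) ** 2
    for k in range(terms):
        total += term                    # term_k = (x/2)^(2k) / (k!)^2
        term *= half_sq / ((k + 1) ** 2)
    return total


def von_mises_pdf(theta, mu, kappa):
    """Von Mises density on the circle with mean mu and concentration kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def swap_density(response, nontargets, kappa):
    """Average von Mises density over the non-target feature values."""
    return sum(von_mises_pdf(response, nt, kappa) for nt in nontargets) / len(nontargets)


def mixture_pdf(response, target, nontargets, kappa, p_swap, p_guess):
    """Density of a response under target + swap + uniform-guess components."""
    p_target = 1.0 - p_swap - p_guess
    return (p_target * von_mises_pdf(response, target, kappa)
            + p_swap * swap_density(response, nontargets, kappa)
            + p_guess / (2.0 * math.pi))


def swap_responsibility(response, target, nontargets, kappa, p_swap, p_guess):
    """Posterior probability that a given response came from the swap component."""
    return (p_swap * swap_density(response, nontargets, kappa)
            / mixture_pdf(response, target, nontargets, kappa, p_swap, p_guess))


# A response landing exactly on the non-target value (hypothetical numbers).
target, nontargets = 0.0, [1.5]
precise = swap_responsibility(1.5, target, nontargets, kappa=10.0, p_swap=0.1, p_guess=0.1)
noisy = swap_responsibility(1.5, target, nontargets, kappa=0.5, p_swap=0.1, p_guess=0.1)
print(precise, noisy)
```

      With precise reports (high kappa) such a response is attributed almost entirely to a swap, whereas with noisy reports (low kappa) the target and swap components overlap so much that the same response is only weakly diagnostic, which makes the estimated swap frequency more variable.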

      When modelling swap frequency in the DyNR model, our aim was to provide a parsimonious account of swap errors while implementing similar dynamics in the spatial (cue) feature as in the orientation (report) feature. This parametric description captured the overall pattern of swap frequency with set size and retention and encoding time, but is still only an approximation of the predictions if we fully modelled memory for the conjunction of cue and report features (as in e.g. Schneegans & Bays, 2017; McMaster et al., 2022).

      We expanded the existing text in the section ‘Representational dynamics of cue-dimension features’ of our manuscript:

      “… Although we did not explicitly model the neural signals representing location, the modelled dynamics in the probability of swap errors were consistent with those of the primary memory feature. We provided a more detailed neural account of swap errors in our earlier works that is theoretically compatible with the DyNR model (McMaster et al., 2022; Schneegans & Bays, 2017).

      The DyNR model successfully captured the observed pattern of swap frequencies (intrusion errors). The only notable discrepancy between DyNR and the three-component mixture model (Fig. S2) arises with the largest set size and longest delay, although with considerable interindividual variability. As the variability in report-dimension increases, the estimates of swap frequency become more variable due to the growing overlap between the probability distributions of swap and non-swap responses. This may explain apparent deviations from the modelled swap frequencies with the highest set size and longest delay where orientation response variability was greatest.”

      McMaster, J. M. V., Tomić, I., Schneegans, S., & Bays, P. M. (2022). Swap errors in visual working memory are fully explained by cue-feature variability. Cognitive Psychology, 137, 101493. https://doi.org/10.1016/j.cogpsych.2022.101493

      Schneegans, S., & Bays, P. M. (2017). Neural Architecture for Feature Binding in Visual Working Memory. The Journal of Neuroscience, 37(14), 3913–3925. https://doi.org/10.1523/JNEUROSCI.3493-16.2017

      (4) Direct sensory readout

      The model assumes that readout from sensory memory and from VWM happens with identical efficiency. Currently, we don't know if these two systems are highly overlapping or are fundamentally different in terms of architecture and computation. In the case of the latter, it might be less reasonable to assume that information readout would happen at similar efficiencies, as it is currently assumed in the manuscript. Perhaps the authors could briefly discuss this possibility.

      In the direct sensory read-out model, we did not explicitly model the efficiency of readout from either sensory or VWM store. However, the distinctive prediction of this model is that the precision of recall changes exponentially with delay at every set size, including one item. This prediction does not depend on the relative efficiency of readout from sensory and working memory, but only on the principle that direct readout from sensory memory bypasses the capacity limit on working memory. This prediction is inconsistent with the pattern of results observed in Experiment 1, where early cues did not show a beneficial effect on recall error for set size 1. While the proposal raised by the reviewer is intriguing, even if we were to model the process of readout from both the sensory and VWM stores with different efficiencies, the direct read-out model could not account for the near-constant recall error with delay for set size one.

      (5) Encoding of distractors

      One of the model assumptions is that, for simultaneous presentations of memory array and cue only the cued feature will be encoded. Previous work has suggested that participants often accidentally encode distractors even when they are cued before memory array onset (Vogel 2005). Given these findings, how reasonable is this assumption in the authors' model?

      Vogel, E. K., McCollough, A. W., & Machizawa, M. G. (2005). Neural measures reveal individual differences in controlling access to working memory. Nature, 438(7067), 500-503.

      Although previous research suggested that observers can misinterpret the pre-cue and encode one of the uncued items, our results argue against this being the case in the current experiment. Such encoding failures would manifest in overall recall error, resulting in a gradient of error with set size, owing to the presence of more adjacent distractors in larger set sizes. However, when we compared recall errors between set sizes in the simultaneous cue condition, we did not find a significant difference between set sizes, and moreover, our results were more likely under the hypothesis of no-difference (F(2,18) = 1.26, p = .3, η² = .04, BF₁₀ = 0.47). If observers occasionally encoded and reported one of the uncued items in the simultaneous cue condition, those errors were extremely infrequent and did not affect the overall error distributions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zhang et al. investigated the relationship between monocular and binocular responses of V1 superficial-layer neurons using two-photon calcium imaging. They found a strong relationship in their data: neurons that exhibited a greater preference for one eye or the other (high ocular dominance) were more likely to be suppressed under binocular stimulation, whereas neurons that are more equivalently driven by each eye (low ocular dominance) were more likely to be enhanced by binocular stimulation. This result chiefly demonstrates the relationship between ocular dominance and binocular responses in V1, corroborating what has been shown previously using electrophysiological techniques but now with greater spatial resolution (albeit less temporal resolution). The binocular responses were well-fitted by a model that institutes divisive normalization between the eyes that accounts for both the suppression and enhancement phenomena observed in the subpopulation of binocular neurons. In so doing, the authors reify the importance of incorporating ocular dominance in computational models of binocular combination.

      The conclusions of this paper are mostly well supported by the data, but there are some limitations of the methodology that need to be clarified, and an expansion of how the results relate to previous work would better contextualize these important findings in the literature.

      Strengths:

      The two-photon imaging technique used to resolve the activity of individual neurons within intact brain tissue grants a host of advantages. Foremost, two-photon imaging confers considerably high spatial resolution. As a result, the authors were able to sample and analyze the activity from thousands of verified superficial-layer V1 neurons. The animal model used, awake macaques, is also highly relevant for the study of binocular combination. Macaques, like humans, are binocular animals, meaning they have forward-facing eyes that confer overlapping visual fields. Importantly, macaque V1 is organized into cortical columns that process specific visual features from the separate eyes just like in humans. In combination with a powerful imaging technique, this allowed the authors to evaluate the monocular and binocular response profiles of V1 neurons that are situated within neighboring ocular dominance columns, a novel feat. To this aim, the approach was well-executed and should instill further confidence in the notion that V1 neurons combine monocular information in a manner that is dependent on the strength of their ocular dominance.

      Weaknesses:

      While two-photon imaging provides excellent spatial resolution, its temporal resolution is often lower compared to some other techniques, such as electrophysiology. This limits the ability to study the fast dynamics of neuronal activity, a well-understood trade-off of the method. The issue is more so that the authors draw comparisons to electrophysiological studies without explicit appreciation of the temporal difference between these techniques. In a similar vein, two-photon imaging is limited spatially in terms of cortical depth, preferentially sampling from neurons in layers 2/3. This limitation does not invalidate any of the interpretations but should be considered by readers, especially when making comparisons to previous electrophysiological reports using microelectrode linear arrays that sample from all cortical layers. Indeed, it is likely that a complete picture of early cortical binocular processing will require high spatial resolution (i.e., sampling from neurons in neighboring ocular dominance columns, from pia mater to white matter) at the biophysically relevant timescales (1ms resolution, capturing response dynamics over the full duration of the stimulus presentation, including the transient onset and steady-state periods).

      To address the same concern from all three reviewers, we discussed the technical limitations of two-photon calcium imaging at the end of the Discussion, including limited imaging depth, low temporal resolution, and nonlinearity. The relevant text is copied here:

      (Ln 304) “Limitations of the current study

      Although capable of sampling a large number of neurons at cellular resolution and with low sampling bias, two-photon calcium imaging has its known limitations that may better make it a complementary research tool to electrophysiological recordings.

      For example, two-photon imaging can only sample neurons from superficial-layers, while binocular neurons also exist in deeper layers, and even neurons in the input layer are affected by feedback from downstream binocular neurons to exhibit binocular response properties (Dougherty, Cox, Westerberg, & Maier, 2019). Furthermore, calcium signals are relatively slow and cannot reveal the fast dynamics of neuronal responses. Due to these spatial and temporal limitations, a more complete picture of the neuronal mechanisms underlying binocular combination of monocular responses may come from studies using both technologies.

      In addition, calcium signals may exaggerate the nonlinear properties of neurons. Although calcium signals indicated by GCaMP5, our favored choice of calcium indicator, displays a linear relationship to neuronal spike rates within a range of 10-150 Hz (Li, Liu, Jiang, Lee, & Tang, 2017), weak and strong signals out of this range are more nonlinear, and may appear poorer and stronger, respectively, than electrode-recorded effects. Consequently, the differences in population responses between monocular and binocular stimulations revealed by this study might be less pronounced.”

      (Recommendations For The Authors):

      Overall, my main suggestion for the authors to improve the paper is to revise some of the interpretations of their results in relation to previous research. The purpose of the present study was to illustrate a more complete picture of the binocular combination of monocular responses by taking into consideration the ocular dominance of V1 cells (lines 34-36). A study published earlier this year had an identical purpose (Mitchell et al., Current Biology, 2023) and arrived at a highly similar conclusion (and also applied divisive normalization to fit their data). I would ask that this paper be mentioned in the introduction and discussed.

      The Mitchell et al 2023 paper is added to the Introduction and Discussion:

      (Ln 50) “In addition (to the Dougherty et al 2019 paper from the same group), Mitchell, Carlson, Westerberg, Cox, and Maier (2023) reported that binocular combination of monocular stimuli with different contrasts is also affected by neurons’ eye preference.”

      (Ln 286) “The critical roles of ocular dominance have been largely overlooked by extant binocular vision models to our knowledge, except that Anderson and Movshon (1989) demonstrated that a model consisting of multiple ocular dominance channels can better explain their psychophysical adaptation data, and that Mitchell et al. (2023) revealed that binocular combination of different contrasts presented to different eyes are affected by neurons’ ocularity preference.”

      Nevertheless, the results of the present study are very valuable. They add substantial spatial resolution and sophisticated relational analysis of monocular and binocular responses that Mitchell et al., 2023 did not include. Therefore, my suggestion is to emphasize the advantages of two-photon imaging in the introduction, focusing on the ability to image neurons in neighboring ocular dominance columns. The rigorous modeling of the relationship between nearby neurons with a range of eye preferences, in tandem with the incredible yield of two-photon imaging, is what sets this paper apart from previous electrophysiological work.

      The finding that binocular responses were dependent on ocular dominance is largely consistent with previous electrophysiological results. However, there should be a paragraph in the discussion section that speaks to the limitations of comparing two-photon imaging data to electrophysiological data. Namely, there are two limitations:

      (1) These two techniques confer different temporal resolutions. It is conceivable that some of the electrophysiology relationships (for example, described by Dougherty et al., 2019) may be dependent on the temporal window over which the data was averaged, typically over 50-100ms around stimulus onset, or 100-250ms comprising the neurons' sustained response to the stimulus. This possible explanation of the difference in obtained results would be especially useful for the discussion paragraph starting at line 232. It would also be helpful to readers for there to be some mention of the advantage of having high temporal resolution (i.e., the benefits of electrophysiology) since (a) recent work has distinguished between sequential stages of binocular combination (Cox et al., 2019) and (b) modern models of V1 neurons emphasize recurrent feedback to explain V1 temporal dynamics (see Heeger et al., 2019; Rubin et al., 2015), which could prove to be relevant for combination of stimuli in the two eyes (Fleet et al., 1997).

      Our discussion regarding the technical limitations of 2-p calcium imaging has been listed earlier. Specific to the Dougherty et al. (2019) paper, we added the following discussion to address the difference in temporal resolution between the two technologies.

      (Ln 266) “In addition, it is unclear whether the discrepancies are caused by different temporal resolutions of electrode recording and calcium imaging. The results of Dougherty et al. (2019) represent changes of neuronal spike activities over a period of approximately 50-200 ms after the stimulus onset, which may reflect the sustained neuronal responses to the stimulus and possible feedback signals. Calcium signals are much slower and indicative of the aggregated neuronal responses over a longer period (up to 1000 ms in the current study). They should have smeared, rather than exaggerated, the differences between monocular and binocular responses, although we cannot exclude the possibility that some neuronal response changes beyond 200 ms are responsible for the discrepancies.”

      (2) The sample of V1 neurons in this study is limited to cells in the most superficial layers of the cortex (layers 2/3). This limitation is, of course, well understood, but it should be mentioned at least in the context of studying the formative mechanisms of binocular combination in V1 (since we know that binocular neurons also exist in layers 5/6, and there is now substantial evidence that even layer 4 neurons are not as "monocular" as we previously thought (Dougherty et al., 2019)).

      See our discussion regarding the technical limitations of 2-p calcium imaging listed earlier.

      In short, I believe the paper would be improved by (1) adding the above citations in the appropriate places, (2) acknowledging in the introduction that this question has been investigated electrophysiologically but emphasizing the advantages of two-photon imaging, and (3) adding a paragraph to the discussion section that discusses the temporal and spatial limitations when using two-photon imaging to study binocular combination, particularly when comparing the results to electrophysiology.

      Reviewer #2 (Public Review):

      Summary:

      This study examines the pattern of responses produced by the combination of left-eye and right-eye signals in V1. For this, they used calcium imaging of neurons in V1 of awake, fixating monkeys. They take advantage of calcium imaging, which yields large populations of neurons in each field of view. With their data set, they observe how response magnitude relates to ocular dominance across the entire population. They analyze carefully how the relationship changed as the visual stimulus switched from contra-eye only, ipsi-eye only, and binocular. As expected, the contra-eye-dominated neurons responded strongly with a contra-eye-only stimulus. The ipsi-eye-dominated neurons responded strongly with an ipsi-eye-only stimulus. The surprise was responses to a binocular stimulus. The responses were similarly weak across the entire population, regardless of each neuron's ocular dominance. They conclude that this pattern of responses could be explained by interocular divisive normalization, followed by binocular summation.

      Strengths:

      A major strength of this work is that the model-fitting was done on a large population of simultaneously recorded neurons. This approach is an advancement over previous work, which did model-fitting on individual neurons. The fitted model in the manuscript represents the pattern observed across the large population in V1, and washes out any particular property of individual neurons. Given the large neuronal population from which the conclusion was drawn, the authors provide solid evidence supporting their conclusion. They also observed consistency across 5 fields of view.

      The experiments were designed and executed appropriately to test their hypothesis. Their data support their conclusion.

      Weaknesses:

      One weakness of their study is that calcium signals can exaggerate the nonlinear properties of neurons. Calcium imaging renders poor responses poorer and strong responses stronger, compared to single-unit recording. In particular, the dramatic change in the population response between monocular stimulation and binocular stimulation could actually be less pronounced when measured with single-unit recording methods. This means their choice of recording method could have accidentally exaggerated the evidence of their finding.

      We discussed the nonlinearity of calcium signals as part of the technical limitations of 2-p calcium imaging. The calcium indicator we use, GCaMP5, has a reasonably wide range over which its signal is linearly related to spike rate. Outside this range, however, the nonlinearity is indeed a concern.

      (Ln 314) “In addition, calcium signals may exaggerate the nonlinear properties of neurons. Although signals indicated by GCaMP5, our favored choice of calcium indicator, displays a linear relationship to neuronal spike rate within a range of 10-150 Hz (Li et al., 2017), weak and strong signals out of this range are more nonlinear, and may appear poorer and stronger, respectively, than electrode-recorded effects. Consequently, the changes in population responses between monocular and binocular stimulations revealed by this study might be less pronounced.”

      The implication of their finding is that strong ocular dominance is the result of release from interocular suppression by a monocular stimulus, rather than the lack of binocular combination as many traditional studies have assumed. This could significantly advance our understanding of the binocular combination circuitry of V1. The entire population of neurons could be part of a binocular combination circuitry present in V1.

      This is a very good insight. We added the following sentences to the end of the first paragraph of Discussion:

      (Ln 242) “These findings implicate that at least for neurons in superficial layers of V1, significant ocular dominance may result from a release of interocular suppression during monocular stimulation, an unusual viewing condition as our vision is typically binocular, rather than a lack of binocular combination of inputs from upstream monocular neurons.”

      (Recommendations For The Authors):

      Line 150: "To model interocular response suppression, responses from each eye in Eq. 2 were further normalized by an interocular suppression factor wib or wcb," I recommend the authors improve their explanation of how they arrived at Eq. 3 from Eq. 2. As it stands, my impression is that they have one model for the responses to monocular stimulation, and another model for the responses to binocular stimulation. What I think is missing is that both equations are derived from the same model. Monocular stimulation is a situation in which the stimulus in one eye's contrast is zero. Could the authors clarify whether this situation produces an interocular suppression of zero, and how that leads to Eq. 2?

      We rewrote the modeling part to show that Equations 1-3 are sequential steps of development for the same model. We also added a brief paragraph to discuss how Eq. 3 could lead to Eq. 2 under monocular viewing:

      (Ln 166) “Although not shown in Eq. 3, we also assumed that the nonlinear exponent b also depends on the contrast of the stimulus presented to the other eye (i.e., Sc or Si). Consequently, when Sc or Si = 0 under monocular stimulation, Rc or Ri = 0 (Eq. 1), and interocular suppression wib or wcb = 1, so Eq. 3 changes back to Eq. 2. It is only when Sc and Si are equal and close to 1, as in the current study, that interocular suppression and binocular combination would be in the current Eq. 3 format.”
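
      The reduction described in this passage can be illustrated with a minimal Python sketch. The manuscript's Eqs. 1-3 are not reproduced in this response, so the functional form below (a suppression factor of 1 + gain * R**exponent driven by the other eye's response) and the names `suppression_factor`, `binocular_response`, `gain`, and `exponent` are assumptions chosen only for illustration; they are not the authors' equations. The sketch shows the one property the passage relies on: when the other eye contributes no response, the suppression factor is 1 and the binocular expression collapses to the monocular response.

```python
def suppression_factor(other_eye_response, gain=1.0, exponent=2.0):
    """Hypothetical interocular suppression factor.

    Equals exactly 1 when the other eye contributes no response, and grows
    with the strength of the other eye's response otherwise.
    """
    return 1.0 + gain * other_eye_response ** exponent


def binocular_response(r_contra, r_ipsi, gain=1.0, exponent=2.0):
    """Divisive normalization of each eye's signal, followed by summation."""
    w_on_contra = suppression_factor(r_ipsi, gain, exponent)   # ipsi suppresses contra
    w_on_ipsi = suppression_factor(r_contra, gain, exponent)   # contra suppresses ipsi
    return r_contra / w_on_contra + r_ipsi / w_on_ipsi


if __name__ == "__main__":
    # Monocular viewing: the ipsilateral signal is zero, so no suppression acts
    # on the contralateral signal and the result is the monocular response.
    print(binocular_response(0.8, 0.0))
    # Binocular viewing with equal signals: the result is sub-additive.
    print(binocular_response(0.8, 0.8), "versus linear sum", 0.8 + 0.8)
```

      Under these assumed forms, the same function covers both viewing conditions, which is the sense in which the monocular and binocular equations belong to a single model.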

      Line 225: "However, individually, compared to monocular responses, responses of monocular neurons more preferring the stimulated eye are actually suppressed, and only responses of binocular neurons are increased by binocular stimulation." This sentence is difficult to follow. I recommend the authors improve clarity by breaking up the sentence into several sentences. If I understand correctly, they summarize the pattern in the data that is indicative of interocular divisive normalization, i.e., their final conclusion.

      This sentence no longer exists in the Discussion.

      Line 426: "Third, for those showing significant orientation difference, the trial-based orientation responses of each neuron were fitted with a Gaussian model with a MATLAB nonlinear least squares function:" The choice of using a Gaussian function to fit orientation tuning was probably suboptimal. A Gaussian function provides an adequate fit only for neurons whose tuning is very sharp. The responses outside of the peak fall down to the baseline and the two ends meet. Otherwise, the two ends do not meet. An adequate fit would be achieved with a function of a circular variable, which wraps around 180 deg. I recommend using a Von Mises function for fitting orientation tuning.

      We agree with the reviewer that the Von Mises function is more accurate than the Gaussian for fitting orientation tuning functions. Indeed, we are using it to fit the orientation tuning of V4 neurons, many of which have two peaks. For the current V1 data, the differences between the Von Mises and Gaussian fits are very small, as shown in the orientation functional maps from three macaques below. Because we also use the same Gaussian fitting of orientation tuning in several published papers and in papers currently under review, we prefer to keep the Gaussian fitting results in the manuscript.

      Author response image 1.
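
      The reviewer's point about wrap-around, and the authors' observation that the two fits barely differ for V1, can both be seen in a small Python sketch. The function names, the parameterization (peak-normalized, 180-degree period), and the small-angle conversion `kappa_from_sigma` are assumptions made for illustration and are not taken from the manuscript's MATLAB fitting code.

```python
import math


def gaussian_tuning(theta_deg, pref_deg, sigma_deg, amp=1.0, base=0.0):
    """Plain Gaussian in orientation; it does not wrap around at 180 deg."""
    d = theta_deg - pref_deg
    return base + amp * math.exp(-d * d / (2.0 * sigma_deg ** 2))


def von_mises_tuning(theta_deg, pref_deg, kappa, amp=1.0, base=0.0):
    """Von Mises curve with a 180-deg period, scaled so the peak equals base + amp."""
    d = math.radians(theta_deg - pref_deg)
    return base + amp * math.exp(kappa * (math.cos(2.0 * d) - 1.0))


def kappa_from_sigma(sigma_deg):
    """Concentration that matches a Gaussian width near the peak.

    Uses the small-angle expansion cos(2d) - 1 ~ -2 d**2, so the two curves
    agree when kappa = 1 / (4 sigma**2), with sigma in radians.
    """
    s = math.radians(sigma_deg)
    return 1.0 / (4.0 * s * s)


if __name__ == "__main__":
    # Broad tuning: the Gaussian takes different values at 0 and 180 deg,
    # although these are the same orientation; the Von Mises does not.
    print(gaussian_tuning(0, 30, 40), gaussian_tuning(180, 30, 40))
    print(von_mises_tuning(0, 30, 1.0), von_mises_tuning(180, 30, 1.0))
    # Sharp tuning: the two curves are nearly indistinguishable near the peak.
    k = kappa_from_sigma(10.0)
    print(gaussian_tuning(40, 30, 10.0), von_mises_tuning(40, 30, k))
```

      For sharply tuned neurons the Gaussian flanks reach baseline well before the ends of the orientation axis, which is why the choice of function matters little in that regime.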

      Reviewer #3 (Public Review):

      The authors have made simultaneous recordings of the responses of large numbers of neurons from the primary visual cortex using optical two-photon imaging of calcium signals from the superficial layers of the cortex. Recordings were made to compare the responses of the cortical neurons under normal binocular viewing of a flat screen with both eyes open and monocular viewing of the same screen with one eye's view blocked by a translucent filter. The screen displayed visual stimuli comprising small contrast patches of Gabor function distributions of luminance, a stimulus that is known to excite cortical neurons.

      This is an important data set, given the large numbers of neurons recorded. The authors present a simple model to explain the binocular combination of neuronal signals from the right and left eyes.

      The limitations of the paper as written are as follows. These points can be addressed with some additional analysis and rewriting of sections of the paper. No new experimental data need to be collected.

      (1) The authors should acknowledge the fact that these recordings arise from neurons in the superficial layers of the cortex. This limitation arises from the usual constraints on optical imaging in the macaque cortex. This means that the sample of neurons forming this data set is not fully representative of the population of binocular neurons within the visual cortex. This limitation is important in comparing the outcome of these experiments with the results from other studies of binocular combination, which have used single-electrode recording. Electrode recording will result in a sample of neurons that is drawn from many layers of the cortex, rather than just the superficial layers.

      See our discussion regarding the technical limitations of 2-p calcium imaging listed earlier.

      (2) Single-neuron recording of binocular neurons in the primary visual cortex has shown that these neurons often have some spontaneous activity. Assessment of this spontaneous level of firing is important for accurate model fitting [1]. The paper here should discuss the level of spontaneous neuronal firing and its potential significance.

      We have noticed previously that at non-optimal spatial frequencies, calcium responses to a moving Gabor grating are close to zero (Guan et al., Prog Neurobiology, 2021, Fig. 1B), but we cannot tell whether this is due to calcium response nonlinearity or to a close-to-zero level of spontaneous neuronal activity. Prince et al. (2002) reported low spontaneous responses of V1 neurons with moving grating stimuli (e.g., about 3 spikes/sec in one exemplar neuron, their Fig. 1B), so this does not appear to be a large effect. In our data fitting, we do have an orientation-unspecific component in the Gaussian model, which represents the neuronal response at a non-preferred orientation, but not necessarily the spontaneous activity.

      (3) The arrangements for visual stimulation and comparison of binocular and monocular responses mean that the stereoscopic disparity of the binocular stimuli is always at zero or close to zero. The animal's fixation point is in the centre of a single display that is viewed binocularly. The fixation point is, by definition, at zero disparity. The other points on the flat display are also at zero disparity or very close to zero because they lie in the same depth plane. There will be some small deviations from exactly zero because the geometry of the viewing arrangements results in the extremities of the display being at a slightly different distance than the centre. Therefore, the visual stimulation used to test the binocular condition is always at zero disparity, with a slight deviation from zero at the edges of the display, and never changes. [There is a detail that can be ignored. The experimenters tested neurons with visual stimulation at different real distances from the eyes, but this is not relevant here. Provided the animals accurately converged their eyes on the provided binocular fixation point, then the disparity of the visual stimuli will always be at or close to zero, regardless of viewing distance in these circumstances.] However, we already know from earlier work that neurons in the visual cortex exhibit a range of selectivity for binocular disparity. Some neurons have their peak response at non-zero disparities, representing binocular depths nearer than the fixation depth or beyond it. The response of other neurons is maximally suppressed by disparities at the depth of the fixation point (so-called Tuned Inhibitory [TI] neurons). 
The simple model and analysis presented in the paper for the summation of monocular responses to predict binocular responses will perform adequately for neurons that are tuned to zero disparity, so-called tuned excitatory [TE] neurons, but is necessarily compromised when applied to neurons that have other, different tuning profiles. Specifically, when neurons are stimulated binocularly with a non-preferred disparity, the binocular response may be lower than the monocular response [2, 3]. This more realistic view of binocular responses needs to be considered by the authors and integrated into their modelling.

      We agree and have included the following text in our discussion of future work:

      (Ln 298) “In addition, in our experiments, binocular stimuli were presented with zero disparity, which best triggered the responses of neurons with zero-disparity tuning. A more realistic model of binocular combination also requires the consideration of neurons with other disparity-tuning profiles.”

      (4) The data in the paper show some features that have been reported before but are not captured by the model. Notably, for neurons with extreme values of ocular dominance, the binocular response is typically less than the larger of the two monocular responses. This is apparent in the row of plots in Figure 2D from individual animals and in the pooled data in Figure 2E. Responses of this type are characteristic of tuned inhibitory [TI] neurons [2]. It is not immediately clear why this feature of the data does not appear in the summary and analysis in Figure 3.

      This difference is indeed captured by the model, which can be more easily appreciated in Fig. 4A where monocular and binocular model simulations are plotted in the same panel. In the text, we also wrote: (Ln 195) “It is apparent that binocular responses cannot be explained by the sum of monocular responses, as binocular responses are substantially lower than the summed monocular responses for both monocular and binocular neurons. Nor can binocular responses be explained by the responses to the preferred eye, as binocular responses are also lower than those to the preferred eye (the larger of the two monocular responses) for monocular neurons.”

      The paper text states that the responses were "first normalized by the median of the binocular responses". This will certainly get rid of this characteristic of the data, but this step needs better justification, or an amendment to the main analysis is needed.

      The relevant sentence has been rewritten as “Monocular and binocular data of each FOV/depth, as well as the pooled data, were first normalized by the respective median of the binocular responses of all neurons in the same FOV/depth.” This normalization would render the overall binocular responses to be around unity, for the purpose of facilitating comparisons among all FOV/depth, but it would not affect the overall characteristic of the data.
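
      The claim that this normalization rescales the data without altering its characteristic pattern can be checked with a short Python sketch. The response values and the function name `normalize_by_binocular_median` are hypothetical and serve only to illustrate the arithmetic: dividing every response in a FOV by the same constant leaves each neuron's binocular-to-monocular ratio unchanged.

```python
from statistics import median


def normalize_by_binocular_median(mono_contra, mono_ipsi, bino):
    """Divide all responses in one FOV/depth by the median binocular response."""
    m = median(bino)
    return (
        [x / m for x in mono_contra],
        [x / m for x in mono_ipsi],
        [x / m for x in bino],
    )


if __name__ == "__main__":
    # Hypothetical responses of three neurons (contra-dominated, binocular,
    # ipsi-dominated); arbitrary units.
    contra = [2.0, 1.0, 0.2]
    ipsi = [0.1, 0.9, 1.8]
    bino = [1.2, 1.5, 1.0]

    n_contra, n_ipsi, n_bino = normalize_by_binocular_median(contra, ipsi, bino)
    print("median binocular response after normalization:", median(n_bino))
    for i in range(len(bino)):
        before = bino[i] / max(contra[i], ipsi[i])
        after = n_bino[i] / max(n_contra[i], n_ipsi[i])
        print("neuron", i, "binocular / preferred-eye ratio:", before, after)
```

      Because the divisor is shared by the monocular and binocular data of a FOV, a binocular response that is lower than the preferred-eye response before normalization remains lower afterwards.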

      In the present form, the model and analysis do not appear to fit the data in Figure 2 as accurately as needed.

      Thank you for pointing out this problem; the fits for FOV C_270 and for the pooled data were especially inaccurate. The issue was largely resolved once each datum was weighted by its standard deviation (please see the updated Fig. 3).
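
      How such weighting can change a fit is sketched below in Python. The response does not state the exact weighting scheme, so the sketch assumes the conventional choice of inverse-variance weights (1/sd**2) and uses a straight line instead of the manuscript's model; the data and the function name `weighted_line_fit` are hypothetical.

```python
def weighted_line_fit(xs, ys, sds):
    """Fit y = a + b * x by minimizing sum(((y - a - b * x) / sd) ** 2).

    Returns the intercept a and slope b from the closed-form solution of the
    weighted normal equations.
    """
    ws = [1.0 / (sd * sd) for sd in sds]          # assumed inverse-variance weights
    W = sum(ws)
    Sx = sum(w * x for w, x in zip(ws, xs))
    Sy = sum(w * y for w, y in zip(ws, ys))
    Sxx = sum(w * x * x for w, x in zip(ws, xs))
    Sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    b = (W * Sxy - Sx * Sy) / (W * Sxx - Sx * Sx)
    a = (Sy - b * Sx) / W
    return a, b


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [0.0, 1.0, 2.0, 10.0]                    # last point is noisy
    equal = weighted_line_fit(xs, ys, [1.0, 1.0, 1.0, 1.0])
    weighted = weighted_line_fit(xs, ys, [0.1, 0.1, 0.1, 100.0])
    print("equal weights (intercept, slope):", equal)
    print("sd-based weights (intercept, slope):", weighted)
```

      With equal weights the unreliable point pulls the line away from the well-measured data, whereas down-weighting it by its large standard deviation restores the trend of the reliable points.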

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Zeng and Staley provide a valuable analysis of the molecular requirements for the export of a reporter mRNA that contains a lariat structure at its 5' end in the budding yeast S. cerevisiae. The authors provide evidence that this is regulated by the main mRNA export machinery (Yra1, Mex67, Nab2, Npl3, Tom1, and Mlp1). Of note, Mlp1 has been mainly implicated in the nuclear retention of unspliced pre-mRNA (i.e. quality control), and relatively little has been done to investigate its role in mRNA export in budding yeast.

      Strengths:

      There is relatively little information in the current literature about the nuclear export of splicing intermediates. This paper provides one of the first analyses of this process and dissects the molecular components that promote this form of RNA export. Overall, the strength of the data presented in the manuscript is solid. The paper is well written and the message is clear and of general interest to the mRNA community.

      We thank the reviewer for highlighting these strengths.

      Weaknesses:

      There are three problems with the paper, although these are not major and likely would not affect the final model as most aspects of the molecular details are confirmed by multiple complementary assays.

      (1) The brG reporter produces both unspliced pre-mRNA and a lariat-containing intermediate RNA. Based on the primer extension assay the authors claim that only 33% of the final product is in pre-mRNA form and that this "is insufficient to account for the magnitude of the cytoplasmic signal from the brG reporter (83%)". Nevertheless, it is possible that primer extension is incomplete or that the lariat-containing RNA is inaccessible for smFISH. The authors could easily perform a dual smFISH experiment (similar to Adivarahan et al., Molecular Cell 2018) where exon 1 is labelled with probes of one color, and the region that overlaps the lariat-containing intermediate is labelled with probes of a second color. If the authors are correct, then one-third of the smFISH foci should have both labels and the rest would have only the second label. This would also confirm that the latter (i.e. the lariat-containing RNAs) are exported to the cytoplasm. Using this approach, the authors could then show that MLP1-depletion (or depletion of any of the other factors) affect(s) one pool of RNAs (i.e. those that are lariat-containing) but not the other (i.e. pre-mRNA). Including these experiments would make the evidence for their model more convincing.

      We appreciate the reviewer’s comments and suggestions. Concerning the primer extension analysis, we are considering alternative assays to quantitate the pre-mRNA and lariat intermediate levels. Concerning the accessibility of the lariat intermediate in smRNA-FISH, in a dbr1∆ strain the only major species from the UAc reporter that is detected by primer extension is the lariat intermediate (Fig. S3), and this reporter is readily detected by smRNA-FISH, indicating that the lariat intermediate is accessible to smRNA-FISH. Concerning discriminating between pre-mRNA and lariat intermediate by smRNA-FISH, we agree with the reviewer that a dual smFISH experiment would directly distinguish between the signals of these species. The brG reporter we used in most smRNA-FISH experiments has a 5’ exon that is too short for smRNA-FISH probes, as is typical of most budding yeast 5’ exons. We have tried to replace the 5’ exon with a longer sequence (GFP) to allow for smRNA-FISH; however, this substitution inhibited splicing. Therefore, to distinguish signals from pre-mRNA versus lariat intermediate, we used additional reporters: G1c and brC reporters, which accumulate pre-mRNA essentially exclusively (Fig. S2A-C), and the UAc reporter, which accumulates lariat intermediate exclusively, in a dbr1∆ strain (Fig. S3). Whereas the mlp1 deletion did not change beta-galactosidase activities of the G1c and brC pre-mRNA-accumulating reporters (Fig. S2E), the mlp1 deletion in a dbr1∆ background did reduce the beta-galactosidase activities of the UAc lariat intermediate-accumulating reporter (Fig. 3D) and did increase smRNA-FISH signal of this reporter in the nucleus (Fig. 3E). These observations corroborate our interpretation based on the brG reporter that Mlp1p is required for efficient export of lariat intermediates but not pre-mRNAs.

      (2) In some cases, the number of smFISH foci appears to change drastically depending on the genetic background. This could either be due to the stochastic nature of mRNA expression between cells or reflect real differences between the genetic backgrounds that could alter the interpretation of the other observations.

      We thank the reviewer for raising this point. We will review our data to distinguish between these possibilities.

      (3) The authors state in the discussion that "the general mRNA export pathway transports discarded lariat intermediates into the cytoplasm". Although this appears to be the case for the reporters that are investigated in this paper, I don't think that the authors should make such a broad sweeping claim. It may be that some discarded lariat intermediates are exported to the cytoplasm while others are targeted for nuclear retention and/or decay.

      The reviewer’s point is well-taken. We will revise the wording accordingly.

      Reviewer #2 (Public Review):

      In this report, Zeng and Staley have used an elegant combination of RNA imaging approaches (single molecule FISH), RNA co-immunoprecipitations, and translation reporters to characterize the factors and pathways involved in the nuclear export of splicing intermediates in budding yeast. Their study notably involves the use of specific reporter genes, which lead to the accumulation of pre-mRNA and lariat species, in a battery of mutants impacting mRNA export and quality control.

      The authors convincingly demonstrate that mRNA species expressed from such reporters are exported to the cytoplasm in a manner depending on the canonical mRNA export machinery (Mex67 and its adaptors) and the nuclear pore complex (NPC) basket (Mlp1). Interestingly, they provide evidence that the export of splicing intermediates requires docking and subsequent undocking at the nuclear basket, a step possibly more critical than for regular mRNAs.

      We thank the reviewer for this overall positive assessment.

      However, their assays do not always allow us to define whether the impacted mRNA species correspond to lariats and/or pre-mRNAs. This is all the more critical since their findings apparently contradict previous reports that supported a role for the nuclear basket in pre-mRNA quality control. These earlier studies, which were similarly based on the use of dedicated yet distinct reporters, had found that the nuclear basket subunit Mlp1, together with different cofactors, prevents the export of unspliced mRNA species. It would be important to clarify experimentally and discuss the possible reasons for these discrepancies.

      It is true that we did not assess export of all reporters in all mutant strains by smFISH; however, we did validate the key conclusion that the export of lariat intermediates requires the nuclear basket gene MLP1: the export of both the brG reporter (mostly lariat intermediate) and the UAc reporter (exclusively lariat intermediate) showed a dependence on MLP1 (Fig. 3). Further, by beta-galactosidase activity, we tested in total five separate reporters – three that accumulated lariat intermediate and two that accumulated exclusively pre-mRNA; only the three reporters accumulating lariat intermediate showed a dependence of export on MLP1 (Fig. 4B,D; Fig S2D); the reporters accumulating pre-mRNA did not show a dependence on MLP1 (Fig. S2E), further validating our main conclusion. We are considering additional experiments to validate this key conclusion even further. Also, see response to comment 1 from reviewer 1.

      We agree that the main conclusion from this manuscript differs from earlier studies. A key difference is that prior studies monitored exclusively pre-mRNA. In our study, we monitored pre-mRNA and lariat intermediate species and in doing so revealed a role for MLP1 in the export of lariat intermediates. This study, our previous study, as well as the previous studies of others have all provided evidence for efficient export of pre-mRNA; all of these studies are in conflict with the studies purporting a general role for the nuclear basket in retaining immature mRNA. Still, these past, apparently conflicting studies can be re-interpreted in the context of our model that the export of such species requires docking at the nuclear basket, followed by undocking. In a revised manuscript, we will discuss the possibility that pre-mRNAs apparently “retained” by the nuclear basket are stalled in export at the undocking stage.

      Reviewer #3 (Public Review):

      Summary:

      Zeng and Staley show that in yeast, intron-lariat intermediates that accumulate due to defects in pre-mRNA splicing are transported to the cytoplasm using the canonical mRNA export pathway. Moreover, they demonstrate that export requires the nuclear basket, a sub-structure of the nuclear pore complex previously implicated in the retention of immature mRNAs. These observations are important as they put into question a longstanding model that the main role of the nuclear basket is to ensure nuclear retention of immature or faulty mRNAs.

      Strengths:

      The authors elegantly combine genetic, biochemical, and single-molecule resolution microscopy approaches to identify the cellular pathway that mediates the cytoplasmic accumulation of lariat intermediates. Cytoplasmic accumulation of such splicing intermediates had been observed in various previous studies, but how these RNAs reach the cytoplasm had not yet been investigated. By using smFISH, the authors present compelling and, for the first time, direct evidence that these intermediates accumulate in the cytoplasm and that this requires the canonical mRNA export pathway, including the RNA export receptor Mex67 as well as various RNA-binding proteins including Yra1, Npl3 and Nab2. Moreover, they show that the export of lariat intermediates, but not mRNAs, requires the nuclear basket (Mlp1) and basket-associated proteins previously linked to the mRNP rearrangements at the nuclear pore. This is a surprising and important observation with respect to a possible function of the nuclear basket in mRNA export and quality control, as it challenges a longstanding model that the role of the basket in mRNA export is primarily to act as a gatekeeper to ensure that immature mRNAs are not exported. As discussed by the authors, their finding suggests a role for the basket in promoting the export of certain types of RNAs rather than retention, a model also supported by more recent studies in mammalian cells. Moreover, their findings are also consistent with a recent paper showing that in yeast, not all nuclear pores contain a basket (PMID: 36220102), an observation that also questioned the gatekeeper model of the basket, as it is difficult to imagine how the basket can serve as a gatekeeper if not all nuclear pores contain such a structure.

      We thank the reviewer for highlighting the importance and surprising nature of our findings.

      Weaknesses:

      One weakness of this study is that all the experiments rely on a synthetic splicing reporter containing a lacZ gene, which produces a relatively long transcript compared to the average yeast mRNA.

      We are considering repeating some of our experiments to monitor export of RNAs with more average lengths.

      The rationale for using a reporter containing the brG (G branch point), which yields more stable lariat intermediates because they are inefficient substrates for the debranching enzyme Dbr1, could be described earlier in the manuscript; as written, it becomes clear only towards the end, which is confusing.

      We thank the reviewer for this comment. We will revise the text to explain sooner the rationale for using the brG reporter to assess the export of lariat intermediates.

      Discussion of their observation in the context that, in yeast, not all pores contain a basket would be useful.

      Thanks for this suggestion. We will raise this point that a nuclear basket is not present on all nuclear pores and discuss the implications.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work describes the mechanism of protein disaggregation by the ClpL AAA+ protein of Listeria monocytogenes. Using several model substrate proteins, the authors first show that ClpL possesses a robust disaggregase activity that does not further require the endogenous DnaK chaperone in vitro. In addition, they found that ClpL is more thermostable than the endogenous L. monocytogenes DnaK and has the capacity to unfold tightly folded protein domains. The mechanistic basis for the robust disaggregase activity of ClpL was also dissected in vitro and, in some cases, supported by in vivo data obtained in chaperone-deficient E. coli strains. The data presented show that the two AAA domains, the pore-2 site and the N-terminal domain (NTD) of ClpL are critical for its disaggregase activity. Remarkably, grafting the NTD of ClpL to ClpB converted ClpB into an autonomous disaggregase, highlighting the importance of such a domain in the DnaK-independent disaggregation of proteins. The role of the ClpL NTD was further dissected, identifying key residues and positions necessary for aggregate recognition and disaggregation. Finally, using sets of SEC and negative staining EM experiments combined with conditional covalent linkages and disaggregation assays, the authors found that ClpL shows significant structural plasticity, forming dynamic hexameric and heptameric active single rings that can further form higher assembly states via their middle domains.

      Strengths:

      The manuscript is well-written and the experimental work is well executed. It contains a robust and complete set of in vitro data that push further our knowledge of such important disaggregases. It shows the importance of the atypical ClpL N-terminal domain in the disaggregation process as well as the structural malleability of such AAA+ proteins. More generally, this work expands our knowledge of heat resistance in bacterial pathogens.

      Weaknesses:

      There is no specific weakness in this work, although it would have helped to have a drawing model showing how ClpL performs protein disaggregation based on their new findings. The function of the higher assembly states of ClpL remains unresolved and will need further extensive research. Similarly, it will be interesting in the future to see whether the sole function of the plasmid-encoded ClpL is to cope with general protein aggregates under heat stress.

      We thank the reviewer for the positive evaluation. We agree with the reviewer that it will be important to test whether ClpL can bind to and process non-aggregated protein substrates. Our preliminary analysis suggests that the disaggregation activity of ClpL is most relevant in vivo, pointing to protein aggregates as the main target.

      We also agree that the role of dimers or tetramers of ClpL rings needs to be further explored. Our initial analysis suggests a function of ring dimers as a resting state. It will now be important to study the dynamics of ClpL assembly formation and test whether substrate presence shifts ClpL assemblies towards an active, single ring state.

      Reviewer #2 (Public Review):

      The manuscript by Bohl et al. is an interesting and carefully done study on the biochemical properties and mode of action of the potent autonomous AAA+ disaggregase ClpL from Listeria monocytogenes. ClpL is encoded on plasmids. It shows high thermal stability and provides the food pathogen Listeria monocytogenes with a substantial increase in resistance to heat. The authors show that ClpL interacts with aggregated proteins through the aromatic residues present in its N-terminal domain and subsequently unfolds proteins from aggregates, translocating polypeptide chains through the central pore in its oligomeric ring structure. The structure of ClpL oligomers was also investigated in the manuscript. The results suggest that the mono-ring structure, and not the dimer or trimer of rings observed alongside mono-ring structures under EM, is the active species of the disaggregase.

      Presented experiments are conclusive and well-controlled. Several mutants were created to analyze the importance of a particular ClpL domain.

      The study's strength lies in the direct comparison of ClpL biochemical properties with the autonomous ClpG disaggregase present in selected Gram-negative bacteria and with the well-studied E. coli system consisting of the ClpB disaggregase and DnaK and its cochaperones. This puts the obtained results in a broader context.

      We thank the reviewer for the detailed comments. There are no specific weaknesses indicated in the public review.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript details the characterization of ClpL from L. monocytogenes as a potent and autonomous AAA+ disaggregase. The authors demonstrate that ClpL has potent and DnaK-independent disaggregase activity towards a variety of aggregated model substrates and that this disaggregase activity appears to be greater than that observed with the canonical DnaK/ClpB co-chaperone. Furthermore, Lm ClpL appears to have greater thermostability as compared to Lm DnaK, suggesting that ClpL-expressing cells may be able to withstand more severe heat stress conditions. Interestingly, Lm ClpP can provide thermotolerance to E. coli that have been genetically depleted of either ClpB or in cells expressing a mutant DnaK103. The authors further characterized the mechanisms by which ClpL interacts with protein aggregates, identifying that the N-terminal domain of ClpL is essential for disaggregase function. Lastly, by EM and mutagenesis analysis, the authors report that ClpL can exist in a variety of larger macromolecular complexes, including dimers or trimers of hexamers/heptamers, and they provide evidence that the N-terminal domains of ClpL prevent dimer ring formation, thus promoting an active and substrate-binding ClpL complex. Throughout this manuscript the authors compare Lm ClpL to ClpG, another potent and autonomous disaggregase found in gram-negative bacteria that has been reported on previously, demonstrating that these two enzymes share homologous activity and qualities. Taken together, this report clearly establishes ClpL as a novel and autonomous disaggregase.

      Strengths:

      The work presented in this report amounts to a significant body of novel and significant work that will be of interest to the protein chaperone community. Furthermore, by providing examples of how ClpL can provide in vivo thermotolerance to both E. coli and L. gasseri the authors have expanded the significance of this work and provided novel insight into potential mechanisms responsible for thermotolerance in food-borne pathogens.

      Weaknesses:

      The figures are clearly depicted and easy to understand, though some of the axis labeling is a bit misleading or confusing and may warrant revision. While I do feel that the results and discussion as presented support the authors' hypothesis and overall goal of demonstrating ClpL as a novel disaggregase, interpretation of the data is hindered as no statistical tests are provided throughout the manuscript. Because of this only qualitative analysis can be made, and as such many of the concluding statements involving pairwise comparisons need to be revisited or quantitative data with stats needs to be provided. The addition of statistical analysis is critical and should not be difficult, nor do I anticipate that it will change the conclusions of this report.

      We thank the reviewer for the valid criticism. We addressed the major concern of the reviewer and added the requested statistical analysis to all relevant figures. The analysis confirms our conclusions. We also followed the advice of the reviewer and revised axis labeling to increase clarity.

      Reviewer #1 (Recommendations For The Authors):

      • It would really help to have a model showing how ClpL performs protein disaggregation based on their findings.

      We show that ClpL exerts a threading activity that is fueled by ATP hydrolysis in both AAA domains and executed by pore-located aromatic residues. The basic disaggregation mechanism of ClpL therefore does not differ from that of the ClpB and ClpG disaggregases. Similarly, the specificity of ClpL towards protein aggregates is based on simultaneous interactions of multiple N-terminal domains with the aggregate surface. We recently described a similar mode of aggregate recognition for ClpG [1]. We therefore prefer not to add a model to the manuscript. We are currently preparing a review that covers the characterization of the novel bacterial disaggregases and will present models there, as we consider a review article more appropriate for such illustrations.

      • AAA2 domain of ClpL in Fig 3E should be the same color as in Fig 1A.

      We used light grey instead of dark grey for the ClpL AAA2 domain in Fig 3E, to distinguish between ClpL and ClpB AAA domains. This kind of illustration allows for clearer separation of both AAA+ proteins and the fusion construct LN-ClpB*. We therefore prefer keeping the color code.

      • Partial suppression of the dnaK mutant could be added in the main manuscript Figure.

      The main figure 3 is already very dense and we therefore prefer showing respective data as part of a supplementary figure.

      • It would have been interesting to know if the robust autonomous disaggregation activity of ClpL would be sufficient to rescue the growth of more severe E. coli chaperone mutants, like dnaK tig for example. Did the authors test this?

      We tested whether expression of clpL can rescue growth of E. coli dnaK103 mutant cells at 40°C on LB plates. This experiment is different from the restoration of heat resistance in dnaK103 cells (Figure 3, figure supplement 2A), as continuous growth at elevated temperatures (40°C) is monitored instead of cell survival upon abrupt severe heat shock (49°C). We did not observe rescue of the temperature-sensitive growth phenotype (40°C) of dnaK103 cells upon clpL expression, though expression of clpG complemented the temperature-sensitive growth phenotype (see Author response image 1 below). This finding points to differences in chaperone activities of ClpL and ClpG. It also suggests that ClpL activity is largely restricted to heat-shock generated protein aggregates, enabling ClpL to complement the missing disaggregation function of DnaK but not other Hsp70 activities including folding and targeting of newly synthesized proteins. We believe that dissecting the molecular reasons for differences in ClpG and ClpL complementation activities should be part of an independent study and prefer showing the growth-complementation data only in the response letter.

      Author response image 1.

      Serial dilutions (10⁻¹ – 10⁻⁶) of E. coli dnaK103 mutant cells expressing E. coli dnaK, L. monocytogenes clpL or P. aeruginosa clpG were spotted on LB plates including the indicated IPTG concentrations. Plates were incubated at 30°C or 40°C for 24 h. p: empty vector control.

      Reviewer #2 (Recommendations For The Authors):

      Based on results presented in Fig. 2B the authors conclude "that stand-alone disaggregases ClpL and ClpG but not the canonical KJE/ClpB disaggregase exhibit robust threading activities that allow for unfolding of tightly folded domains" (page 5 line 209). In this experiment, the threading power of disaggregases was assessed by monitoring YFP fluorescence during the disaggregation of aggregates formed by fusion luciferase-YFP protein. In my opinion, the results of the experiment depend not only on the threading power of disaggregases but also on the substrate recognition by analyzed disaggregating systems and/or processivity of disaggregases. N-terminal domain in the case of ClpL and KJE chaperones in the case of the KJE/ClpB system are involved in recognition. This is not discussed in the manuscript and the obtained result might be misinterpreted. The authors have created the LN-ClpB* construct (N-terminal domain of ClpL fused to derepressed ClpB) (Fig. 3 E and F). In my opinion, this construct should be used as an additional control in the experiment in Fig. 2 B. It possesses the same substrate recognition domain and therefore the direct comparison of disaggregases threading power might be possible.

      We performed the requested experiment (new Figure 3 - figure supplement 2D). We did not observe unfolding of YFP by LN-ClpB. Since ClpL and LN-ClpB do not differ in their aggregate targeting mechanisms, this finding underlines the differences in threading power between ClpL and activated (derepressed) ClpB. It also suggests that the AAA threading motors and the aggregate-targeting NTD largely function independently.

      Presented results suggest that tetramer and dimer of rings might be a "storage form" of disaggregase. It would be interesting to analyze the thermotolerance and/or phenotype of ClpL mutants that do not form tetramer and dimer (E352A). This variant possesses similar to WT disaggregation activity but does not form dimers and tetramers. If in vivo the differences are observed (for example toxicity of the mutant), the "storage form" hypothesis will be probable.

      When testing expression of clpL-MD mutants (E352A, F354A), which cannot form dimers and tetramers of ClpL rings, in E. coli ∆clpB cells, we observed reduced production levels as compared to ClpL wildtype and speculated that reduced expression might be linked to cellular toxicity. We therefore compared spotting efficiencies of E. coli ∆clpB cells expressing clpL, ∆N-clpL or the clpL-MD mutants at different temperatures. Expression of clpL at high levels abrogated colony formation at 42°C (new Figure 6 - figure supplement 3). ClpL toxicity was dependent on its NTD as no effect was observed upon expression of ∆N-clpL. ClpL-MD mutants (E352A, F354A) were expressed at much lower levels and exhibited strongly increased toxicity as compared to ClpL-WT when produced at comparable levels (new Figure 6 – figure supplement 3). This implies a protective role of ClpL ring dimers and tetramers in the cellular environment by downregulating ClpL activity. We envision that the formation of ClpL assemblies restricts accessibility of the ClpL NTDs and reduces substrate interaction. Increased toxicity of ClpL-E352A and ClpL-F354A points to a physiological relevance of the dimers and tetramers of ClpL rings and is in agreement with the proposed function as storage forms. We added this potential role of ClpL ring assemblies to the discussion section. Due to the strongly reduced production levels of ClpL MD mutants and their enhanced toxicity at elevated temperatures we did not test for their ability to restore thermotolerance in E. coli ∆clpB cells.

      Figure 6G and Figure 6 -figure supplement 2 - it is not clear what is the difference in the preparation of WT and WTox forms of ClpL.

      ClpL WT was purified under reduced conditions (+ 2 mM DTT), whereas WTox was purified in absence of DTT, thus serving as control for ClpL-T355C, which forms disulfide bonds upon purification without DTT. We have added respective information to the figure legend and the materials and methods section.

      Page 5 line 250 - wrong figure citation. Instead of Figure 1 - Figure Supplement 2A should be Figure 3 - Figure Supplement 2A.

      Page 5 line 251 - wrong figure citation. Instead of Figure 1 - Figure Supplement 2B/C should be Figure 3 - Figure Supplement 2B/C.

      Page 7 line 315 - wrong figure citation. Instead of Figure 4F, it should be Figure 4G.

      Figure 1 - Figure Supplement 2E - At first glance, this Figure does not correspond to the text and is confusing. It would be nice to have bars for Lm ClpL activity in the figure. Alternatively, the description of the y-axis might be changed to "relative to Lm ClpL disaggregation activity" instead of "relative disaggregation activity". One has to carefully read the figure legend to find out that 1 corresponds to Lm ClpL activity.

      We have corrected all mistakes and changed the description of y-axis (Figure 1 - figure Supplement 2E) as suggested.

      Reviewer #3 (Recommendations For The Authors):

      (1) While the authors make many experimental comparisons throughout their study, no statistical tests are described or presented with their results or figures, nor are these statistical tests described in the methods. While the data as presented does appear to support the author's conclusions, without these statistical tests no meaningful conclusions from paired analysis can be drawn. Critically, please report these statistical tests. As a general suggestion please include the statistics (p-values) in the results section when presenting this data, as well as in the figure legends, as this will allow the reader to better understand the authors' presentation and interpretation of the data.

      We have added statistical tests to all relevant figures. The analysis confirms our earlier statements. We have further clarified our approach to the statistical analysis in the methods section. We report p-values in the results section; however, due to the volume of comparisons, we did not add individual p-values to the figure legends but used standard labeling with stars.

      (2) Some of the axis labels for the presented graphs are a bit misleading or confusing. Many describe a relative (%) disaggregation rate, but it is not clear from the methods or figure legends what this rate is relative to. Is it relative to non-denatured substrates, to no chaperone conditions, etc.? Is it possible to present the figures with the raw data rates/activity (ex. luciferase activity / time) vs. relative rates? I think that labeling these figure axes with "disaggregation rate" is a bit misleading as none of these experiments measure the actual rate of disaggregation of these model substrates per se (say by SEC-MALS or other biophysical measurements), but instead infer the extent of disaggregation by measuring a property of these substrates, i.e. luciferase activity or fluorescence intensity over time. Thus, labeling these figures with the appropriate axis for what is being measured, and then clarifying in the methods and results what is being inferred by these measurements, will help solidify the author's conclusions.

      Relative (%) disaggregation rate usually refers to the disaggregation activity of ClpL wildtype serving as reference. We clarified this point in the revised text and respective figure legends. We now also refer to the process measured (e.g. relative refolding activity of aggregated Luciferase instead of relative disaggregation activity) as suggested by the reviewer and added clarifications to text and materials and methods.

      Since we have many measurements for our most frequently used assays and have a reasonable estimate for the general variance within these assays, we found it reasonable to show activity data in relation to fixed controls. This reduces the impact of unspecific variance and thereby allows more accurate comparisons between different repetitions. The reference is now indicated in the axis title.
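
      The normalization described above can be sketched as follows. This is a minimal illustration only: the function name and the numbers are hypothetical and not taken from the study, and it assumes that each measured rate is simply divided by the mean rate of the ClpL wildtype reference.

```python
from statistics import mean


def relative_activity(sample_rates, reference_rates):
    """Express each replicate rate as a fraction of the mean reference rate.

    Assumption: the reference (e.g. ClpL wildtype) is summarized by its mean.
    """
    ref = mean(reference_rates)
    if ref == 0:
        raise ValueError("reference activity must be non-zero")
    return [rate / ref for rate in sample_rates]


# Hypothetical refolding rates in arbitrary units (not data from the study).
clpl_wt = [10.0, 12.0, 8.0]   # reference, mean = 10.0
mutant = [5.0, 6.0, 4.0]

rel = relative_activity(mutant, clpl_wt)
print(rel)
```

      With this convention a value of 1 corresponds to the reference activity, which is why the reference has to be named in the axis title.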

      (3) The figures are well presented, clutter-free, and graphically easy to understand. Figure legends have sufficient information aside from the aforementioned statistical information and should include the exact number of independent replicates for each panel/experiment (ex. n=4), not just a greater than 3. While the figures do show each data point along with the mean and error, in some figures it is difficult to determine the number of replicate data points. Example figures 2c, 2d, and 3a. Also, please state whether the error is std. error or SEM.

      While we agree that this is valuable information, we fear that overloading the figure legends with information may take a toll on readability. We therefore decided to provide the number of replicates for each experiment in a separate supplementary table (Table S2). The depicted error shows the SD and not the SEM, which we have also specified in the figure legends.
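
      For readers comparing the two error measures, the relationship between SD and SEM can be sketched as below. The replicate values are hypothetical and the helper name is ours; the sketch assumes the sample standard deviation (n - 1 denominator).

```python
from math import sqrt
from statistics import stdev


def sd_and_sem(values):
    """Return (sample SD, SEM) for a list of replicate measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are required")
    sd = stdev(values)          # sample standard deviation (n - 1)
    return sd, sd / sqrt(n)     # SEM shrinks with the number of replicates


# Hypothetical replicate values (not data from the study).
replicates = [2.0, 4.0, 4.0, 6.0]
sd, sem = sd_and_sem(replicates)
print(sd, sem)
```

      Because the SEM depends on n, stating the exact number of replicates matters more when SEM is shown; with SD the spread of the individual data points is reported directly.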

      (4) There are various examples throughout the results where qualitative descriptors are used to describe comparisons. Examples of this are "hardly enhanced" (Figure 1) and "partially reduced" (Figure 6). While this is not necessarily wrong, qualitative descriptions of comparisons in this manner would require further explanation. What is the definition of "hardly" or "partially"? My recommendation is to just state the data quantitatively, such as "% enhanced" or "reduced by x", this way there is no misinterpretation. Examples of this can be found in Figures 6C-G. This would require a full statistical overview and presentation of these stats in the results.

      We followed the reviewer's advice and no longer use the terms criticized (e.g. “hardly enhanced”). We instead provide the requested quantifications in the text.

      Questions for Figures:

      Figures 1B and 1C:

      (1) Is the disaggregase activity of ClpL towards heat-denatured luciferase and GFP ATP-dependent? While the authors later in the manuscript show that mutations within the Walker B domains dramatically impair reactivation (disaggregation) of denatured luciferase, this does not rule out an ATP-independent effect of these mutations. Thus, the authors should test whether disaggregase activity is observed when wild-type ClpL is incubated with denatured substrates without ATP present or in the presence of ADP only.

      We tested for ClpL disaggregation activity in absence of nucleotide and presence of ADP only (new Figure 1 – figure supplement 2A). We did not observe any activity, demonstrating that ClpL activity depends on ATP binding and hydrolysis (see also Figure 3 – figure supplement 1D: ATPase-deficient ClpL-E197A/E530A is lacking disaggregation activity).

      (2) The authors suggest that a reduction in disaggregase activity observed in samples combining Lm ClpL and KJE (Figure 1C, supp. 1C-E) could be due to competition for protein aggregate binding as observed previously with ClpG. Did the authors test this directly by pulldown assay or another interaction-based assay? While ClpL and ClpG appear to work in a similar manner, it would be good to confirm this. Also, clarification on how this competition operates would be useful. Is it that ClpL prevents aggregates from interacting with KJE, or vice versa?

      We probed for binding of ClpL to aggregated Malate Dehydrogenase in the presence of L. monocytogenes or E. coli Hsp70 (DnaK + respective J-domain protein DnaJ) by a centrifugation-based assay. Here, we used the ATPase-deficient ClpL-E197A/E530A (ClpL-DWB) mutant, ensuring stable substrate interaction in presence of ATP. We observed reduced binding of ClpL-DWB to protein aggregates in presence of DnaK/DnaJ (new Figure 1 – figure supplement 2G). This finding indicates that both chaperones compete for binding to aggregated proteins and explains inhibition of ClpL disaggregation activity in presence of Hsp70.

      (3) Related to the above, while incubation of aggregated substrates with ClpL and KJE does appear to reduce aggregase activity towards GFP (Figure 1c), α-glucosidase (Supp. 1C), and MDH (Supp. 1D), this doesn't appear to be the case towards luciferase (Figure 1b, Supp. 1b). Furthermore, ClpL aggregase activity is reduced towards luciferase when combined with E. coli KJE (Supp. 1e) but not with Lm KJE (Figure 1b). The authors provide no commentary or explanation for these observations. Furthermore, these results complicate the concluding statement that "combining ClpL with Lm KJE always led to a strong reduction in disaggregation activity ... ".

      We suggest that the differing inhibitory degrees of the KJE system on ClpL disaggregation activities reflect diverse binding affinities of KJE and ClpL to the respective aggregates. While we usually observe strong inhibition of ClpL activity in presence of KJE, this is different for aggregated Luciferase. This points to specific structural features of Luciferase aggregates or the presence of distinct binding sites on the aggregate surface that favour ClpL binding. We have added a respective comment to the revised manuscript.

      The former statement that “combining ClpL with Lm KJE always led to a strong reduction in disaggregation activity” referred to aggregated GFP, MDH and α-Glucosidase for which a strong inhibition of ClpL activity was observed. We have specified this point.

      Figures 1D and 1E:

      (1) The authors conclude that the heat sensitivity of ΔClpL L. gasseri cells is because they do not express the canonical ClpB disaggregase. A good test to validate this would be to express KJE/ClpB in these Lg ΔClpL cells to see if heat-sensitivity could be fully or partially rescued.

      We agree that such an experiment would further support the in vivo function of ClpL as an alternative disaggregase. However, such an approach would require co-expression of E. coli ClpB with the authentic E. coli DnaK chaperone system (KJE), as ClpB and DnaK cooperate in a species-specific manner [2-4]. This makes the experiment challenging, also because the individual components need to be expressed at the correct stoichiometry. Furthermore, the presence of the authentic L. gasseri KJE system, which is likely competing with the E. coli KJE system for aggregate binding, will hamper E. coli KJE/ClpB disaggregation activity in L. gasseri. In view of these limitations, we would like to refrain from conducting such an experiment.

      (2) The rationale for investigating Lg ClpL, and the aggregase activity assays are compelling and support the hypothesis that ClpL contributes to thermotolerance in multiple gram-positive species. Though, from Figure 1d, why was only Lg ClpL investigated? It appears that S. thermophilus also lacks the canonical ClpB disaggregase and demonstrates ΔClpL heat sensitivity. There is also other Lactobacillus sp. presented that lack ClpB but were not tested for heat sensitivity. Why only test and move forward with L. gasseri? Lastly, L. mesenteroides is ClpB-negative but doesn't demonstrate ΔClpL heat sensitivity. Why?

      We wanted to document high, partner-independent disaggregation activity for another ClpL homolog. We chose L. gasseri, as (i) this bacterial species lacks a ClpB homolog and (ii) a ∆clpL mutant exhibits reduced survival upon severe heat shock (thermotolerance phenotype), which is associated with defects in cellular protein disaggregation. The characterization of L. gasseri ClpL as a potent disaggregase in vitro represents a proof of concept and allows us to generalize our conclusion. We therefore did not further test S. thermophilus ClpL. L. mesenteroides encodes ClpL but not ClpB; however, to the best of our knowledge, a ∆clpL mutant has not yet been characterized in this species. As we wanted to link ClpL in vitro activity with an in vivo phenotype, we did not characterize L. mesenteroides ClpL.

      We agree with the reviewer that the characterization of additional ClpL homologs is meaningful and interesting, however, we strongly believe that such analysis should be part of an exhaustive and independent study.

      Figures 2A and 2B:

      (1) Figure 2B demonstrates that both ClpL and ClpG, but not the canonical KJE/ClpB, are able to unfold YFP during the luciferase disaggregation process, suggesting that ClpL and ClpG exhibit stronger threading activity. A technical question, can luciferase activity be measured alongside in the same assay sample? If so, would you expect to observe a concomitant increase in luciferase activity as YFP fluorescence decreases?

      KJE/ClpB can partially disaggregate and refold aggregated Luciferase-YFP without unfolding YFP during the disaggregation reaction [5]. YFP unfolding is therefore not linked to refolding of aggregated Luciferase-YFP. On the other hand, unfolding of YFP during disaggregation can hamper the refolding of the fused Luciferase moiety as observed for the AAA+ protein ClpC in presence of its partner MecA [5]. These diverse effects make the interpretation of Luciferase-YFP refolding experiments difficult, as the degree of YFP unfolding activity does not necessarily correlate with the extent of Luciferase refolding. We therefore did not perform the suggested experiment.

      Figure 2C and 2D:

      (1) Thermal shift assays for ClpL, ClpG, and DnaK were completed with various nucleotides. Were these experiments also completed with samples in their nucleotide-free apo state? Also, while all these chaperones are ATPases, the nucleotides used differ, but no explanation is provided. Comparison should be made of these ATPases bound to the same molecules.

      We did not monitor thermal stabilities of chaperones without nucleotide, as such a state is likely not relevant in vivo. We used ATPγS in the case of ClpL to keep the AAA+ protein in the ATP conformation. ATP would be rapidly converted to ADP due to the high intrinsic ATPase activity of ClpL. In the case of DnaK, ATPγS cannot be used as it does not induce the ATP conformation [6]. The low intrinsic ATPase activity of DnaK allows determining the thermal stability of its ATP conformation in presence of ATP. This is confirmed by the reduced thermal stability calculated for ADP-bound DnaK.

      (2) The authors suggest that incubation at 55°C will cause unfolding of Lm DnaK, but not ClpL, providing ClpL-positive Lm cells disaggregase activity at 55°C. While the thermal shift assays in Figures 2C and 2D support this, an experiment to test this would be to heat-treat Lm DnaK and ClpL at 55°C then test for disaggregase activity using either aggregated luciferase or GFP as in Figure 1.

      We followed the suggestion of the reviewer and incubated Lm ClpL and DnaK at 55-58°C in presence of ATP for 15 min prior to their use in disaggregation assays. We compared the activities of pre-heated chaperones with controls that were incubated at 30°C for 15 min. Notably, we did not observe a loss of DnaK disaggregation activity, suggesting that thermal unfolding of DnaK at this temperature is reversible. We provide these data as Figure 2 -figure supplement 1 and added a respective statement to the revised manuscript.

      Figure 3B:

      (1) The authors state that ATPase activity of ΔN-ClpL was "hardly affected", but from the data provided it appeared to result in an approximate 35% reduction. As discussed above, no stats are provided for this figure, but given the error bars, it is highly likely that this reduction is significant. Please perform this statistical test, and if significant, please reflect this in the written results as well as the figure. Lastly, if this reduction in ATPase activity is significant, why would this be so, and could this contribute to the reduction in aggregase activity towards luciferase and MDH observed in Figure 3A?

      We applied statistical tests as suggested by the reviewer, showing that the reduction in ATPase activity of ∆N-ClpL is statistically significant. N-terminal domains of Hsp100 proteins can modulate ATPase activity as shown for the family member ClpB, functioning as auxiliary regulatory element for fine tuning of ClpB activity [7]. We speculate that the impact of the ClpL-NTD on the assembly state (stabilization of ClpL ring dimers) might affect ClpL ATPase activity. We would like to point out that other ClpL mutants (e.g. NTD mutant ClpL-Y51A; MD mutant ClpL-F354A) have a similarly reduced ATPase activity, yet exhibit substantial disaggregation activity (approx. 2-fold reduced compared to ClpL wildtype). In contrast, ∆N-ClpL does not exhibit any disaggregation activity. This suggests that the loss of disaggregation activity is caused by a substrate binding defect but not by a partial reduction in ATPase activity. We added a comment on the reduced ATPase activity and also discuss its potential reasons in the discussion section.

      (2) I think the authors' conclusion that deletion of the ClpL NTD does not contribute to structural defects of ClpL is premature given the apparent reduction in ATPase activity. Did the authors perform any biophysical analysis of ΔN-ClpL to confirm this conclusion? Thermal shift assays, Native-PAGE, or size-exclusion chromatography for aggregates would all be good assays to demonstrate that the wild-type and ΔN-ClpL have similar structural properties. Surprisingly, Figure 6 describes significant macromolecular changes associated with ΔN-ClpL such that it preferentially forms a dimer of rings. Furthermore, in Supp. Figure 6D the authors report that ΔN-ClpL appears to have an increased Tm as compared to WT- or ΔM-ClpL. The authors should reflect these observations as deletion of the ClpL NTD does appear to contribute to structural changes, though perhaps only at the macromolecular scale, i.e. dimerization of the rings.

      We have characterized the oligomeric state of ∆N-ClpL by size exclusion chromatography (Figure 6 – figure supplement 1A) and negative staining electron microscopy (Figure 6C), both showing that it forms assemblies similar to ClpL wildtype. We did not observe an increased tendency of ∆N-ClpL to form aggregates and the protein remained fully soluble after several cycles of thawing and freezing. EM data reveal that ∆N-ClpL exclusively forms ring dimers, suggesting that the NTDs destabilize MD-MD interactions. The stabilized interaction between two ∆N-ClpL rings can explain the increased thermal stability (Figure 6 – figure supplement 1D). We speculate that the ClpL NTDs either affect MD-MD interactions through steric hindrance or by directly contacting MDs. We have added a respective statement to the discussion section.

      Figure 3C and 3D:

      (1) Given the larger error in samples expressing ClpG (100) or ClpL (100) statistical analysis with p-values is required to make conclusions regarding the comparison of these samples vs. plasmid-only control. The effect of ΔN-ClpL vs. wild-type ClpL looks compelling and does appear to attenuate the ClpL-induced thermotolerance. This is nicely demonstrated in Figure 3D.

      We quantified respective spot tests (new Figure 3E) and tested for statistical significance as suggested by the reviewer. We show that restoration of heat resistance is significant for the first 30 min. While we always observe rescue at later timepoints, significance is lost here due to larger deviations in the number of viable cells and thus the degree of complementation.

      Figure 3F:

      (1) What is the role of the ClpB NTD? It appears to be dispensable for disaggregase activity, assuming that ClpB is co-incubated with KJE. A quick explanation of this domain in ClpB could be useful.

      The ClpB NTD is not required for disaggregation activity, as ClpB is recruited to protein aggregates by DnaK, which interacts with the ClpB MDs. Still, two functions have been described for the ClpB NTD. First, it can bind soluble unfolded substrates such as casein [8]. This substrate binding function can increase ClpB disaggregation activity towards some aggregated model substrates (e.g. Glucose-6-phosphate dehydrogenase) [9]. However, NTD deletion usually does not decrease ClpB disaggregation activity and can even lead to an increase [7, 10, 11]. An increased disaggregation activity of ∆N-ClpB correlates with an enhanced ATPase activity, which is explained by NTDs stabilizing a repressing conformation of the ClpB MDs, which function as main regulators of ClpB ATPase activity [7]. We added a short description on the role of the ClpB NTD to the respective results section.

      (2) The result of fusing the ClpL NTD to ClpB supports a role for this NTD in promoting autonomous disaggregase activity. What would you expect to observe if the fused Ln-ClpB protein was co-incubated with KJE? Would this further promote disaggregase activity, or potentially impair through competition? This experiment could potentially support the authors' hypothesis that ClpL and ClpB/KJE can compete with each other for aggregated substrates as suggested in Figure 1.

      We have performed the suggested experiment using aggregated MDH as model substrate. We did not observe an inhibition of LN-ClpB disaggregation activity in presence of KJE. In contrast, ClpL disaggregation activity towards aggregated MDH is inhibited upon addition of KJE due to competition for aggregate binding (Figure 1 – figure supplement 2D/F). Disaggregation activity of LN-ClpB in presence of KJE can be explained by functional cooperation between both chaperone systems, which involves interactions between aggregate-bound DnaK and the ClpB MDs of the LN-ClpB fusion construct. We prefer showing these data only in the response letter but not including them in the manuscript, as respective results distract from the main message of the LN-ClpB fusion construct: the ClpL NTD functions as autonomous aggregate-targeting unit that can be transferred to other Hsp100 family members.

      Author response image 2.

      LN-ClpB cooperates with DnaK in protein disaggregation. Relative MDH disaggregation activities of indicated disaggregation systems were determined. KJE: DnaK/DnaJ/GrpE. The disaggregation activity of Lm ClpL was set to 1. Statistical Analysis: One-way ANOVA, Welch’s Test for post-hoc multiple comparisons. Significance levels: **p < 0.001. n.s.: not significant.
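
      As background for the pairwise comparison named in the legend, the sketch below computes Welch's t statistic and the Welch–Satterthwaite degrees of freedom for two groups with unequal variances. It is not the authors' analysis pipeline: the function name and the values are hypothetical, no ANOVA or multiple-comparison correction is performed, and converting t to a p-value would additionally require the t distribution, which is left out here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two values")
    va = variance(a) / len(a)   # squared standard error of group a
    vb = variance(b) / len(b)   # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical relative activities (not data from the study).
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 4.0, 6.0]
t, df = welch_t(group_a, group_b)
print(t, df)
```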

      Figures 4E and 4F:

      (1) While the effect of various NTD mutations follows a similar trend in regard to the impairment of ClpL-mediated disaggregation of luciferase and MDH, the degree of these effects does appear different. For example, patch A and C mutations reduce ClpL disaggregase activity towards luciferase (~60% / 50% reduction) vs. MDH (>90%) respectively. While these results do suggest a critical role for residues in patches A and C of ClpL, these substrate-specific differences are not discussed. Why would we expect a difference in the effect of these patch A/C ClpL mutations on different substrates?

      We speculate that the aggregate structure and the presence or distributions of ClpL NTD binding sites differ between aggregated Luciferase and MDH. A difference between both aggregated model substrates was also observed when testing for an inhibitory effect of Lm KJE (and Ec KJE) on ClpL disaggregation activity (see comment above). We speculate that the mutated NTD residues make specific contributions to aggregate recognition. The severity of binding defects (and reduction of disaggregation activities) of these mutants will depend on specific features of the aggregated model substrates. We now point out that ClpL NTD patch mutants can differ in disaggregation activities depending on the aggregated model substrate used and refer to potential differences in aggregate structures.

      (2) The authors suggest that the loss of disaggregation activity of selected NTD mutants could be linked to reduced binding to aggregated luciferase. While this is likely given that these mutations do not appear to affect ATPase activity (Supp. 4), it could be possible that these mutants can still bind to aggregated luciferase and some other mechanism may impair disaggregation. A pull-down assay would help to prove whether reduced binding is observed in these NTD ClpL mutants. This also needs to be confirmed for Supp. Figure 4.2H.

      We have shown a strong correlation between loss of aggregate binding and loss of disaggregation activity for several NTD mutants (Fig. 4G, Figure 4 – figure supplement 2H). We decided to perform the aggregate binding assay only with mutants that show a full, rather than a partial, disaggregation defect. In our experience, the centrifugation-based assay provides clear and reproducible results for loss-of-activity mutants but has limitations in revealing differences for partially affected mutants. This might be explained by the use of nonhydrolyzable ATPγS in these experiments, which strongly stabilizes substrate interactions and potentially masks partial binding defects. We agree with the reviewer that some ClpL NTD mutants might have additional effects on disaggregation activity, e.g. by controlling substrate transfer to the processing pore site. We have added a respective comment to the revised manuscript.

      (3) Supp. Figure 4.2H has no description in the figure legend. The Y-axes states % aggregate bound to chaperone. How was this measured? See the above comments for Figures 4E and 4F.

      We apologize for the omission and have added the description to the figure legend. The percentage of aggregate-bound chaperone was determined by quantifying the chaperones present in the supernatant and pellet fractions after sample centrifugation. Background levels of chaperones in the pellet fractions in the absence of protein aggregates were subtracted. We added this information to the materials and methods section.
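
      The quantification described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' analysis script: the function name, its arguments, and the choice to express the background as the pellet fraction of a no-aggregate control are all hypothetical.

```python
def percent_aggregate_bound(sample_supernatant, sample_pellet,
                            control_supernatant, control_pellet):
    """Return the background-corrected percentage of chaperone found in the pellet.

    All four inputs are band intensities (arbitrary units) from a
    centrifugation assay. The control is a sample without protein
    aggregates (assumption: background is expressed as its pellet fraction).
    """
    sample_total = sample_supernatant + sample_pellet
    control_total = control_supernatant + control_pellet
    if sample_total <= 0 or control_total <= 0:
        raise ValueError("total chaperone signal must be positive")

    # Percentage of chaperone in the pellet, with and without aggregates.
    sample_pct = 100.0 * sample_pellet / sample_total
    background_pct = 100.0 * control_pellet / control_total

    # Subtract the background and clamp at zero so noise cannot give
    # a negative bound fraction.
    return max(sample_pct - background_pct, 0.0)


# Hypothetical intensities: 60% pelleted with aggregates, 10% without.
bound = percent_aggregate_bound(40.0, 60.0, 90.0, 10.0)
```

      With these made-up numbers, 60% of the chaperone pellets in the presence of aggregates and 10% in their absence, so the corrected value is 50%.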

      Figure 6G:

      The authors observed reduced disaggregase activity and ATPase activity of mutant T355C under both oxidative and reducing conditions. While this observation under oxidative conditions supports the authors' hypothesis, under reducing conditions (+DTT) we would expect the enzyme to behave similarly to wild-type ClpL unless this mutation has other effects. Can the authors please comment on this and provide an explanation or hypothesis?

      The reviewer is correct: ClpL-T355C exhibits a reduced disaggregation activity (Figure 6 – figure supplement 2B). We observe a similar reduction in disaggregation activity for the ClpL MD mutant F354A, pointing to an auxiliary function of the MD in protein disaggregation. We have made a respective comment in the discussion section of the revised manuscript. How exactly ClpL MDs support protein disaggregation is currently unclear and will be the subject of future analysis in the lab. We strongly believe that such analysis should be part of an independent study.

      Discussion:

      In the fourth feature, it is discussed that one disaggregase feature of ClpL is that it does not cooperate with the ClpP protease. While a reference is provided for the canonical ClpB, no data in this paper, nor a reference, is provided demonstrating that ClpL does not interact with ClpP. As discussed, it is highly unlikely that ClpL interacts with ClpP given that ClpL does not contain the IGL/F loops that mediate the interaction of ClpP with cochaperones, such as ClpX, but data or a reference is needed to make such a factual statement.

      The absence of the IGL/F loop makes an interaction between ClpL and ClpP highly unlikely. However, the reviewer is correct: direct evidence for a ClpP-independent function of ClpL, though very likely, is not provided. We have therefore rephrased the respective statement: “Fourth, novel disaggregases lack the specific IGL/F signature motif, which is essential for cooperation of other Hsp100 proteins with the peptidase ClpP. This feature is shared with the canonical ClpB disaggregase [12], suggesting that protein disaggregation is primarily linked to protein refolding.”

      References

      (1) Katikaridis P, Simon B, Jenne T, Moon S, Lee C, Hennig J, et al. Structural basis of aggregate binding by the AAA+ disaggregase ClpG. J Biol Chem. 2023:105336.

      (2) Glover JR, Lindquist S. Hsp104, Hsp70, and Hsp40: A novel chaperone system that rescues previously aggregated proteins. Cell. 1998;94:73-82.

      (3) Krzewska J, Langer T, Liberek K. Mitochondrial Hsp78, a member of the Clp/Hsp100 family in Saccharomyces cerevisiae, cooperates with Hsp70 in protein refolding. FEBS Lett. 2001;489:92-6.

      (4) Seyffer F, Kummer E, Oguchi Y, Winkler J, Kumar M, Zahn R, et al. Hsp70 proteins bind Hsp100 regulatory M domains to activate AAA+ disaggregase at aggregate surfaces. Nat Struct Mol Biol. 2012;19:1347-55.

      (5) Haslberger T, Zdanowicz A, Brand I, Kirstein J, Turgay K, Mogk A, et al. Protein disaggregation by the AAA+ chaperone ClpB involves partial threading of looped polypeptide segments. Nat Struct Mol Biol. 2008;15:641-50.

      (6) Theyssen H, Schuster H-P, Bukau B, Reinstein J. The second step of ATP binding to DnaK induces peptide release. J Mol Biol. 1996;263:657-70.

      (7) Iljina M, Mazal H, Goloubinoff P, Riven I, Haran G. Entropic Inhibition: How the Activity of a AAA+ Machine Is Modulated by Its Substrate-Binding Domain. ACS chemical biology. 2021;16:775-85.

      (8) Rosenzweig R, Farber P, Velyvis A, Rennella E, Latham MP, Kay LE. ClpB N-terminal domain plays a regulatory role in protein disaggregation. Proc Natl Acad Sci U S A. 2015;112:E6872-81.

      (9) Barnett ME, Nagy M, Kedzierska S, Zolkiewski M. The amino-terminal domain of ClpB supports binding to strongly aggregated proteins. J Biol Chem. 2005;280:34940-5.

      (10) Beinker P, Schlee S, Groemping Y, Seidel R, Reinstein J. The N Terminus of ClpB from Thermus thermophilus Is Not Essential for the Chaperone Activity. J Biol Chem. 2002;277:47160-6.

      (11) Mogk A, Schlieker C, Strub C, Rist W, Weibezahn J, Bukau B. Roles of individual domains and conserved motifs of the AAA+ chaperone ClpB in oligomerization, ATP-hydrolysis and chaperone activity. J Biol Chem. 2003;278:15-24.

      (12) Weibezahn J, Tessarz P, Schlieker C, Zahn R, Maglica Z, Lee S, et al. Thermotolerance Requires Refolding of Aggregated Proteins by Substrate Translocation through the Central Pore of ClpB. Cell. 2004;119:653-65.

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary:

      The authors identified that genetic and pharmacological inhibition of CERS1, an enzyme implicated in ceramide biosynthesis, worsens muscle fibrosis and inflammation during aging.

      Strengths:

      The study points out an interesting issue on excluding CERS1 inhibition as a therapeutic strategy for sarcopenia. Overall, the article is well written and clear.

      Weaknesses:

      Many of the experiments confirmed previously published data, which also show a decline of CERS1 in ageing and the generation and characterization of a muscle-specific knockout mouse line. The mechanistic insights into how the increased amount of long ceramides (cer c24) and the decrease in shorter ones (cer c18) might influence muscle mass, force production, fibrosis and inflammation in aged mice have not been addressed.

      We thank the reviewer for the assessment and would like to point out that Cers1 had not previously been studied in the context of aging. Moreover, our unbiased pathway analyses in human skeletal muscle implicate CERS1 in myogenic differentiation for the first time, which we validate in cell culture systems. To improve mechanistic insights, as suggested by Reviewer #1, we performed more experiments to understand how Cers1-derived c18 and Cers2-derived c24 ceramide species affect myogenesis. We recently showed that knocking out Cers2 reduces c24:0/c24:1 and promotes muscle cell maturation (PMID: 37118545, Fig. 6m-r and Supplementary Fig. 5e). This suggests that the very long chain ceramides c24 might indeed be driving the effect we see upon Cers1 inhibition, because we observe an accumulation of c24 ceramides upon Cers1 (c18) inhibition (Fig 2B, Fig 3B, Fig 4A, Fig S3E), which is associated with impaired muscle maturation (Fig 4B-C, Fig S3G-I, Fig S4G-I). To study whether impaired muscle cell differentiation upon Cers1 inhibition is dependent on Cers2, we knocked down Cers1 alone or in combination with Cers2. The results show that the reduced muscle cell maturation mediated by Cers1KD is rescued by the simultaneous knockdown of Cers2, as shown by gene expression analyses and immunohistochemical validation and quantification. Hence, we believe that reducing Cers1 function during aging might lead to an increase in sphingosine levels, as has been shown previously (PMID: 31692231). Increased sphingosine triggers cell apoptosis due to its toxicity (PMID: 12531554). Therefore, channeling accumulating sphingosine towards C24 ceramides may avoid toxicity but, as we show in this manuscript, will reduce the myogenic potential in muscle. However, if C24 production is also blocked by Cers2 inhibition, sphingosine is forced towards the production of other, potentially less toxic or myogenesis-impairing ceramides. We added these new data to the revised manuscript as new Fig 5D-E and new Fig S5G-I.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Wohlwend et al. investigates the implications of inhibiting ceramide synthase Cers1 on skeletal muscle function during aging. The authors propose a role for Cers1 in muscle myogenesis and aging sarcopenia. Both pharmacological and AAV-driven genetic inhibition of Cers1 in 18-month-old mice lead to reduced C18 ceramides in skeletal muscle, exacerbating age-dependent features such as muscle atrophy, fibrosis, and center-nucleated fibers. Similarly, inhibition of the Cers1 orthologue in C. elegans reduces motility and causes alterations in muscle morphology.

      Strengths:

      The study is well-designed, carefully executed, and provides highly informative and novel findings that are relevant to the field.

      Weaknesses:

      The following points should be addressed to support the conclusions of the manuscript.

      (1) It would be essential to investigate whether P053 treatment of young mice induces age-dependent features besides muscle loss, such as muscle fibrosis or regeneration. This would help determine whether the exacerbation of age-dependent features solely depends on Cers1 inhibition or is associated with other factors related to age-dependent decline in cell function. Additionally, considering the reported role of Cers1 in whole-body adiposity, it is necessary to present data on mouse body weight and fat mass in P053-treated aged mice.

      We thank the reviewer for suggesting that we study Cers1 inhibition in young mice. In fact, a previous study shows that muscle-specific Cers1 knockout in young mice impairs muscle function (PMID: 31692231). Similar to our observation, these authors report reduced muscle fiber size and muscle force. Therefore, we do not believe that our observed effects of Cers1 inhibition in aged mice are specific to aging, although the phenotypic consequences are accentuated in aged mice. As requested by the reviewer, we attached the body weights and fat mass of the mice (Author response image 1A-B). The reduced fat mass upon P053 treatment is in line with previously reported reductions in fat mass in young mice fed a chow diet or a high-fat diet upon Cers1 inhibition (PMID: 30605666, PMID: 30131496), again suggesting that the effect of Cers1 inhibition might not be specific to aging.

      Author response image 1.

      (A-B) Body mass (A) and fat mass as % of body mass (B) were measured in 22mo C57BL/6J mice intraperitoneally injected with DMSO or P053 using EchoMRI (n=7-12 per group). (C-D) Grip strength measurements in all limbs (C) or only the forelimbs (D) in 24mo C57BL/6J mice intramuscularly injected with AAV9 particles containing scramble, or shRNA targeting Cers1 (n=8 per group). (E-F) Pax7 gene expression in P053 or AAV9 treated mice (n=6-7 per group) (E), or in mouse C2C12 muscle progenitor cells treated with 25nM scramble or Cers1 targeting shRNA (n=8 per group) (F). (G) Proliferation as measured by luciferase intensity in mouse C2C12 muscle cells treated with 25nM scramble or Cers1 targeting shRNA (n=24 per group). Each column represents one biological replicate. (H) Overlaid FACS traces of Annexin-V (BB515, left) and Propidium Iodide (Cy5, right) of mouse C2C12 myotubes treated with 25nM scramble or Cers1 targeting shRNA (n=3 per group). Quantification right: early apoptosis (Annexin+/PI-), late apoptosis (Annexin+/PI+), necrosis (Annexin-/PI+), viability (Annexin-/PI-). (I) Normalized Cers2 gene expression in mouse C2C12 muscle cells treated with 25nM scramble or Cers1 targeting shRNA (n=6-7 per group). (J-K) Representative mitochondrial respiration traces of digitonin-permeabilized mouse C2C12 muscle cells treated with DMSO or P053 (J) with quantification of basal, ATP-linked, proton leak respiration as well as spare capacity and maximal capacity linked respiration (K) (n=4 per group). (L) Reactive oxygen species production in mitochondria of mouse C2C12 muscle cells treated with DMSO or P053. (M) Enriched gene sets related to autophagy and mitophagy in 24mo C57BL/6J mouse muscles intramuscularly injected with AAV9 particles containing scramble, or shRNA targeting Cers1 (left), or intraperitoneally injected with DMSO or P053 (right). Color gradient indicates normalized effect size. Dot size indicates statistical significance (n=6-8 per group). (N) Representative confocal Proteostat® stainings with quantifications of DMSO and P053 treated mouse muscle cells expressing APPSWE (top) and human primary myoblasts isolated from patients with inclusion body myositis (bottom). (O) Stillness duration during a 90-second interval in adult day 5 C. elegans treated with DMSO or 100uM P053. (P) Lifespan of C. elegans treated with DMSO or P053 (n=144-147 per group; for method details see main manuscript page 10).

      (2) As grip and exercise performance tests evaluate muscle function across several muscles, it is not evident how intramuscular AAV-mediated Cers1 inhibition solely in the gastrocnemius muscle can have a systemic effect or impact different muscles. This point requires clarification.

      The grip strength measurements presented in the manuscript come from hindlimb grip strength, as pointed out in the Methods section. We also measured grip strength in all four limbs, as well as in the forelimbs only (Author response image 1C-D). While forelimb strength did not change, hindlimb grip strength was significantly different in AAV-Cers1KD compared to the scramble control AAV (Fig 3I), which is in line with the fact that we only injected the AAV in the hindlimbs. This is similar to our previous data, where we saw altered muscle function upon IM AAV delivery in the gastrocnemius (PMID: 34878822, PMID: 37118545). The gastrocnemius likely makes the largest contribution to hindlimb grip strength given its size, and possibly even to overall grip strength, as suggested by a trend of reduced grip strength in all four limbs (Author response image 1C). We also suspect that the hindlimb muscles make the largest contribution to uphill running, as we could also see an effect on running performance. While we carefully injected a minimal amount of AAV into the gastrocnemius to avoid leakage, we cannot completely rule out that some AAV might have spread to other muscles. We added this information to the discussion of the manuscript as a potential limitation of the study.

      (3) To further substantiate the role of Cers1 in myogenesis, it would be crucial to investigate the consequences of Cers1 inhibition under conditions of muscle damage, such as cardiotoxin treatment or eccentric exercise.

      While it would be interesting to study Cers1 in the context of muscle regeneration, and possibly mouse models of muscular dystrophy, we think such work would go beyond the scope of the current manuscript.

      (4) It would be informative to determine whether the muscle defects are primarily dependent on the reduction of C18-ceramides or the compensatory increase of C24-ceramides or C24-dihydroceramides.

      To improve mechanistic insights, as suggested by Reviewer #2, we performed more experiments to understand how Cers1-derived c18 and Cers2-derived c24 ceramide species affect myogenesis. We recently showed that knocking out Cers2 reduces c24:0/c24:1 and promotes muscle cell maturation (PMID: 37118545, Fig. 6m-r and Supplementary Fig. 5e). This suggests that the very long chain ceramides c24 might indeed be driving the effect we see upon Cers1 inhibition, because we observe an accumulation of c24 ceramides upon Cers1 (c18) inhibition (Fig 2B, Fig 3B, Fig 4A, Fig S3E), which is associated with impaired muscle maturation (Fig 4B-C, Fig S3G-I, Fig S4G-I). To study whether impaired muscle cell differentiation upon Cers1 inhibition is dependent on Cers2, we knocked down Cers1 alone or in combination with Cers2. The results show that the reduced muscle cell maturation mediated by Cers1KD is rescued by the simultaneous knockdown of Cers2, as shown by gene expression analyses and immunohistochemical validation and quantification. We added these data to the manuscript as new Fig 5D-E and new Fig S5G-I. These data, together with our previous results showing that Degs1 knockout reduces myogenesis (PMID: 37118545, Fig. 6s-x and Fig. 7), suggest that C24/dhC24 might contribute to the age-related impairments in myogenesis. We added the new results to the revised manuscript.

      (5) Previous studies from the research group (PMID 37118545) have shown that inhibiting the de novo sphingolipid pathway by blocking SPLC1-3 with myriocin counteracts muscle loss and that C18-ceramides increase during aging. In light of the current findings, certain issues need clarification and discussion. For instance, how would myriocin treatment, which reduces Cers1 activity because of the upstream inhibition of the pathway, have a positive effect on muscle? Additionally, it is essential to explain the association between the reduction of Cers1 gene expression with aging (Fig. 1B) and the age-dependent increase in C18-ceramides (PMID 37118545).

      Blocking the upstream enzyme of the ceramide pathway (SPT1) shuts down the entire pathway that is overactive in aging, and therefore seems beneficial for muscle aging. While most enzymes in the ceramide pathway that we have studied so far (SPTLC1, CERS2) revealed muscle benefits in terms of myogenesis, inflammation (PMID: 35089797; PMID: 37118545) and muscle protein aggregation (PMID: 37196064), the CERS1 enzyme shows opposite effects. This is also visible in the direction of CERS1 expression compared to the other enzymes in one of our previously published studies (PMID: 37118545, Fig. 1e and Fig. 1f). In the current study, we show that Cers1 inhibition indeed exacerbates age-related defects in myogenesis and inflammation, as opposed to the inhibition of Sptlc1 or Cers2. As the reviewer points out, both C18- and C24-ceramides seem to accumulate upon muscle aging. We think this is due to an overall overactive ceramide biosynthesis pathway. Blocking C18-ceramides via Cers1 inhibition results in the accumulation of C24-ceramides and worsens muscle phenotypes (see reply to question #4). On the other hand, blocking C24-ceramides via Cers2 inhibition improves muscle differentiation. These observations, together with the finding that the inhibition of muscle differentiation mediated by Cers1 inhibition is dependent on proper Cers2 function (new Fig 5D-E, new Fig S5G-I), point towards C24-ceramides as the main culprit of reduced muscle differentiation. Hence, at least a significant part of the benefits of blocking SPTLC1 might have been related to reducing very long-chain ceramides. We believe that reduced Cers1 expression in skeletal muscle upon aging, observed by us and others (PMID: 31692231), might reflect a compensatory mechanism to make up for an overall overactive ceramide flux in aged muscles. Reducing Cers1 function during aging might lead to an increase in sphingosine levels, as has been shown previously (PMID: 31692231). Increased sphingosine triggers cell apoptosis due to its toxicity (PMID: 12531554). Therefore, channeling accumulating sphingosine towards C24 ceramides may avoid toxicity but, as we show in this manuscript, will reduce the myogenic potential in muscle. However, if C24 production is also blocked by Cers2 inhibition (new Fig 5D-E, new Fig S5G-I), sphingosine is forced towards the production of other, potentially less toxic or myogenesis-impairing ceramides. These data are now added to the revised manuscript (see page 7). Details were added to the discussion of the manuscript (see page 8).

      Addressing these points will strengthen the manuscript's conclusions and provide a more comprehensive understanding of the role of Cers1 in skeletal muscle function during aging.

      Reviewer #1 (Recommendations For The Authors):

      The authors identified that genetic and pharmacological inhibition of CERS1, an enzyme implicated in ceramide biosynthesis, worsens muscle fibrosis and inflammation during aging.

      Even though many of the experiments only confirmed previously published data (ref 21, 11, 37, 38), which also show a decline of CERS1 in ageing and the generation and characterization of a muscle-specific knockout mouse line, the study points out an interesting issue on excluding CERS1 inhibition as a therapeutic strategy for sarcopenia and opens new questions on understanding how inhibition of SPTLC1 (upstream of CERS1) has beneficial effects in healthy aging (ref 15, published by the same authors).

      Overall, the article is well written and clear. However, there is a major weakness. The mechanistic insights into how the increased amount of long ceramides (c24) and the decrease in shorter ones (cer c18) might influence muscle mass, force production, fibrosis and inflammation in aged mice have not been addressed. At the present stage the manuscript is descriptive and confirmatory of the CERS1-mediated function in preserving muscle mass. The authors should consider the following points:

      Comments:

      (1) Muscle data

      (a) The effect of CERS1 inhibition on myotube formation must be better characterized. Which step of myogenesis is affected? Is stem cell renewal, MyoD replication/differentiation, myoblast fusion, or increased cell death the major culprit of the small myotubes? Minor points: Figure S1C: show C14:00 level at 200 h; text of Fig S2A and 1F: MRF4 and Myogenin are not early genes in myogenesis, please correct; Fig S2B and 2C: changes in transcript levels do not mean changes in protein or myotube differentiation, and therefore the authors must test myotube formation and myosin expression.

      Cers1 inhibition seems to affect differentiation and myoblast fusion. To test the other suggested effects, we performed more experiments as delineated below. Inhibiting Cers1 systemically with the pharmacological inhibitor of Cers1 (P053) or with intramuscular delivery of AAV expressing a short hairpin RNA (shRNA) against Cers1 in mice did not affect Pax7 transcript levels (Author response image 1E). Moreover, we also did not observe an effect of shRNA targeting Cers1 on Pax7 levels in mouse C2C12 muscle progenitor cells (Author response image 1F). To characterize the effect of Cers1 inhibition on muscle progenitor proliferation/renewal, we used scramble shRNA or shRNA targeting Cers1 in C2C12 muscle progenitors and measured proliferation using CellTiter-Glo (Promega). The results showed that Cers1KD had no significant effect on cell proliferation (Author response image 1G). Next, we assayed cell death in differentiating C2C12 myotubes deficient in Cers1 using FACS analysis of Annexin V (left) and propidium iodide (right). We found no difference in early apoptosis, late apoptosis, necrosis, or muscle cell viability, suggesting that cell death can be ruled out as an explanation for the smaller myotubes (Author response image 1H). These findings support the notion that the inhibitory effect of Cers1 knockdown on muscle maturation is primarily based on effects on myogenesis rather than on apoptosis. Our data in the manuscript also suggest that Cers1 inhibition affects myoblast fusion, as shown by reduced myonucleation upon Cers1KD (Fig S3H right, Fig S5I).
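
      The four Annexin V / propidium iodide categories referred to above (and defined in the legend of Author response image 1H) can be illustrated with a minimal gating sketch. It is not the authors' FACS analysis: the function names, the single fixed threshold per channel, and the example intensities are hypothetical, and real gates are normally set from unstained and single-stained controls.

```python
from collections import Counter

CATEGORIES = ("viable", "early apoptosis", "late apoptosis", "necrosis")


def classify_event(annexin, pi, annexin_gate, pi_gate):
    """Assign one event to a quadrant from its two fluorescence intensities."""
    annexin_pos = annexin > annexin_gate
    pi_pos = pi > pi_gate
    if annexin_pos and pi_pos:
        return "late apoptosis"   # Annexin+/PI+
    if annexin_pos:
        return "early apoptosis"  # Annexin+/PI-
    if pi_pos:
        return "necrosis"         # Annexin-/PI+
    return "viable"               # Annexin-/PI-


def quadrant_fractions(events, annexin_gate, pi_gate):
    """Return the fraction of events in each category.

    `events` is an iterable of (annexin, pi) intensity pairs.
    """
    counts = Counter(
        classify_event(a, p, annexin_gate, pi_gate) for a, p in events
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no events to classify")
    return {label: counts[label] / total for label in CATEGORIES}


# Hypothetical intensities, one event per quadrant; gates are assumptions.
example = [(10.0, 10.0), (1000.0, 10.0), (1000.0, 1000.0), (10.0, 1000.0)]
fractions = quadrant_fractions(example, annexin_gate=100.0, pi_gate=100.0)
```

      Comparing such per-category fractions between scramble and Cers1 shRNA samples is the kind of readout the response summarizes as showing no difference.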

      (b) The phenotype of CERS1 knockdown is milder than that of P053-treated mice (Fig S5D and Figure 3F, 3H are not significant) despite similar changes in Cer18:0, Cer24:0, and Cer24:1 concentrations in muscles. Why?

      Increases in very long chain ceramides were in fact larger upon P053 administration compared to AAV-mediated knockdown. For example, Cer24:0 levels increased by >50% upon P053 administration, compared to 20% upon AAV injection. Moreover, dhC24:1 increased by 6.5-fold vs 2.5-fold upon P053 vs AAV treatment, respectively. These differences might not only explain the slightly attenuated phenotypes in the AAV-treated mice but also underline the notion that very long chain ceramides might cause muscle deterioration. We believe that inhibiting the enzymatic activity of Cers1 (P053) is a more efficient strategy to reduce ceramide levels than degrading Cers1 transcripts. However, we cannot completely rule out multi-organ, systemic effects of P053 treatment beyond its direct effect on muscle. We added these details to the discussion of the revised manuscript (see page 8 of the revised manuscript).

      (c) The authors talk about a possible compensation by the CERS2 isoform, but they never showed mRNA expression levels or CERS2 protein levels after treatment. Is CERS2 more highly expressed when CERS1 is downregulated in skeletal muscle?

      We appreciate the suggestion of the reviewer. We found no change in Cers2 mRNA levels upon Cers1 inhibition in mouse C2C12 myoblasts (Author response image 1I). We would like to point out that mRNA abundance might not be the optimal readout for enzymes, because transcript levels do not necessarily reflect enzymatic activity. Therefore, we think metabolite levels are a better proxy of enzymatic activity. It should also be pointed out that “compensation” might not be an accurate description, as sphingoid base substrate might simply be more available upon Cers1KD and hence more substrate might be present for Cers2 to synthesize very long chain ceramides. This “re-routing” has been previously described in the literature and hypothesized to serve to avoid toxic (dh)sphingosine accumulation (PMID: 30131496). Therefore, we changed the wording in the revised manuscript to be more precise.

      (d) Force measurement of AAV CERS1 downregulated muscles could be a plus for the study (assay function of contractility)

      In the current study we measured grip strength in mice, which had previously been shown to be a good proxy of muscle strength and general health (PMID: 31631989). Indeed, our results of reduced muscle grip strength are in line with previous work that shows reduced contractility in muscles of Cers1 deficient mice (PMID: 31692231).

      (e) How are degradation pathways affected by the downregulation of CERS1? Is autophagy/mitophagy affected? How are mTOR and protein synthesis affected? There is a recent paper that showed that CerS1 silencing leads to a reduction in C18:0-Cer content, with a subsequent increase in the activity of the insulin pathway, and an improvement in skeletal muscle glucose uptake. Could it be possible that CERS1 downregulation increases mTOR signalling and decreases the autophagy pathway? Measuring autophagic flux using colchicine in vivo would be useful to test this hypothesis.

      Cers1 in skeletal muscle has indeed been linked to metabolic homeostasis (see PMID: 30605666). In line with their finding in young mice, we also find reduced fat mass upon P053 treatment in aged mice (Author response image 1A-B). We also looked into mitochondrial bioenergetics upon blocking Cers1 with P053 treatment using an O2k oxygraph (Author response image 1J-L). The results show that Cers1 inhibition in mouse muscle cells increases mitochondrial respiration, similar to what has been shown before (PMID: 30131496). However, we also found that reactive oxygen species production in mouse muscle cells is increased upon P053 treatment, suggesting the presence of dysfunctional mitochondria upon inhibiting Cers1 with P053. We next looked into the mitophagy/autophagy degradation pathways suggested by the reviewer and did not find convincing evidence that Cers1 has a major impact on autophagy- or mitophagy-derived gene sets in mice treated with shRNA against Cers1 or with the Cers1 pharmacological inhibitor P053 (Author response image 1M).

      We then assessed the effect of Cers1 inhibition on transcript levels related to mTORC1/protein synthesis, as suggested by the reviewer. Cers1 knockdown in differentiating mouse muscle cells showed only a weak trend to reduce mTORC1 and its downstream targets (new Fig S4A). In line with this, there was no notable difference in protein synthesis in differentiating, Cers1-deficient mouse C2C12 myoblasts as assessed by L-homopropargylglycine (HPG) amino acid labeling using confocal microscopy (new Fig S4B) or FACS analyses (new Fig S4C). However, Cers1KD increased transcripts related to the myostatin-Foxo1 axis as well as the ubiquitin proteasome system (e.g. atrogin-1, MuRF1) (new Fig S4D), suggesting that Cers1 inhibition increases protein degradation. We added these details to the revised manuscript on page 7. We recently implicated the ceramide pathway in regulating muscle protein homeostasis (PMID: 37196064). Therefore, we assessed the effect of Cers1 inhibition with the P053 pharmacological inhibitor on protein folding in muscle cells using the Proteostat dye, which intercalates into the cross-beta spine of quaternary protein structures typically found in misfolded and aggregated proteins. Interestingly, inhibiting Cers1 further increased misfolded proteins in C2C12 mouse myoblasts expressing the Swedish mutation in APP and in human myoblasts isolated from patients with inclusion body myositis (Author response image 1N). These findings suggest that deficient Cers1 might upregulate protein degradation to compensate for the accumulation of misfolded and aggregating proteins, which might contribute to the impaired muscle function observed upon Cers1 knockdown. Further studies are needed to disentangle the underlying mechanisms.

      (f) The balance of ceramides has been found to play roles in mitophagy and fission, with an impact on cell fate and metabolism. Did the authors check how mitochondrial morphology, mitophagy, or mitochondrial dynamics (fission and fusion) are altered in CERS1 knockdown muscles? There is growing evidence linking mitochondrial dysfunction to the development of fibrosis and inflammation.

      Previously, CERS1 has been studied in the context of metabolism and mitochondria (for reference, please see PMID: 26739815, PMID: 29415895, PMID: 30605666, PMID: 30131496). In summary, these studies demonstrate that C18 ceramide levels are inversely related to insulin sensitivity in muscle and mitochondria, and that Cers1 inhibition improves insulin-stimulated suppression of hepatic glucose production and reduced high-fat diet induced adiposity. Moreover, improved mitochondrial respiration, citrate synthase activity and increased energy expenditure were reported upon Cers1 inhibition. Lack of Cers1 specifically in skeletal muscle was also reported to improve systemic glucose homeostasis. While these studies agree on the effect of Cers1 inhibition on fat loss, results on glucose homeostasis and insulin sensitivity differ depending on whether a pharmacologic or a genetic approach was used to inhibit Cers1. The current manuscript describes the effect of CERS1 on muscle function and myogenesis because these were the most strongly correlated pathways with CERS1 in human skeletal muscle (Fig 1C) and impact of Cers1 on these pathways is poorly studied, particularly in the context of aging. Therefore, we would like to refer to the mentioned studies investigating the effect of CERS1 on mitochondria and metabolism.

      (2) C.elegans data:

      (a) The authors checked maternal RNAi protocol to knockdown lagr-1 and showed alteration of muscle morphology at day 5. They also give pharmacological exposure of P053 drug at L4 stage. Furthermore, the authors also used a transgenic ortholog lagr-1 to perform the experiments. All of them were consistent showing a reduced movement. It would be important to show rescue of the muscle phenotype by overexpressing CERS1 ortholog in knockdown transgenic animals.

      We used RNAi to knock down the Cers1 orthologue, lagr-1, in C. elegans. Therefore, we do not have transgenic animals. Overexpressing lagr-1 in the RNAi-treated animals would also not be possible, as the RNA from the overexpression would simply be degraded.

      (b) The authors showed data about distance of C.elegans. It would be interesting to specify if body bends, reversals and stillness are affected in RNAi and transgenic Knockdown worms.

      As suggested by the reviewer, we measured thrashing and stillness and found reduced thrashing (new Fig S5B) and a trend towards increased stillness (Author response image 1O) in P053-treated worms on day 5 of adulthood, the day on which we observed significant differences in muscle morphology and movement (Fig 4D-E, Fig S5A). These data are now included in the revised manuscript.

      (c) Is there an effect on lifespan extension by knocking down CERS1?

      We performed two independent lifespan experiments in C. elegans treated with the Cers1 inhibitor P053 and found reduced lifespan in both replicate experiments (for the second replicate, see Author response image 1P). We added these data to the revised manuscript as new Fig 4H.

      How do the authors explain the beneficial effect of sptlc1 inhibition on healthy aging muscle? Discuss more during the article if there is no possible explanation at the moment.

      We believe that blocking the upstream enzyme of the ceramide pathway (SPT1) shuts down the entire pathway that is overactive in aging, and therefore is more beneficial for muscle aging. Our current work suggests that at least a significant part of Sptlc1-KD benefits might stem from blocking very long chain ceramides. While SPTLC1 and CERS2 revealed muscle benefits in terms of myogenesis, inflammation (PMID: 35089797; PMID: 37118545) and muscle protein aggregation (PMID: 37196064), the CERS1 enzyme shows opposite effects, which is also visible in Fig 1e and Fig 1f of PMID: 37118545. In the current study, we show that Cers1 inhibition indeed exacerbates aging defects in myogenesis and inflammation as opposed to the inhibition of Sptlc1 or Cers2. The fact that the effect of Cers1 on inhibiting muscle differentiation is dependent on the clearance of Cers2-derived C24-ceramides suggests that reducing very long chain ceramides might be crucial for healthy muscle aging. We added details to the discussion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Review:

      Summary:

      This paper reports how mycobacterial cAMP level is increased under stressful conditions and that the increase is important in the survival of the bacterium in animal hosts.

      Strengths:

      The authors show that under different stresses the response regulator PhoP represses a phosphodiesterase (PDE) that degrades cAMP specifically. Identification of a PDE specific to cAMP is significant progress in understanding Mtb pathogenesis. An increase in cAMP apparently increases bacterial survival upon infection. On the practical side, the reduction of cAMP by increasing PDE can be a means to attenuate the growth of the bacilli. The results have wider implications since PhoP is implicated in controlling diverse mycobacterial stress responses and many bacterial pathogens modulate host cell cAMP level. The results here are straightforward, internally consistent, and of both theoretical and applied interests.

      We thank the reviewers for these extremely encouraging comments.

      Weaknesses:

      Repression of PDE promoter by binding of phosphorylated PhoP could have been shown at higher precision. The binding is now somewhere along a roughly 500 bp region. Although the regulation of PDE is shown to be by transcriptional repression only, it has been described as a homeostatic mechanism. The latter would have required a demonstration of both repression and activation by negative feedback.

      We agree. We have now performed EMSA (Electrophoretic Mobility Shift Assay) experiments and included the data showing DNA binding of PhoP to the upstream regulatory region of rv0805 (rv0805up) as a supplemental figure (see Figure 2-figure supplement 1). The supplemental figure, figure caption, and the relevant results have been adjusted accordingly in the revised manuscript.

      Further, as recommended by the reviewer we have now removed the term ‘homeostatic mechanism’ and rephrased it with ‘maintenance of cAMP level’ in the manuscript.

      Response to Reviewers’ comments

      Reviewer #1:

      The authors have used homeostasis inappropriately. Homeostasis usually requires negative feedback (a clear example is the regulation of Lambda prm promoter). Here, there is no feedback from changes in PDE or cAMP level to their synthesis. Homeostasis does not belong to this paper anywhere.

      As recommended by the reviewer, we have now removed “homeostasis” from the manuscript and mostly replaced it with “maintenance of cAMP level” in the revised manuscript.

      The authors have frequently used adverbs at the beginning of a sentence, such as Notably (l.240, 272, 376), Importantly (l.66, 213), More importantly (l.134), Remarkably (l.264), Interestingly (l.115,301), Intriguingly (l.344), unambiguously (l.347), etc. The use of these words is generally counter-productive. The authors should scan the ms. to eliminate them as far as possible. The sentences would read more clearly and become more impactful.

      Following reviewer’s recommendation, we have now eliminated most of the adverbs, mostly used at the beginning of sentences, in the revised manuscript.

      Specific comments

      (1) L.1: "maintenance of homeostasis" or increasing cAMP level.

      As suggested by the reviewer, we have now replaced “maintenance of cAMP homeostasis” with “maintenance of cAMP level”.

      (2) L.27: mechanism or reason; varying or various.

      As recommended by the reviewer, we have now replaced “mechanism” with “reason” and the word “varying” is deleted while incorporating suggested changes in the abstract.

      (3) L.28-29: The logic of connecting PhoP to cAMP doesn't follow well. The logic is much better in l.54, l.112-5 and l.130.

      We thank the reviewer for this suggestion. We have now modified the statement within the ‘abstract’ in the revised manuscript (duplicated below):

      “cAMP is one of the most widely used second messengers which impacts on a wide range of cellular responses in microbial pathogens including M. tuberculosis. Herein, we hypothesized that intra-mycobacterial cAMP level could be controlled by the phoP locus since the major regulator plays a key role in bacterial response against numerous stress conditions.”

      (4) L.30: discovers or reveals (?). Also, in l.101.

      As recommended by the reviewer, we have now replaced ‘discovers’ with ‘reveals’ in the Abstract and ‘uncovered’ with ‘revealed’ in the Introduction section of the manuscript.

      (5) L.31: Delete "The most - - derived". It is not obvious what most fundamental means here. I suggest: We find that PhoP-dependent ---involves specific binding of the regulator---PDE gene.

      As recommended by the reviewer, we have modified the statement (duplicated below): “In keeping with these results, we find specific recruitment of the regulator within the promoter region of rv0805 PDE, and absence of phoP or ectopic expression of rv0805 independently accounts for elevated PDE synthesis leading to depletion of intra-mycobacterial cAMP level.”

      (6) L.36: --pathway decreases cAMP level, stress tolerance, and survival of the bacilli.

      As recommended by the reviewer, we have now modified the statement (duplicated below): “Thus, genetic manipulation to inactivate PhoP-Rv0805-cAMP pathway decreases cAMP level, stress tolerance, and intracellular survival of the bacilli.”

      (7) L.41: "keeps encountering" or encounters?

      As suggested by the reviewer, we have replaced ‘keeps encountering’ with ‘encounters’ in the ‘Introduction’ section of the revised manuscript.

      (8) L.61: responds, carries.

      Our apologies for the embarrassing grammatical mistakes. We have rectified these errors in the revised manuscript.

      (9) L.67: you mean burst in synthesis level, not burst of cAMP itself.

      To improve clarity, we have now modified the statement in the revised manuscript (duplicated below): “Agarwal and colleagues had shown that burst in synthesis of bacterial cAMP upon infection of macrophages, improved bacterial survival by interfering with host signalling pathways (Agarwal et al., 2009)”

      Reference

      Agarwal N, Lamichhane G, Gupta R, Nolan S, Bishai WR (2009) Cyclic AMP intoxication of macrophages by a Mycobacterium tuberculosis adenylate cyclase. Nature 460: 98-102

      (10) L.77: Change Off to Of.

      We are sorry for the inaccuracy. The suggested change has been made to the text.

      (11) L.83: Did not discuss "degradation" earlier.

      Following reviewer’s recommendation, we have now modified the statement in the revised manuscript (duplicated below).

      “Together, these results strongly suggest that a balance between cAMP synthesis by adenylate cyclases and cAMP degradation by phosphodiesterases contributes to rapid adaptive response of mycobacteria in a hostile intracellular environment (Johnson and McDonough, 2018; McDonough and Rodriguez, 2011).”

      Reference

      Johnson RM, McDonough KA (2018) Cyclic nucleotide signaling in Mycobacterium tuberculosis: an expanding repertoire. Pathog Dis 76 (5)

      McDonough KA, Rodriguez A (2011) The myriad roles of cyclic AMP in microbial pathogens: from signal to sword. Nature reviews Microbiology 10: 27-38

      (12) L.95: Isn't PhoPR a two-component signal transduction system, the terminology that is more specific than a two-protein regulatory system?

      As recommended by the reviewer, we have replaced “two protein regulatory system” with more specific “two-component signal transduction system” in the revised manuscript.

      (13) L.124: check-point prevents things from happening. Here the mechanism you found allows growth and survival.

      We agree. As recommended by the reviewer, we have now modified the sentence in the revised manuscript (duplicated below).

      “Together, the newly identified mechanism of regulation of cAMP level allows intraphagosomal survival and growth program of mycobacteria.”

      (14) L.132: why not say directly-"---under normal, and NO and acid stress conditions (Fig. 1A).

      As recommended by the reviewer, we have now deleted the first part of the sentence and directly stated that “we compared cAMP levels………. under normal, NO and acidic stress conditions” (duplicated below).

      “We compared cAMP levels of WT and phoPR-KO (lacking both phoP and phoR), grown under normal, NO stress and acid stress conditions (Fig. 1A).”

      (15) L.134: The complementation is quite variable. Also true in Fig. 2A. If no simple answer, you can say- cAMP values increased in complemented cells, although to a variable extent, for reasons unknown.

      We agree with the reviewer. We have now incorporated new text in the ‘Results’ section of the revised manuscript (duplicated below):

      “A higher cAMP level in the complemented strain under NO stress is possibly attributable to reproducibly higher phoP expression in the complemented mutant under specific stress conditions (Khan et al., 2022).”

      (16) L.154: You rather not say "conclude" and "most likely" at the same time. How about replacing "we conclude" with suggests? In that case, no need to say "most likely". Also, in l.306-7 & l.322-3.

      We thank the reviewer for these suggestions. We have now modified the statements in the revised manuscript (duplicated below).

      “We suggest that lower cAMP level of the mutant is not due to its higher efficacy of cAMP secretion.”

      Following reviewer’s recommendation, we have incorporated similar changes in two other places of the ‘Results’ section of the revised manuscript.

      (17) L.161: introduce both the acronyms here and not in l.162.

      Following reviewer’s recommendation, we have made the suggested changes.

      (18) L.164: Second, (to be in line with First).

      We have made the suggested change.

      (19) Fig. 2C: There are no black and white bars. This is an important figure because the results appear in the abstract. The signal change from pH 7 to 4.5 is not much. An independent approach would have been desirable. If it were E. coli, I would have suggested beta-gal assay or in vivo footprints. Is a PhoP binding site recognizable in the promoter region of rv0805?

      We apologize for the inaccuracy. We have corrected it in the revised manuscript. Also, we have now carried out DNA binding assays, and included the EMSA data of rv0805 upstream regulatory region binding to phosphorylated PhoP (P~PhoP) as a supplemental figure (Figure 2-figure supplement 1A-B). In this figure, we have also incorporated our results on the likely PhoP binding site within rv0805up. The new figure, figure caption and the relevant results have been adjusted accordingly in the revised manuscript.

      (20) L.209: ORFs; also delete "of growth" from the sentence.

      The suggested changes were made to the text.

      (21) L.213: Delete Importantly and change "failed to" to 'did not' (since you did not motivate the expectation earlier, it is better to state the results in an unbiased way).

      As recommended by the reviewer, both changes were included in the revised manuscript.

      (22) L.217: The requirement of PhoR is a new result - why say "confirm". Change it to indicate. Also, delete "indeed" here and from L.233.

      As recommended by the reviewer, both changes were included in the revised manuscript.

      (23) L.224: Are the results in Fig 3-S1A under inducing conditions?

      The results shown in Fig 3-S1A are not under inducing conditions of expression. For better clarity, we have modified the sentence describing Figure 3-figure supplement 1A (duplicated below).

      “rv0805 ORF was cloned within the multicloning site of integrative pSTki (Parikh et al., 2013) between EcoRI and HindIII sites under the control of Pmyc1tetO promoter, and expression of rv0805 under non-inducing condition was verified by determining the mRNA level (Figure 3 - figure supplement 1A).”

      Reference:

      Parikh et al (2013) Development of a new generation of vectors for gene expression, gene replacement, and protein-protein interaction studies in mycobacteria. Applied and environmental microbiology 79: 1718-1729

      (24) L.225: ---cAMP level. Add (Fig. 3C) at the end of the next sentence.

      As recommended by the reviewer, both the suggested changes were made to the revised text.

      (25) L.231: Delete "Most importantly"- you didn't specify what are other less important results.

      We agree. We have now deleted “most importantly” from the sentence in the revised text.

      (26) L.243 & 254: Change homeostasis to level? Here you are showing mechanisms that can change cAMP level. Homeostasis here would mean how fluctuations in cAMP level are adjusted, usually requiring negative feedback.

      As recommended by the reviewer, ‘homeostasis’ was replaced with ‘level’ in both places.

      (27) L.256: stress response or stress? Also, in l.272

      We are sorry for the inaccuracy. We have corrected these errors in the revised version of the manuscript.

      (28) L.259: Change "maintenance of homeostasis" to 'repressing the rv0805 PDE gene'. It is safer to use a fact-based title. In this section, direct measurement of rv0805 mRNA, and/or cAMP levels in different genetic backgrounds seem desirable.

      We agree. As recommended by the reviewer, we have modified the title of the ‘Results’ section in the revised manuscript (duplicated below).

      “PhoP contributes to mycobacterial stress tolerance and intracellular survival by repressing the rv0805 PDE expression.”

      Please note that direct measurements of rv0805 mRNA and cAMP levels are part of Fig. 3 and Figure 3- figure supplement 1A, respectively.

      (29) Fig. 4A: White and grey symbols are not easily discriminated without zooming. Use color for phoPR-KO.

      We agree. We have now indicated the phoPR-KO in blue in the revised Fig. 4.

      (30) L.264: Delete remarkable or explain what is so remarkable. Aren't the results expected- the PDE level would go up in both cases. Direct measurement of PDE /cAMP levels would take the mystery out of the results.

      As recommended by the reviewer, we have deleted ‘remarkably’ in the revised text. We have measured cAMP and PDE expression levels of the four strains in Fig. 3 and Figure 3-figure supplement 1.

      (31) L.273: --suggesting a role of ---

      We have modified this sentence in the revised version of the manuscript (duplicated below).

      “A previous study had reported that phoP-deleted mutant strain was more sensitive to Cumene Hydrogen Peroxide (CHP), suggesting a role of PhoP in regulating mycobacterial stress response to oxidative stress (Walters et al., 2006).”

      Reference:

      Walters et al. (2006) The Mycobacterium tuberculosis PhoPR two-component system regulates genes essential for virulence and complex lipid biosynthesis. Mol Microbiol 60: 312-330

      (32) L.275: Delete "transcriptome". CHP sensitivity alone doesn't speak for transcriptome.

      As suggested by the reviewer, we have deleted “transcriptome”. Also, please see our response to the previous comment (above).

      (33) Fig. 4D and E: % Colocalization in the Merge panels is not much different among the four strains tested (to an untrained eye). Can the results be explained to readers not used to in vivo studies?

      As recommended by the reviewer, we have now incorporated new text to explain the in vivo experiment (duplicated below).

      “In this assay, WT-H37Rv inhibits phagosome maturation, whereas phagosomes with phoPR-KO mature into phagolysosomes (Anil Kumar et al., 2016).”

      Further, for better clarity of the results shown in Fig. 4D, we have (a) increased the size of the figure to highlight the difference in the ‘merge’ panel; (b) included “white arrowheads” in the merge panels of Fig. 4D to indicate auramine-labeled mycobacteria, which have either inhibited or facilitated trafficking into lysosomes; and finally (c) described the method used to calculate percent co-localization in greater detail in the ‘Materials and Methods’ section of the revised manuscript.

      Reference

      Anil Kumar et al. (2016) EspR-dependent ESAT-6 secretion of Mycobacterium tuberculosis requires the presence of virulence regulator PhoP. J Biol Chem. 291, 19018-19030

      (34) L.275-6: Delete "next" (also in l.347) and "Note that". In this paragraph, I was expecting some explanation on how phoPR-KO and WT-Rv0805 are behaving similarly. Even if the reason is not known, it should be mentioned.

      The suggested changes have been made to the text. Also, as recommended by the reviewer, we have included the following text in the revised manuscript (duplicated below):

      “Together, these results reveal similar behaviour of phoPR-KO, and WT-Rv0805 by demonstrating a comparably higher susceptibility of these strains to acidic pH and oxidative stress relative to WT bacteria and indicate a link between intra-mycobacterial cAMP level and bacterial stress response. Collectively, these data suggest that at least one of the mechanisms by which PhoP contributes to global stress response is attributable to maintenance of cAMP level.”

      (35) L.281: ---WT and indicate a link between cAMP level and stress response in mycobacteria. (No mention of homeostasis).

      The suggested change has been made to the revised text. Please see above our response to point # 34.

      (36) L.288, 290: No Thus and no clearly.

      Both the suggested changes have been made to the text.

      (37) L.297: Can you be more direct and state --is due to reduced cAMP level?

      As recommended by the reviewer, we have now modified the sentence to make it more direct in the revised manuscript (duplicated below):

      “Together, our findings facilitate an integrated view of our results, suggesting that higher susceptibility of WT-Rv0805 to stress conditions, is attributable to its reduced cAMP level.”

      (38) L.307: May delete "most likely----homeostasis". cAMP is not discussed here. The same deletion is desired in l.324.

      We agree. As recommended by the reviewer, we have now modified the relevant texts in the revised manuscript. These are duplicated below.

      “From these results, we suggest that ectopic expression of rv0805 impacts phagosome maturation arguing in favour of a role of PhoP in influencing phagosome-lysosome fusion in macrophages.”

      “Thus, we suggest that one of the reasons which accounts for an attenuated phenotype of phoPR-KO in both cellular and animal models is attributable to PhoP-dependent repression of rv0805 PDE activity, which controls mycobacterial cAMP level.”

      (39) L.342: cAMP level is regulated remains---

      The suggested change has been made to the revised text (duplicated below):

      “Although many bacterial pathogens modulate host cell cAMP level as a common strategy, the mechanism of regulation of mycobacterial cAMP level remains unknown.”

      (40) L.373: tone down "most fundamental". It is not obvious what is so profound about a stress-response system that depends on PhoP also depends on PhoR. OR justify what is most fundamental about it.

      We agree. Following reviewer’s recommendation, we have modified the text in the revised manuscript (duplicated below):

      “In keeping with these results, we find that PhoP-dependent rv0805 expression requires PhoR (Figs. 3A-B), the cognate kinase which activates PhoP in a signal-dependent manner (Gupta et al., 2006; Singh et al., 2023).”

      References:

      Gupta et al. (2006) Transcriptional autoregulation by Mycobacterium tuberculosis PhoP involves recognition of novel direct repeat sequences in the regulatory region of the promoter. FEBS Letters 580, 5328-5338.

      Singh et al. (2023) Dual functioning by the PhoR sensor is a key determinant to Mycobacterium tuberculosis virulence. PLoS Genetics 19(12): e1011070.

      (41) L.395: delete correspondingly (?)

      The suggested change has been made to the text.

      (42) L.396: Delete "appear to" and "somewhat". The uncertainty is already implied in "suggest". The evidence that ectopic expression of rv0805 is functionally equivalent to phoP deletion is quite clear in this paper and not saying that clearly is confusing.

      We agree with the reviewer. The suggested changes have been made to the revised text (duplicated below):

      “Thus, our results suggest that ectopic expression of rv0805 is functionally equivalent to deletion of the phoP locus.”

      (43) L.401: --over-expressing bacilli, induction level of rv0805 expression was significantly different in Matange et al and our studies. The next sentence is also very wordy.

      We have made changes to the text to address the reviewer’s concern. Also, the next sentence has been rewritten (duplicated below).

      “Although both studies were performed with rv0805 over-expressing bacilli, the fact that important differences in the expression of PDEs, in this study (Matange et al., 2013) and in our assays - yielding significantly different levels of rv0805 expression - most likely account for this discrepancy. While we cannot rule out the possibility of cleavage of other cyclic nucleotides by Rv0805 (Keppetipola & Shuman, 2008; Shenoy et al., 2007; Shenoy et al., 2005), consistent with a previous study our results correlate rv0805 expression with intra-mycobacterial cAMP level (Agarwal et al., 2009).”

      References:

      Matange et al. (2013) Overexpression of the Rv0805 phosphodiesterase elicits a cAMP-independent transcriptional response. Tuberculosis (Edinb) 93: 492-500.

      Keppetipola N, Shuman S (2008) A phosphate-binding histidine of binuclear metallophosphodiesterase enzymes is a determinant of 2',3'-cyclic nucleotide phosphodiesterase activity. J Biol Chem 283: 30942-30949

      Shenoy et al. (2007) Structural and biochemical analysis of the Rv0805 cyclic nucleotide phosphodiesterase from Mycobacterium tuberculosis. Journal of molecular biology 365: 211-225

      Shenoy et al. (2005) The Rv0805 gene from Mycobacterium tuberculosis encodes a 3',5'-cyclic nucleotide phosphodiesterase: biochemical and mutational analysis. Biochemistry 44: 15695-15704

      Agarwal N, Lamichhane G, Gupta R, Nolan S, Bishai WR (2009) Cyclic AMP intoxication of macrophages by a Mycobacterium tuberculosis adenylate cyclase. Nature 460: 98-102

      (44) L.409: To avoid saying "conclude" and "most likely" at the same time, can you start the sentence thus: 'We infer that Pho-----rv0805 is a---.

      We agree. We have made suggested changes to the text. The modified sentence is duplicated below:

      “We infer that PhoP-dependent regulation of Rv0805 is a critical regulator of intra-mycobacterial cAMP level.”

      (45) L.424. Delete "According to this model". In the preceding sentence, the subject is results, not model. This whole paragraph needs to be rewritten in fewer lines. The shorter the summary statement, the greater would be its impact (less is more here). I would delete the red circles from the figure- it appears that in the repressed state, you are making more products. Replace the circles with an arrow. The legend could be "Increased cAMP level and effective stress response" and "Decreased cAMP---and reduced---.

      We thank the reviewer for these suggestions. Following reviewer’s recommendations, we have made numerous changes and rewritten the paragraph in the revised manuscript (duplicated below):

      “In summary, upon sensing low acidic pH as a signal PhoR activates PhoP, P~PhoP binds to rv0805 upstream regulatory region and functions as a specific repressor of Rv0805. Therefore, we observed (a) a reproducibly lower level of cAMP in phoPR-KO relative to WT-H37Rv, (b) a significantly reduced expression of rv0805 in WT-H37Rv, grown under acidic pH relative to normal conditions, and (c) comparable cAMP levels in phoPR-KO and WT-Rv0805. This is why the two strains remain ineffective to mount an appropriate stress response, most likely due to their inability to coordinate regulation of gene expression because of dysregulation of intra-mycobacterial cAMP level. However, without uncoupling regulatory control of PhoPR and rv0805 expression, we cannot confirm that dysregulation of cAMP level accounts for virulence attenuation of phoPR-KO. Given the fact that rv0805-depleted M. tuberculosis is growth attenuated in vivo (McDowell et al., 2023), paradoxically ectopic expression of rv0805 leads to dysregulated metabolic adaptation, thereby resulting in reduced stress tolerance and intracellular survival.”

      Also, the suggested changes have been incorporated in Fig. 6 and the figure caption.

      Reference

      McDowell JR, Bai G, Lasek-Nesselquist E, Eisele LE, Wu Y, Hurteau G, Johnson R, Bai Y, Chen Y, Chan J et al (2023) Mycobacterial phosphodiesterase Rv0805 is a virulence determinant and its cyclic nucleotide hydrolytic activity is required for propionate detoxification. Mol Microbiol 119: 401-422

      (46) L.458 & 500: ---was used to transform.

      Following reviewer’s recommendation, the suggested changes were made to the text in the Materials and Methods section of the revised manuscript.

      (47) L.460: --- antibiotics plates.

      Both suggested changes were made to the text.

      (48) L.466-7: --they were transferred-pH 4.5) and grown for further-

      We thank the reviewer for these suggestions. The suggested changes were made to the text.

      (49) L.486: ---full-length ORFs of interest were---

      The suggested changes were incorporated in the revised manuscript.

      (50) L.497: The RNAs were 20 nt long and complementary---

      As recommended by the reviewer, we have modified the text in the revised manuscript (duplicated below).

      “The RNAs were 20 nt long and complementary to the non-template strand of the target gene.”

      Reviewer #2:

      (1) Rephrase this sentence in the abstract: “Because growing evidence connects PhoP with varying stress response, we hypothesized that the level of 3’,5’ cAMP, one of the most widely used second messengers, was regulated by the phoP locus, linking numerous stress responses with cAMP production”.

      As recommended by the reviewer, we have now rewritten the sentence. The modified text is incorporated in the revised manuscript (duplicated below):

      “cAMP is one of the most widely used second messengers, which impacts on a wide range of cellular responses in microbial pathogens including M. tuberculosis. Herein, we hypothesized that intra-mycobacterial cAMP level could be controlled by the phoP locus since the major regulator plays a key role in bacterial responses against numerous stress conditions.”

      Also, please see our response to specific comments #1-3 of Reviewer 1.

      (2) Line 134: please describe the complementation strain features as it is mentioned for the first time (plasmid, copy number, promoter etc.) in the manuscript. Especially under NO stress what could be the authors' justification regarding the high cAMP concentration in the complementation strain?

      As recommended by the reviewer, the details of construction of the complemented strain have been incorporated in the ‘Materials and Methods’ section of the revised manuscript (duplicated below):

      “To complement phoPR expression, pSM607 containing a 3.6- kb DNA fragment of M. tuberculosis phoPR including 200-bp phoP promoter region, a hygromycin resistance cassette, attP site and the gene encoding phage L5 integrase, as detailed earlier (Walters et al., 2006) was used to transform phoPR mutant to integrate at the L5 attB site.”

      To address the reviewer’s other concern, we have now included the following sentence in the ‘Results’ section of the revised manuscript (duplicated below):

      “A higher cAMP level in the complemented strain under NO stress is possibly attributable to reproducibly higher phoP expression in the complemented mutant under specific stress condition (Khan et al., 2022).”

      Reference:

      Khan et al. (2022) Convergence of two global regulators to coordinate expression of essential virulence determinants of Mycobacterium tuberculosis. eLife 2022, 11:e80965.

      (3) In Figure 1C, it is a bit confusing to see the numbers 1,2,3 and 4 and nothing is referred to these numbers in the figure legend so it's better to remove them.

      We agree with the reviewer. We have now removed the lane numbers from the figure (Fig. 1C) in the revised manuscript.

      (4) Line 852: rephrase it "insignificantly different".

      The suggested change has been made to the text. The modified text is incorporated in the manuscript (duplicated below):

      “Note that the difference in expression levels of rv0805 between WT and phoPR-KO was significant (p<0.01), whereas the fold difference in mRNA level between WT and the complemented mutant (Compl.) remains nonsignificant (not indicated).”

      (5) Lines 198-200: There are no open/black bars; they are all coloured bars. Please correct this. The significance test should be done for the same gene (e.g., rv0805 up) under different pH conditions. Right now, it reveals nothing and is misleading.

      We apologize for the inaccuracy. We have now rectified the error. As recommended by the reviewer, Fig. 4C was modified, and the significance tests were carried out between samples involving identical promoter enrichments under different pH conditions. The modified figure, figure legend, and the relevant results have been adjusted accordingly in the revised manuscript.

      (6) Line 213: Is there any difference between this complementation strain (phoPR-KO::phoPphoR) and the one used in Figures 1A, 1B, and 2A? If yes, then please describe it.

      The same complemented mutant strain, which has been described in the ‘Materials and Methods’ section of the revised manuscript, was used in the experiments described in Fig. 1A, Fig. 1B and Fig. 2A.

      (7) Line 223: Please mention the copy number and promoter of the vector construct.

      As recommended by the reviewer, we have now mentioned the promoter of the vector and incorporated new text with regard to copy number of the expression vector in the revised manuscript (duplicated below).

      “Although the copy number of episomal vectors with the pAL5000 origin of replication (oriM) has been reported to be 3 by Southern hybridization (Ranes et al., 1990), in this case wild-type and mutant Rv0805 proteins were expressed from single-copy chromosomal integrants (Parikh et al., 2013).”

      References:

      Ranes et al. (1990) Functional analysis of pAL5000, a plasmid from Mycobacterium fortuitum: construction of a "mini" mycobacterium-Escherichia coli shuttle vector. J Bacteriol 172: 2793-2797

      Parikh et al. (2013) Development of a new generation of vectors for gene expression, gene replacement, and protein-protein interaction studies in mycobacteria. Appl Environ Microbiol 79: 1718-1729

      (8) Figure 3 - Figure Supplement 1: not sure why the authors measured mRNA levels of rv1357 and rv2387? These genes were not overexpressed!

      The mRNA levels of rv1357 and rv2387 were measured to show that overexpression of either the wild-type or mutant Rv0805 did not influence expression of other PDEs like Rv1357 and Rv2387. We have now mentioned it explicitly in the revised manuscript (duplicated below).

      “In contrast, other PDE encoding genes (rv1357 and rv2387), under identical conditions, demonstrate comparable expression levels in WT-H37Rv and rv0805 over-expressing strains.”

      (9) Line 234: Wrong interpretation it should be PDE mRNA levels in WT-Rv0805 and WT-Rv0805M.

      As recommended by the reviewer, we have now modified the statement to improve clarity (duplicated below).

      “The corresponding mRNA levels of PDEs (wild-type and the mutant) are over-expressed approximately 4.5-6-fold relative to the genomic rv0805 level of WT-H37Rv (Figure 3-figure supplement 1A).”

      (10) Line 237: Remove the sentence "Thus, we conclude......identical expression strategy", you have already talked about why phosphodiesterase activity is crucial for cAMP concentration and it is well understood.

      Following reviewer’s recommendation, we have now removed the sentence from the revised manuscript.

      (11) Figure 3E: Authors should comment on why the cAMP concentration is not significantly changed even though the mRNA level changes are drastic (~90%). How do you correlate that? Is it because of other PDEs?

      We agree. As suggested by the reviewer, we have now incorporated new text in the revised manuscript (duplicated below).

      “We speculate that effective knocking down of phoP or rv0805 is not truly reflected in the extent of variation of cAMP levels possibly due to the presence of numerous other mycobacterial PDEs.”

      (12) Line 505,506: Is it the translation start site or the transcription start site? Because mRNA level changes are reported.

      These are translational start sites; gene-specific small guide RNAs were designed to inhibit mRNA expression.

      (13) Line 292: There is a difference between red and green bars. Authors should do statistical analysis and then comment on whether overexpression of WT and mutant pde are different or similar, to me they are different; also, explain why the WT-Rv0805 strain is different than the phoPR-KO strain in the context of cell wall metabolism.

      As recommended by the reviewer, we have now included statistical significance of the data in the revised version, and modified the text accordingly in the manuscript.

      Also, we included text explaining why WT-Rv0805 is different compared to phoPR-KO strain in the context of cell wall metabolism (duplicated below).

      “Together, these results suggest that both strains expressing wild type or mutant PDEs share largely similar cell-wall properties and are consistent with (a) a recent study reporting no significant effect of cAMP dysregulation on mycobacterial cell wall structure/permeability (Wong et al., 2023), and (b) role of PhoP in cell wall composition and complex lipid biosynthesis (Walters et al., 2006; Asensio et al., 2006; Goyal et al., 2011).”

      References:

      Wong et al. (2023) Cyclic AMP is a critical mediator of intrinsic drug resistance and fatty acid metabolism in M. tuberculosis. eLife 2023; 12: e81177

      Walters et al. (2006) The Mycobacterium tuberculosis PhoPR two-component system regulates genes essential for virulence and complex lipid biosynthesis. Mol Microbiol 60: 312-330

      Asensio et al. (2006) The Virulence-associated Two-component PhoP-PhoR System Controls the Biosynthesis of Polyketide-derived Lipids in Mycobacterium tuberculosis. J Biol Chem 281: 1313-1316.

      Goyal et al. (2011) Phosphorylation of PhoP protein plays direct regulatory role in lipid biosynthesis of Mycobacterium tuberculosis. J Biol Chem 286: 45197-45208

      (14) Line 299-303: Authors should explain how the colocalization % are calculated. Also, in the figure 4D merge panel please highlight the difference.

      As suggested by the reviewer, we have now explained the methodology used to calculate percent colocalization in greater detail. Also, we have modified Figure 4D to highlight the difference between samples shown in the merge panel. Please see our response to comment #33 from Reviewer 1.

      (15) General comment: There are multiple instances where writing needs to be improved.

      We are sorry for the inaccuracies. We have now done thorough editing of the manuscript and made numerous corrections throughout.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study reports a novel measurement for the chemotactic response to potassium by Escherichia coli. The authors convincingly demonstrate that these bacteria exhibit an attractant response to potassium and connect this to changes in intracellular pH level. However, some experimental results are incomplete, with additional controls/alternate measurements required to support the conclusions. The work will be of interest to those studying bacterial signalling and response to environmental cues.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper shows that E. coli exhibits a chemotactic response to potassium by measuring both the motor response (using a bead assay) and the intracellular signaling response (CheY phosphorylation level via FRET) to step changes in potassium concentration. The authors find that an increase in potassium concentration induces a considerable attractant response, with an amplitude larger than that for aspartate, and that cells can quickly adapt (but possibly imperfectly). The authors propose that the mechanism for potassium response is through modifying intracellular pH; they find both that potassium modifies pH and that other pH modifiers induce similar attractant responses. It is also shown, using Tar- and Tsr-only mutants, that these two chemoreceptors respond to potassium differently. Tsr has a standard attractant response, while Tar has a biphasic response (repellent-like then attractant-like). Finally, the authors use computer simulations to study the swimming response of cells to a periodic potassium signal secreted from a biofilm and find a phase delay that depends on the period of oscillation.

      Strengths:

      The finding that E. coli can sense and adapt to potassium signals and the connection to intracellular pH is quite interesting and this work should stimulate future experimental and theoretical studies regarding the microscopic mechanisms governing this response. The evidence (from both the bead assay and FRET) that potassium induces an attractant response is convincing, as is the proposed mechanism involving modification of intracellular pH.

      Weaknesses:

      The authors show that changes in pH impact fluorescent protein brightness and modify the FRET signal; this measurement explains the apparent imprecise adaptation they measured. However, this effect reduces confidence in the quantitative accuracy of the FRET measurements. For example, part of the potassium response curve (Fig. 4B) can be attributed to chemotactic response and part comes from the pH modifying the FRET signal. Measuring the full potassium response curve of the no-receptor mutants as a control would help quantify the true magnitude of the chemotactic response and the adaptation precision to potassium.

      Response: We thank the reviewer for the suggestion. We have now measured the full potassium response curve for the no-receptor mutant (HCB1414-pVS88), as shown in Fig. S4. We characterized the pH effects on CFP and YFP channels at different concentrations of KCl, and the relationship between the ratio of the signal post- to pre-KCl addition and the KCl concentration was established for both channels, as shown in Fig. S4C. The pH-corrected signal after KCl addition for strains with receptors was obtained by dividing the original signal after KCl addition by this ratio at the specific KCl concentration. This was done for both CFP and YFP channels. The pH-corrected responses for the Tar-only and Tsr-only strains are represented by red dots in Fig. 5BC. The recalculated response curve and adaptation curve for the wild-type strain are shown in Fig. S5. The same correction was applied to Fig. 3 as well. We also re-performed the simulations using the corrected dose-response curve and replotted Fig. 6, though the simulation results did not change much.

      We have now added a subsection “Revised FRET responses by correcting the pH effects on the brightness of eCFP and eYFP” at line 296 in “Results” to describe this.
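      The correction described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' analysis code: the function names and all numerical values are hypothetical, and the FRET readout is assumed to be the YFP/CFP ratio. The only step taken from the response itself is dividing each channel's post-KCl signal by the post/pre ratio measured in the no-receptor strain at the same KCl concentration.

```python
def ph_correct(signal_after, no_receptor_ratio):
    """Divide a post-KCl channel signal by the post/pre ratio measured
    for that channel in the no-receptor strain at the same KCl level."""
    if no_receptor_ratio <= 0:
        raise ValueError("no_receptor_ratio must be positive")
    return signal_after / no_receptor_ratio


def fret_ratio(yfp, cfp):
    """Assumed readout: ratio of yellow to cyan channel intensity."""
    return yfp / cfp


# Hypothetical post-KCl photon counts for a strain with receptors.
cfp_after, yfp_after = 900.0, 1140.0
# Hypothetical post/pre ratios from the no-receptor strain (pH effect only).
cfp_ratio, yfp_ratio = 0.90, 0.95

uncorrected = fret_ratio(yfp_after, cfp_after)
corrected = fret_ratio(ph_correct(yfp_after, yfp_ratio),
                       ph_correct(cfp_after, cfp_ratio))
print(f"uncorrected ratio: {uncorrected:.3f}, pH-corrected ratio: {corrected:.3f}")
```

      Because the two channels are corrected independently, any difference in pH sensitivity between eCFP and eYFP is removed before the ratio is formed.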

      The measured response may also be impacted by adaptation. For other strong attractant stimuli, the response typically shows a low plateau before it recovers (adapts). However, in the case of potassium, the FRET signal does not have an obvious plateau following the stimulus. Do the authors have an explanation for that? One possibility is that the cells may have already partially adapted when the response reaches its minimum, which could indicate response and/or adaptation dynamics different from those of a regular chemoattractant. In any case, directly measuring the response to potassium in mutants without adaptation enzymes (CheR, CheB) and with the receptors in different methylation levels would shed more light on the problem.

      Response: We appreciate the reviewer’s insightful questions. To observe the low plateau before adaptation, a saturating amount of attractant should be added in a stepwise manner. According to the dose-response curve we measured for potassium, a saturating amount of potassium would be close to 100 mM. In fact, there is a small segment of the low plateau in the step response to 30 mM KCl (Fig. 4C or Fig. S5A). To observe more of this low plateau, we could have used a higher concentration of KCl. However, a stimulation higher than 30 mM KCl will induce substantial physiological changes in the cell, resulting in a significant decrease in fluorescence for both channels (Fig. S7). Therefore, the range of KCl concentration that can be reliably applied in FRET measurements is limited.

      The half-time of adaptation at 30 mM KCl was measured to be approximately 80 s, demonstrating a faster adaptation than 0.1 mM MeAsp, which induced a similar magnitude of response. Nevertheless, this is still significantly slower than the time required for medium exchange in the flow chamber, which takes less than 10 s to replace 99% of the medium. Thus, the effect on the measured response magnitude due to adaptation should be small (less than 10%).
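      The size of this effect can be checked with a short calculation. The sketch below assumes, purely for illustration, that recovery toward the adapted level is a single exponential with the stated half-time of about 80 s, and that the full 10 s of medium exchange counts as time available for adaptation; neither assumption is stated in the response, and the function name is hypothetical.

```python
def fraction_recovered(elapsed_s, half_time_s):
    """Fraction of the response recovered after elapsed_s seconds,
    assuming single-exponential adaptation with the given half-time."""
    if half_time_s <= 0:
        raise ValueError("half_time_s must be positive")
    return 1.0 - 0.5 ** (elapsed_s / half_time_s)


# Values quoted in the response: ~80 s half-time, <10 s medium exchange.
loss = fraction_recovered(10.0, 80.0)
print(f"fraction of response lost during medium exchange: {loss:.3f}")
```

      Under these assumptions the loss is roughly 8%, in line with the stated bound of less than 10%.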

      We thank the reviewer for the suggestion of measuring the response to potassium in mutants without adaptation enzymes (CheR, CheB) and with the receptors in different methylation levels. However, these mutants are typically less sensitive than the wild-type, exhibiting higher values of K0.5 (Sourjik & Berg, PNAS 99:123, 2002), and thus require an even higher KCl concentration to see the low plateau. Consistent with this, we attempted to measure the response to potassium in a cheRcheB mutant (HCB1382-pVS88). As shown in Fig. R1 below, there is no response to up to 30 mM KCl, suggesting that the sensitive region of the mutant is beyond 30 mM KCl.

      The relevant text was added at lines 413-424.

      Author response image 1.

      The response of the cheRcheB mutant (HCB1382-pVS88) to different concentrations of KCl. The blue solid line denotes the original signal, while the red dots represent the pH-corrected signal. The vertical purple (green) dashed lines indicate the moment of adding (removing) 0.01 mM, 0.1 mM, 0.3 mM, 1 mM, 3 mM, 10 mM and 30 mM KCl, in chronological order.

      There seems to be an inconsistency between the FRET and bead assay measurements: the CW bias shows over-adaptation, while the FRET measurement does not.

      Response: We thank the reviewer for pointing this out. We have now demonstrated that the imprecise adaptation shown in the FRET assay primarily resulted from the pH-induced intensity change of the fluorescent proteins. As shown in Fig. S5A&C, the FRET signal also shows over-adaptation, similar to the bead assay, when we recalculated the response by correcting the CFP and YFP channels.

      We have now clarified this at line 315.

      The small hill coefficient of the potassium response curve and the biphasic response of the Tar-only strain, while both very interesting, require further explanation since these are quite different than responses to more conventional chemoattractants.

      Response: We thank the reviewer for pointing this out. We have now recalculated the pH-corrected results for the dose-response curve (Fig. S5) and the biphasic response of the Tar-only strain (Fig. 5C). The new Hill coefficient is 0.88 ± 0.14 (mean ± SD), which is close to that of the response to MeAsp (1.2) (ref. 46). We suspect that this Hill coefficient of slightly less than 1 results from the different responses of Tar and Tsr receptors to potassium.

      The Tar-only strain exhibits a repellent response to stepwise addition of low concentrations of potassium less than 10 mM, and a biphasic response above (Fig. 5C). This biphasic response might result from additional pH-effects on the activity of intracellular enzymes such as CheRB and CheA, which may have a different timescale and response from the Tar receptor. We have now added the penultimate paragraph in “Discussion” to talk about the response of the Tar-only strain.

      Reviewer #2 (Public Review):

      Summary:

      Zhang et al. investigated the biophysical mechanism of potassium-mediated chemotactic behavior in E. coli. Previously, Humphries et al. reported that potassium waves from an oscillating B. subtilis biofilm attract P. aeruginosa through the chemotactic behavior of motile P. aeruginosa cells. It was proposed that K+ waves alter the PMF of P. aeruginosa. However, the mechanism of this behaviour remained elusive. In this study, Zhang et al. demonstrated that motile E. coli cells accumulate in regions of high potassium levels. They found that this behavior likely results from the chemotaxis signalling pathway, mediated by an elevation of intracellular pH. Overall, a solid body of evidence is provided to support the claims. However, the impacts of pH on the fluorescent proteins need to be better evaluated. In its current form, the evidence is insufficient to say that the fluorescence intensity ratio results from FRET. It may well be an artefact of pH change. Nevertheless, this is an important piece of work. The text is well written, with a good balance of background information to help the reader follow the questions investigated in this research work.

      In my view, the effect of pH on the FRET between CheY-eYFP and CheZ-eCFP is not fully examined. The authors demonstrated in Fig. S3 that CFP intensity itself changes with KCl, likely due to pH. They showed that CFP itself is affected by pH. This result raises the question of whether the FRET data in Figs. 3-5 could result from intensity changes of the FPs rather than from FRET. The measured dynamics may have nothing to do with the interaction between CheY and CheZ. It should be noted that CFP and YFP have different sensitivities to pH. So, the measurement is likely confounded by the change in intracellular pH. Without further experiments to evaluate the effect of pH on CFP and YFP, the data using this FRET pair are inconclusive.

      Response: We thank the reviewer for pointing this out. We have now measured the full potassium response curve for the no-receptor mutant (HCB1414-pVS88), as shown in Fig. S4. We characterized the pH effects on CFP and YFP channels at different concentrations of KCl, and the relationship between the ratio of the signal post- to pre-KCl addition and the KCl concentration was established for both channels, as shown in Fig. S4C. The pH-corrected signal after KCl addition for strains with receptors was obtained by dividing the original signal after KCl addition by this ratio at the specific KCl concentration. This was done for both CFP and YFP channels. The pH-corrected responses for the Tar-only and Tsr-only strains are represented by red dots in Fig. 5BC. The recalculated response curve and adaptation curve for the wild-type strain are shown in Fig. S5. The same correction was applied to Fig. 3 as well. We also re-performed the simulations using the corrected dose-response curve and replotted Fig. 6, though the simulation results did not change much.

      We have now added a subsection “Revised FRET responses by correcting the pH effects on the brightness of eCFP and eYFP” at line 296 in “Results” to describe this.

      The data in Figure 1 are convincing. It would be helpful to include example videos. There is also ambiguity in the Methods section for this experiment. It states that 100 mM KCl was flowed into the source channel. However, it is not clear whether the 100 mM KCl was prepared in water or in the potassium-depleted motility buffer. If KCl was prepared in water, there would be a gradient of other chemicals in the buffer, which would confound the data.

      Response: We apologize for the ambiguity. The KCl solution used in this work was prepared in the potassium-depleted motility buffer. We have now clarified this at both lines 116 and 497. We have also provided an example video, Movie S1, with the relevant text added at line 123.

      The authors show FRET data with both KCl and K2SO4 and conclude that the chemotactic response mainly results from potassium ions. However, this was only measured by FRET. It would be more convincing if the motility assay in Fig. 1 were also performed with K2SO4.

      Response: We thank the reviewer for the suggestion. The aim of comparing the responses to KCl and K2SO4 was to determine the role of chloride ions in the response and to prove that the chemotactic response of E. coli to KCl comes primarily from its response to potassium ions. It is more sensitive to compare the responses to KCl and K2SO4 by using the FRET assay. In contrast, the microfluidic motility assay is less sensitive in revealing the difference in the chemotactic responses, making it difficult to determine the potential role of chloride ions.

      Methods:

      • Please clarify the promoters used for the constitutive expression of FliCsticky and LacI.

      Response: The promoters used for the constitutive expression of LacIq and FliCsticky were the Iq promoter and the native promoter of fliC, respectively (ref. 57).

      Now these have been clarified at line 471.

      • Fluorescence filters and imaging conditions (exposure time, light intensity) are missing.

      Response: Thank you for the suggestion. We have now added more descriptions at lines 535-546: The FRET setup was based on a Nikon Ti-E microscope equipped with a 40× 0.60 NA objective. The illumination light was provided by a 130-W mercury lamp, attenuated by a factor of 1024 with neutral density filters, and passed through an excitation bandpass filter (FF02-438/24-25, Semrock) and a dichroic mirror (FF458-Di02-25x36, Semrock). The epifluorescent emission was split into cyan and yellow channels by a second dichroic mirror (FF509-FDi01-25x36, Semrock). The signals in the two channels were then filtered by two emission bandpass filters (FF01-483/32-25 and FF01-542/32-25, Semrock) and collected by two photon-counting photomultipliers (H7421-40, Hamamatsu, Hamamatsu City, Japan), respectively. Signals from the two photomultipliers were recorded at a sampling rate of 1 Hz using a data-acquisition card installed in a computer (USB-1901(G)-1020, ADlink, New Taipei, Taiwan).

      • Please clarify if the temperature was controlled in motility assays.

      Response: All measurements in our work were performed at 23 ℃. It was clarified at line 496.

      • L513. It is not clear how theta was selected. Was theta set to be between 0 and pi? If not, P(theta) can be negative?

      Response: The θ was set to be between 0 and π. This has now been added at line 581.

      • Typo in L442 (and) and L519 (Koff)

      Response: Thank you. Corrected.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) From the motor measurements the authors find that the CW bias over-adapts to a level larger than prestimulus, but this is not seen in the FRET measurements. What causes this inconsistency? Fig. 2D seems to rule out any change in CheY binding to the motor.

      Response: We thank the reviewer for pointing this out. We have now demonstrated that the imprecise adaptation shown in the FRET assay primarily resulted from the pH-induced intensity change of the fluorescent proteins. As shown in Fig. S5A&C, the FRET signal also shows over-adaptation, similar to the bead assay, when we recalculated the response by correcting the CFP and YFP channels.

      We have now clarified this at line 315.

      (2) It would be useful to compare the response amplitude for potassium (Fig. 3C) to a large concentration of both MeAsp and serine. This is a fairer comparison since your work shows potassium acts on both Tar and Tsr. Alternatively, testing a much larger concentration (~10^6 micromolar) at which MeAsp also binds to Tsr would also be useful.

      Response: We thank the reviewer for pointing this out. We have now recalculated the response to potassium by correcting the pH-induced effects on fluorescence intensity of CFP and YFP. The response to 30 mM KCl was 1.06 ± 0.10 times as large as that to 100 μM MeAsp. The aim of the comparison between the responses to potassium and MeAsp was to provide an idea of the magnitude of the chemotactic response to potassium. The stimulus of 100 μM MeAsp is already a saturating amount of attractant and induces zero-kinase activity, thus using a higher stimulus (adding serine or a larger concentration of MeAsp) is probably not needed. Moreover, a larger concentration (~10^6 micromolar) of MeAsp would also induce an osmotactic response.

      (3) The fitted Hill coefficient (~0.5) to the FRET response curve is quite small and the authors suggest this indicates negative cooperativity. Do they have a proposed mechanism for negative cooperativity? Have similar coefficients been measured for other responses?

      Response: We thank the reviewer for pointing this out. We have now recalculated the pH-corrected results for the dose-response curve (Fig. S5). The new Hill coefficient is 0.88 ± 0.14 (mean ± SD), which is close to that of the response to MeAsp (1.2) (ref. 46). We suspect that this Hill coefficient of slightly less than 1 results from the differing responses of Tar and Tsr receptors to potassium.
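      To make the meaning of these Hill coefficients concrete, the sketch below evaluates a generic Hill function and the fold change in concentration needed to go from 10% to 90% of the maximal response, which equals 81^(1/n). The Hill form and the function names are assumptions chosen for illustration; the response does not state the exact fitting function, and only the two coefficients (0.88 and 1.2) are taken from the text.

```python
def hill_response(conc, k_half, n):
    """Fractional response of a generic Hill function with half-maximal
    concentration k_half and Hill coefficient n."""
    return conc ** n / (k_half ** n + conc ** n)


def ten_to_ninety_fold_range(n):
    """Fold change in concentration between 10% and 90% response.
    From (c/K)^n = 1/9 and (c/K)^n = 9, the ratio is 81**(1/n)."""
    return 81.0 ** (1.0 / n)


# Hill coefficients quoted in the response for potassium and MeAsp.
for label, n in (("potassium", 0.88), ("MeAsp", 1.2)):
    print(f"{label}: n = {n}, 10%-90% range spans "
          f"{ten_to_ninety_fold_range(n):.0f}-fold in concentration")
```

      A smaller n spreads the response over a wider concentration range, which is separate from the position of the curve set by K0.5.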

      (3a) The authors state a few times that the response to potassium is "very sensitive", but the low Hill coefficient indicates that the response is not very sensitive (at least compared to aspartate and serine responses).

      Response: We apologize for the confusion. We described the response to potassium as “very sensitive” due to the small value of K0.5. This has now been clarified at line 236.

      (3b) Since the measurements are performed in wild-type cells, the response amplitude following the addition of potassium may be biased if the cell has already partially adapted. This seems to be the case since the FRET time series does not plateau after the addition of the stimulus. The accuracy of the response curve and Hill coefficient would be more convincing if the experiment was repeated with a cheR cheB deficient mutant.

      Response: We thank the reviewer for raising these questions. To observe the low plateau before adaptation, a saturating amount of attractant should be added in a stepwise manner. According to the dose-response curve we measured for potassium, a saturating amount of potassium would be close to 100 mM. In fact, there is a small segment of the low plateau in the step response to 30 mM KCl (Fig. 4C or Fig. S5A). To observe more of this low plateau, we could have used a higher concentration of KCl. However, a stimulation higher than 30 mM KCl will induce substantial physiological changes in the cell, resulting in a significant decrease in fluorescence for both channels (Fig. S7). Therefore, the range of KCl concentration that can be reliably applied in FRET measurements is limited.

      The half-time of adaptation at 30 mM KCl was measured to be approximately 80 s, demonstrating a faster adaptation than 0.1 mM MeAsp, which induced a similar magnitude of response. Nevertheless, this is still significantly slower than the time required for medium exchange in the flow chamber, which takes less than 10 s to replace 99% of the medium. Thus, the effect on the measured response magnitude due to adaptation should be small (less than 10%).

      We thank the reviewer for the suggestion of measuring the response to potassium in mutants without adaptation enzymes (CheR, CheB) and with the receptors in different methylation levels. However, these mutants are typically less sensitive than the wild-type, exhibiting higher values of K0.5 (ref. 46), and thus require an even higher KCl concentration to see the low plateau. Consistent with this, we attempted to measure the response to potassium in a cheRcheB mutant (HCB1382-pVS88). As shown in Fig. R1, there is no response to up to 30 mM KCl, suggesting that the sensitive region of the mutant is beyond 30 mM KCl.

      The relevant text was added at lines 413-424.

      (4) The authors show that the measured imprecise adaptation can be (at least partially) attributed to pH impacting the FRET signal by changing eCFP and eYFP brightness.

      (4a) Comparing Fig. 5C and D, the chemosensing and pH response time scales look similar. Therefore, does the pH effect bias the measured response amplitude (just as it biases the adapted FRET level)?

      Response: We agree with the reviewer that the pH effect on CFP and YFP biases the measured response amplitude. We have now performed the measurement of dose-response curve to potassium for the no-receptor mutant (HCB1414-pVS88), as shown in Fig. S4. The pH effects on CFP and YFP were corrected. The dose-response curve and adaptation curve were recalculated and plotted in Fig. S5.

      (4b) It would help to measure a full response curve (at many concentrations) for the no-receptor strain as a control. This would help distinguish, as a function of concentration, how much response can be attributed to pH impacting the FRET signal versus the true chemotactic response.

      Response: We thank the reviewer for the suggestion. We have now performed the measurements for the no-receptor strain. The impact of pH on CFP and YFP has been corrected. The pH-corrected results, previously in Figs. 3-5, are now presented in Fig. 3, Fig. S5 and Fig. 5, respectively.

      (5) The biphasic response of Tar is strange and warrants further discussion. Do the authors have any proposed mechanisms that lead to this behavior? For the 10 mM and 30 mM KCl measurements there is a repellent response followed by an attractant response for both adding and removing the stimuli; why is this?

      Response: We thank the reviewer for pointing this out. The Tar-only strain exhibits a repellent response to stepwise addition of low concentrations of potassium less than 10 mM, and a biphasic response above (Fig. 5C). This biphasic response might result from additional pH-effects on the activity of intracellular enzymes such as CheRB and CheA, which may have a different timescale and response from the Tar receptor. We have now added the penultimate paragraph in “Discussion” to talk about the response of the Tar-only strain.

      (5a) The fact that Tar and Tsr are both attractant (after the initial repellent response in Tar) appears to be inconsistent with previous work on pH response (Ref 52, Yang and Sourjik Molecular Microbiology (2012) 86(6), 1482-1489). This study also didn't see any biphasic response.

      Response: We thank the reviewer for pointing this out. The Tar-only strain shows a repellent response to stepwise addition of low concentrations of potassium, specifically less than 10 mM. This is consistent with previous observations of the response of Tar to changes in intracellular pH (refs. 44,45) and also with the work of Yang and Sourjik (new ref. 53), although the work in ref. 53 dealt with the response to external pH change, and bacteria were known to maintain a relatively stable intracellular pH when external pH changes (Chen & Berg, Biophysical Journal (2000) 78:2280-2284). Interestingly, the Tar-only strain exhibits a biphasic response to high potassium concentrations of 10 mM and above. This biphasic response might result from additional pH-effects on the activity of intracellular enzymes such as CheRB and CheA (ref. 56), which may have a different timescale and response from the Tar receptor. We have now added the penultimate paragraph in “Discussion” to talk about the response of the Tar-only strain.

      (5b) The response of Tar to the removal of sodium benzoate (Fig. S2) seems to be triphasic, is there any explanation for this?

      Response: We thank the reviewer for pointing this out. We have now acknowledged in the legend of Fig. S2 that this response is interesting and warrants further exploration: “The response to the removal of sodium benzoate seems to be a superposition of an attractant and a repellent response, the reason for which deserves to be further explored.”

      (6) Fitting the MWC model leads to N=0.35<1. It is fine to use this as a phenomenological parameter, but can the authors comment on what might be causing such a small effective cluster size for potassium response?

      Response: We thank the reviewer for pointing this out. We have now recalculated the pH-corrected results for the dose-response curve (Fig. S5). The new Hill coefficient is 0.88 ± 0.14 (mean ± SD), which is close to that of the response to MeAsp (1.2) (ref. 46). We have now refit the MWC model to the pH-corrected dose-response curve, obtaining N of 0.85. We think the small N is due partly to the fact that we are fitting the curve with four parameters (N, Kon, Koff, and fm), while only three features of the sigmoidal dose-response curve are relevant (the vertical scale, the midpoint concentration, and the slope of the sigmoid). Future experiments may determine these parameters more accurately, but they should not significantly affect the simulation results as long as the wild-type dose-response curve is accurate.
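
      For readers unfamiliar with the four fitted parameters, the following is a minimal sketch of a commonly used MWC form of receptor-cluster activity. It is an assumption that this form matches the manuscript's Eq. 3; the function name and all numerical values other than N = 0.85 are hypothetical and chosen only for illustration.

```python
import math

def mwc_activity(ligand, n, k_on, k_off, f_m):
    """Activity of an MWC receptor cluster (assumed standard form).

    ligand : ligand concentration (same units as k_on and k_off)
    n      : effective cluster size N
    k_on   : dissociation constant of the active (on) state
    k_off  : dissociation constant of the inactive (off) state
    f_m    : methylation-dependent free-energy offset (in units of kT)
    """
    free_energy = n * (f_m + math.log((1 + ligand / k_off) / (1 + ligand / k_on)))
    return 1.0 / (1.0 + math.exp(free_energy))

# Hypothetical parameters: only N = 0.85 comes from the response above.
baseline = mwc_activity(0.0, n=0.85, k_on=100.0, k_off=10.0, f_m=0.0)
stimulated = mwc_activity(30.0, n=0.85, k_on=100.0, k_off=10.0, f_m=0.0)
```

      In this sketch an attractant binds the off state more tightly (k_off < k_on), so adding ligand lowers the activity below its baseline. Because N scales the whole exponent, it mainly sets the steepness of the curve, which is one way to see why four parameters can be underdetermined by three curve features.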

      (7) The results of the modeling are closely related to Zhu et al. Phys. Rev. Lett. 108, 128101. Is the lag time for large T related to the adaptation time?

      Response: We thank the reviewer for pointing this out. We used a modeling framework similar to that of Zhu et al. The potassium response was also analogous to the chemotactic response to MeAsp. Thus, the results are closely related to Zhu et al. We have now cited Zhu et al. (Ref. 52) and noted this at line 366.

      The lag time for large T is related to the adaptation time. We have now simulated the chemotaxis to potassium for large T with different adaptation times by varying the methylation rate kR. The results are shown in Fig. S8. The simulated lag time decreases with the methylation rate kR, but levels off at high values of kR. This has now been added at line 603.

      Minor issues:

      • Fig. 1C: should the axis label be y?

      Response: Yes, thank you. Now corrected.

      • Line 519: Koff given twice, the second should be Kon.

      Response: Thank you. Corrected.

      • When fitting the MWC model (Eq. 3 and Fig. 6B) did you fix a particular value for m?

      Response: m was treated as a fitting parameter, grouped in the parameter fm.

      Reviewer #2 (Recommendations For The Authors):

      Minor points:

      • I suggest explaining the acronyms when they first appear in the text (e.g. CMC, CW, CCW).

      Response: Thank you. Now they have been added.

      • L144. L242. "decrease" is ambiguous since membrane potential is negative. I understand the authors meant less negative (which is an increase). I suggest to avoid this expression.

      Response: Thank you for the suggestion. Now they have been replaced by “The absolute value of the transmembrane electrical potential will decrease”.

      • For Fig 1b - it says the shaded area is SEM in the text, but SD in the legend. Please clarify.

      Response: Thank you. The annotation in the legend has now been revised as SEM.

      • Fig 1C label of x axis should be "y" instead of "x" to be consistent with Fig 1A.

      Response: Thank you. It has now been revised.

      • In Figure 2, the number of independent experiments as well as the number of samples should be included.

      Response: Thank you. The response in Fig. 2C is the average of 83 motors from 5 samples for wild-type strain (JY26-pKAF131). The response in Fig. 2D is the average of 22 motors from 4 samples for the chemotaxis-defective strain (HCB901-pBES38). They have now been added to the legend.

      • Regarding the attractant or repelling action of potassium and sucrose, it would be important to have a movie showing the cells' behaviours.

      Response: We thank the reviewer for the suggestion. We have now provided Movie S1 to show the cells’ behavior in response to potassium. As shown in Fig. 3B, the chemotactic response to 60 mM sucrose is very small compared to the response to 30 mM KCl. This implies that a noticeable response to sucrose requires higher stimulus concentrations. However, Rosko et al. [Rosko, J., Martinez, V. A., Poon, W. C. K. & Pilizota, T. Proc. Natl Acad. Sci. USA 114, E7969-E7976 (2017).] have shown that high concentrations of sucrose lead to a significant reduction in the speed of the flagellar motor. Thus, in a motility assay for sucrose, the osmolarity-induced motility effect may overwhelm the minor repellent-like response.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses:

      The weaknesses are the brevity of the simulations, the concomitant lack of scope of the simulations, the lack of depth in the analysis, and the incomplete relation to other relevant work.

      A 1 µs simulation of CCh (Video 1, part 2) shows that m3 (ACHA) is stable throughout. The ΔG comparisons, in silico versus in vitro, indicate that 200 ns simulations are sufficient to identify LA versus HA conformational populations. Figure 6-table supplement 1 shows distances. New citations have been added.

      Reviewer #2 (Public Review):

      Weaknesses:

      After carrying out all-atom molecular dynamics, the authors revert to a model of binding using continuum Poisson-Boltzmann, surface area, and vibrational entropy. The motivations for and limitations associated with this approximate model for the thermodynamics of binding, rather than using modern atomistic MD free energy methods (that would fully incorporate configurational sampling of the protein, ligand, and solvent) could be provided. Despite this, the authors report a correlation between their free energy estimates and those inferred from the experiment. This did, however, reveal shortcomings for two of the agonists. The authors mention their trouble getting correlation to experiment for Ebt and Ebx and refer to up to 130% errors in free energy. But this is far worse than a simple proportional error, because -24 Vs -10 kcal/mol is a massive overestimation of free energy, as would be evident if the authors were to instead express results in terms of KD values (which would have an error exceeding a billion fold). The MD analysis could be improved with better measures of convergence, as well as a more careful discussion of free energy maps as a function of identified principal components, as described below. Overall, however, the study has provided useful observations and interpretations of agonist binding that will help understand pentameric ligand-gated ion channel activation.

      The objective of the calculations was to identify structural populations, not to estimate binding free energies. We knew the actual LA and HA energies (for all 4 agonists) from real-world electrophysiology experiments. We conclude that the simple PBSA method worked as a tool for identification because the calculated efficiencies match those from experiments (Figure 4B, Figure 4-Source Data 1). We discuss the mismatches in absolute ΔG in the Results and Discussion. Methods for estimating experimental binding free energies are described in a cited eLife companion paper. The ΔG ratio relates to agonist efficiency.
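
      The reviewer's remark that a -24 versus -10 kcal/mol discrepancy corresponds to a KD error exceeding a billion-fold can be checked with a short calculation. This is a minimal sketch assuming the standard relation ΔG = RT ln KD and a temperature of 298.15 K; the function name is hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def kd_fold_error(dg_calc, dg_exp, temperature=298.15):
    """Fold error in KD implied by a difference in binding free energy.

    Uses dG = RT ln(KD), so KD_exp / KD_calc = exp(|ddG| / RT).
    Free energies are in kcal/mol.
    """
    ddg = abs(dg_calc - dg_exp)
    return math.exp(ddg / (R_KCAL * temperature))

# Values quoted in the review: -24 (calculated) vs -10 (experimental) kcal/mol.
fold = kd_fold_error(-24.0, -10.0)
```

      Under these assumptions the 14 kcal/mol gap gives a fold error on the order of 10^10, consistent with the reviewer's "exceeding a billion fold".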

      Main points:

      Regarding the choice of model, some further justification of the reduced 2 subunit ECD-only model could be given. On page 5 the authors argue that, because binding free energies are independent of energy changes outside the binding pocket, they could remove the TMD and study only an ECD subunit dimer. While the assumption of distant interactions being small seems somewhat reasonable, provided conformational changes are limited and localised, how do we know the packing of TMD onto the ECD does not alter the ability of the alpha-delta interface to rearrange during weak or strong binding? They further write that "fluctuations observed at the base of the ECD were anticipated because the TMD that offers stability here was absent.". As the TMD-ECD interface is the "gating interface" that is reshaped by agonist binding, surely the TMD-ECD interface structure must affect binding. It seems a little dangerous to completely separate the agonist binding and gating infrastructure, based on some assumption of independence. Given the model was only the alpha and delta subunits and not the pentamer with TMD, I am surprised such a model was stable without some heavy restraints. The authors state that "as a further control we carried out MD simulation of a pentamer docked with ACh and found similar structural changes at the binding pocket compared to the dimer." Is this sufficient proof of the accuracy of the simplified model? How similar was the model itself with and without agonist in terms of overall RMSD and RMSD for the subunit interface and the agonist binding site, as well as the free energy of binding to each model to compare?

      The statement that distant interactions are small is not an "assumption", but rather a conclusion based on data. Mutant cycle analysis of 83 pairs shows that (with a few exceptions) non-additivity of free energy change prevails only at separations <~15 Å (Fig. 3 in Gupta et al 2017). Regardless, the adequacy of dimers and convergence by 200 ns are supported by the match between calculated and experimental agonist efficiencies (Figure 4B) and by the 1 µs simulation (Video 1, part 2). A 200 ns apo simulation of the ECD dimer has now been added (Figure 2-figure supplement 2), and the dimer interface seems to be adequate (stable).

      Although the authors repeatedly state that they have good convergence with their MD, I believe the analysis could be improved to convince us. On page 8 the authors write that the RMSD of the system converged in under 200 ns of MD. However, I note that the graph is of the entire ECD dimer, not a measure for the local binding site region. An additional RMSD of local binding site would be much more telling. You could have a structural isomerisation in the site and not even notice it in the existing graph. On page 9 the authors write that the RMSF in Figure S2 showed instability mainly in loops C and F around the pocket. Given this flexibility at the alpha-delta interface, this is why collecting those regions into one group for the calculation of RMSD convergence analysis would have been useful. They then state "the final MD configuration (with CCh) was well-aligned with the CCh-bound cryo-EM desensitized structure (7QL6)... further demonstrating that the simulation had converged." That may suggest a change occurred that is in common with the global minimum seen in cryo EM, which is good, but does not prove the MD has "converged". I would also rename Figure S3 accordingly.

      The description is now changed to “aligns well with the desensitized structure (7QL6.PDB)”. Not just the binding pocket but the whole ECD dimer aligns well, by RMSD, first with the apo structure (m1) and then with the desensitized state (m3).

      The authors draw conclusions about the dominant states and pathways from their PCA component free energy projections that need clarification. It is important first to show data to demonstrate that the two PCA components chosen were dominant and accounted for most of the variance. Then, when mapping free energy as a function of those two PCA components, it is important to prove that those maps have sufficient convergence to be able to interpret them. Moreover, if the free energies themselves cannot be used to measure state stability (as seems to be the case), the limitations should be carefully explained. First, was PCA done on all MD trajectories combined to find a common PC1 & PC2, or were they done separately on each simulation? If so, how similar are they? The authors write "the first two principal components (PC-1 and PC-2) that capture the most pronounced Cα displacements". How much of the total variance did these two components capture? The authors write the changes mostly concern loop C and loop F, but which data proves this? E.g., a plot of PC1 and PC2 over residue number might help.

      The PCA analyses have been enriched. Figure 3-Source Data 1 shows the dominance of PC1 and PC2. Because the binding energy match was sufficient to identify affinity states, we did not explore additional PCs. Residue-wise PC1 and PC2 analysis and comparison with RMSF are in Figure 2-figure supplement 2. PC1 and PC2 both correlate with fluctuations in loops C and F. Overlap analysis in different runs is shown in Figure 3-figure supplement 1. Lower variance in a particular region of the PCA landscape indicates that the system frequently visits these states, suggesting stability (a preference for these conformations).
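
      The reviewer's question about how much variance PC1 and PC2 capture reduces to a ratio of covariance-matrix eigenvalues. The following is a minimal sketch with hypothetical eigenvalues; it does not reproduce the values in Figure 3-Source Data 1.

```python
def explained_variance(eigenvalues):
    """Fraction of total variance carried by each principal component.

    eigenvalues: eigenvalues of the coordinate covariance matrix,
    sorted from largest to smallest.
    """
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

# Hypothetical eigenvalues for illustration only.
eigenvalues = [5.0, 3.0, 1.0, 0.6, 0.4]
fractions = explained_variance(eigenvalues)
top_two = fractions[0] + fractions[1]  # share captured by PC1 + PC2
```

      With these made-up numbers the first two components carry 80% of the variance; a much smaller share would suggest that a two-dimensional projection hides relevant motions.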

      The authors map the -kTln rho as a free energy for each simulation as a function of PC1 & PC2. It is important to reveal how well that PC1-2 space was sampled, and how those maps converged over time. The shapes of the maps and the relative depths of the wells look very different for each agonist. If the maps were sampled well and converged, the free energies themselves would tell us the stabilities of each state. Instead, the authors do not even mention this and instead talk about "variance" being the indicator of stability, stating that m3 is most stable in all cases. While I can believe 200ns could not converge a PC1-2 map and that meaningful delta G values might not be obtained from them, the issue of lack of sampling must be dealt with. On page 12 they write "Although the bottom of the well for 3 energy minima from PCA represent the most stable overall conformation of the protein, they do not convey direct information regarding agonist stability or orientation". The reasons why not must be explained; as they should do just that if the two order parameters PC1 and PC2 captured the slowest degrees of freedom for binding and sampling was sufficient. The authors write that "For all agonists and trajectories, m3 had the least variance (was most stable), again supporting convergence by 200 ns." Again the issue of actual free energy values in the maps needs to be dealt with. The probabilities expressed as -kTln rho in kcal/mol might suggest that m2 is the most stable. Instead, the authors base stability only on variance (I guess breadth of the well?), where m3 may be more localised in the chosen PC space, despite apparently having less preference during the MD (not the lowest free energy in the maps).
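
      The -kT ln ρ mapping discussed in this comment can be sketched as a two-dimensional histogram of the PC1/PC2 projections. This is an illustrative sketch only: the bin width, the value of kT, the function name, and the toy data are assumptions, not the authors' settings.

```python
import math
from collections import Counter

KT_KCAL = 0.593  # kT near 298 K in kcal/mol (assumed temperature)

def free_energy_map(pc1, pc2, bin_width=1.0):
    """Relative free energy G = -kT ln(rho) on a PC1/PC2 grid.

    Returns a dict mapping (i, j) bin indices to G in kcal/mol,
    shifted so that the most populated bin has G = 0.
    Unvisited bins are absent (their G is undefined).
    """
    counts = Counter(
        (math.floor(x / bin_width), math.floor(y / bin_width))
        for x, y in zip(pc1, pc2)
    )
    total = sum(counts.values())
    g = {b: -KT_KCAL * math.log(c / total) for b, c in counts.items()}
    g_min = min(g.values())
    return {b: value - g_min for b, value in g.items()}

# Toy projections: three frames in one well, one frame in another.
pc1 = [0.1, 0.2, 0.3, 2.5]
pc2 = [0.1, 0.4, 0.2, 2.1]
fmap = free_energy_map(pc1, pc2)
```

      One simple convergence check in this spirit is to build the map separately from the first and second halves of a trajectory and compare the relative well depths; large differences indicate that the landscape is not yet sampled well enough to interpret.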

      The motivations and justifications for the use of approximate PBSA energetics instead of atomistic MD free energies should be dealt with in the manuscript, with limitations more clearly discussed. Rather than using modern all-atom MD free energy methods for relative or absolute binding free energies, the author selects clusters from their identified states and does Poisson-Boltzmann estimates (electrostatic, vdW, surface area, vibrational entropy). I do believe the following sentence does not begin to deal with the limitations of that method: "there are limitations with regard to MM-PBSA accurately predicting absolute binding free energies (Genheden & Ryde, 2015; Hou et al., 2011) that depends on the parameterization of the ligand (Oostenbrink et al., 2004)." What are the assumptions and limitations in taking continuum electrostatics (presumably with parameters for dielectric constants and their assignments to regions after discarding solvent), surface area (with its assumptions and limitations), and of course assuming vibration of a normal mode can capture entropy. On page 30, regarding their vibrational entropy estimate, they write that the "entropy term provides insights into the disorder within the system, as well as how this disorder changes during the binding process". It is important that the extent of disorder captured by the vibrational estimate be discussed, as it is not obvious that it has captured entropy involving multiple minima on the system's true 3N-dimensional energy surface, and especially the contribution from solvent disorder in bound Vs dissociated states.

      As discussed above, errors in the free energy estimates need to be more faithfully represented, as fractional errors are not meaningful. On page 21 the authors write "The match improved when free energy ratios rather than absolute values were compared." But a ratio of free energies is not a typical or expected measure of error in delta G. They also write "For ACh and CCh, there is good agreement between ΔGm1 and ΔGLA and between ΔGm3 and ΔGHA. For these agonists, in silico values overestimated experimental ones only by ~8% and ~25%. The agreement was not as good for the other 2 agonists, as calculated values overestimated experimental ones by ~45% (Ebt) and ~130% (Ebt). However, the fractional overestimation was approximately the same for ΔGLA and ΔGHA." See the above comment on how this may misrepresent the error. On page 21 they write, in relation to their large fractional errors, that they "do not know the origin of this factor but speculate that it could be caused by errors in ligand parameterization". However, the estimates from the PBSA approach are, by design, only approximate. Both errors in parameterisation (and their likely origin) and the approximate model used need discussion.

      Again, the goal of calculating binding free energy was to identify structural correspondence to LA and HA and not to obtain absolute binding free energy values. Along with the least variance (distribution) for the principal component for m3, it also had the highest binding free energy. An association of m1 to LA and m3 to HA was done after comparing them to experimental values (efficiencies). This comparison not only validates our approach but also underscores the utility of PBSA in supplementing MD and PCA analyses with broader energetics perspectives.
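
      The claim that efficiencies can match even when absolute free energies do not follows from simple arithmetic: if the LA and HA energies are overestimated by the same factor, their ratio is unchanged. The sketch below assumes efficiency is computed as 1 - ΔGLA/ΔGHA (the text above says only that the ratio relates to efficiency) and uses hypothetical energies.

```python
def efficiency(dg_la, dg_ha):
    """Agonist efficiency from low- and high-affinity binding energies.

    Assumed definition: 1 - dG_LA / dG_HA (energies in kcal/mol).
    """
    return 1.0 - dg_la / dg_ha

# Hypothetical experimental energies (kcal/mol).
exp_la, exp_ha = -5.0, -10.0

# Apply the same ~130% overestimate (factor 2.3) to both energies.
scale = 2.3
calc_la, calc_ha = scale * exp_la, scale * exp_ha

eff_exp = efficiency(exp_la, exp_ha)
eff_calc = efficiency(calc_la, calc_ha)
```

      Here both efficiencies equal 0.5 even though each calculated energy is more than double its experimental value. This illustrates why a ratio can identify states, and also why, as the reviewer notes, it is not a measure of the error in ΔG itself.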

      Reviewer #3 (Public Review):

      Weaknesses:

      Although the match in simulated vs experimental energies for two ligands was very good, the calculated energies for two other ligands were significantly different than the experiment. It is unclear to what extent the choice of method for the energy calculations influenced the results.

      See above.

      A control simulation, such as for an apo site, is lacking.

      Figure 2-figure supplement 2 shows the results of 200 ns MD simulations of the apo structure (n=2).

      Reviewer #4 (Public Review):

      Weaknesses:

      Timescales (200 ns) do not capture global rearrangements of the extracellular domain, let alone gating transitions of the channel pore, though this work may provide a launching point for more extended simulations. A more general concern is the reproducibility of the simulations, and how representative states are defined. It is not clear whether replicates were included in principal component analysis or subsequent binding energy calculations, nor how simulation intervals were associated with specific states.

      We are eventually interested in using MD to study the full isomerization, but these investigations are for the future and will likely involve full-length pentamers and longer timescales. However, in response to this query we have raised this issue in the Discussion and offer speculations. See above: PCA has been compared between replicates (Figure 3-figure supplement 1).

      Structural analysis largely focuses on snapshots, with limited direct evidence of consistency across replicates or clusters. Figure legends and tables could be clarified.

      Snapshots and distance measurements (Figure 6-table supplement 1) were extracted from m1, m2 and m3 plateau regions of trajectories. Incorporated in the legend.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This study gives interesting insights into the possible dynamics of ligand binding in ACh receptors and establishes some prerequisites for necessary and urgent further work. The broad interest in this receptor class means this work will have some reach.

      Suggestions:

      (1) I found the citation of relevant literature to be rather limited. In the following paper, the agonist glutamate was shown to bind in two different orientations, and also to convert. These are much longer simulations than what is presented here (nearly 50 µs), which allowed a richer view of conformational changes and ligand binding dynamics in the AMPA Receptor. Albert Lau has published similar work on NMDA, delta, and kainate receptors, including some of it in eLife. Perhaps the authors could draw some helpful comparisons with this work.

      Yu A et al. (2018) Neurotransmitter Funneling Optimizes Glutamate Receptor Kinetics. Neuron

      Likewise, the comparison to a similar piece of work on glycine receptors (not cited, https://pubs.acs.org/doi/10.1021/bi500815f) could be instructive. Several similar computational techniques were used, and interactions observed (in the simulations) between the agonist and the receptor were tested in the context of wet experiments. In the absence of an equivalent process in this paper (no findings were tested using an orthogonal approach, only compared against known results, from perhaps a narrow spectrum of papers), we have to view the major findings of the paper (docking in cis that leads to a ligand somersault) with some hesitancy.

      The Gharpure 2019 paper is cited in the context of the delta subunit but this paper was about a3b4 neuronal nicotinic receptors. This could be tidied up. Also, the simulations from that paper could be used as an index of the stability of the HA state (if ligand orientation is being cited as transferrable, other observations could be too).

      New citations have been added. It is difficult to generalize from Yu A et al. and Yu R et al., because in neither study was the ligand orientation associated with LA versus HA binding energy.

      (2) "To start, we associated the agonist orientation in the hold end states as cis in AC-LA versus trans in AC-HA."

      I think this is a valid start, but one is left with the feeling that this is all we have and the validity of the starting state is not tested. What was really shown here? Is the docking reliable? What evidence can the authors summon for the ligand orientation that they use as a starting structure?

      In addition to docking energies, the match between PBSA and electrophysiology ΔGs and the temporal sequence (m1-m2-m3) support the assignment.

      Given that these simulations cover a circumscribed part of the binding process, I think the limitations should be acknowledged. Indeed the authors do mention a number of remaining open questions.

      Paragraphs regarding 'catch' have been added to the Discussion.

      (3) Results around line 90. Hypothetical structures and states that were determined from Markov analyses are discussed as if they are well understood and identified. Plausible though these are, I think the text should underline at least the source of such information. In these simulations, a further intermediate has been identified.

      The model in Figure 1B was first published in 2012 and has been used and extended over the intervening years. In our lab, catch-and-hold is standard. We have published many papers (in top journals), plus reviews, regarding this scheme. We have made presentations that are on YouTube. Here, at the end of the Introduction, we now cite a new review article (Biophysical Journal, 2024). I am not sure what more we can do to raise awareness regarding catch and hold.

      (4) The figures are dense and could be better organised. Figure 2 is key but has a muddled organization. The placement of the panel label (C) makes it look like the top row (0 ns) is part of (A). Panel B- what is shown in the oval inset (not labeled or in legend). Why not show more than one view, perhaps a sequence of time points? It is confusing to change the colour of the loops in (C). Please show the individual values in D.

      Figure 2 has been redone.

      (5) A lot is made of the aK145 salt bridge with aD200 and the distances - but I didn't see any measurements, or time course. This part is vague to the point of having no meaning ("bridge tightening").

      We present a Table of distance measurements in the SI (Figure 6-table supplement 1).

      Reviewer #2 (Recommendations For The Authors):

      All main comments have been given in the above review. There are a few other minor comments below.

      The 4 agonists examined were acetylcholine (ACh), carbamylcholine (CCh), epibatidine (Ebt), and epiboxidine (Ebx). Could the choices be motivated for the reader?

      New in Methods: the agonists are about the same size yet represent different efficiency classes (citation to companion eLife paper). One of our (unmet) objectives was to understand the structural correlates of agonist efficiency.

      The authors write that state structures generated in the MD simulation were identified by aligning free energy values with those from experiments. It would be good to explain to the reader, in the introduction, how LA and HA free energies were extracted from experiments, rather than relying on them to read older papers.

      In the Introduction, we say that to get ΔG, just measure an equilibrium constant and take the log. We think it is excessive to explain in detail in this paper how to measure the equilibrium binding constants (several methods suffice). However, we have added in Methods our basic approach: measure KLA and L2 by using electrophysiology, and compute KHA from the thermodynamic cycle using L0. We think this paper is best understood in the context of its companion, also in eLife.
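
      The approach summarized above (take the log of an equilibrium constant; obtain KHA from KLA, L2, and L0) can be sketched as follows. This is a hedged illustration, not the authors' procedure: it assumes two equivalent binding sites, so that the cycle gives L2/L0 = (KLA/KHA)^2, and all numerical values and function names are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar, temperature=298.15):
    """dG = RT ln(KD), with KD in mol/L relative to a 1 M standard state."""
    return R_KCAL * temperature * math.log(kd_molar)

def k_ha_from_cycle(k_la, l2, l0, n_sites=2):
    """High-affinity KD from the thermodynamic cycle.

    Assumes n_sites equivalent, independent sites, so that
    L2 / L0 = (K_LA / K_HA) ** n_sites.
    """
    return k_la / (l2 / l0) ** (1.0 / n_sites)

# Hypothetical inputs for illustration only.
k_la = 1e-4   # low-affinity KD (M)
l0 = 1e-6     # unliganded gating equilibrium constant
l2 = 1.0      # diliganded gating equilibrium constant

k_ha = k_ha_from_cycle(k_la, l2, l0)
dg_la = binding_free_energy(k_la)
dg_ha = binding_free_energy(k_ha)
```

      With these made-up inputs the million-fold increase in the gating constant corresponds to a thousand-fold increase in affinity at each of the two sites, so the high-affinity energy is more negative than the low-affinity one.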

      In all equilibrium equations of the type A to B (e.g. on page 5), rather than using "=" signs it would be much better to use equilibrium reversible arrow symbols.

      It is incorporated.

      Reviewer #3 (Recommendations For The Authors):

      (1) Although the match in simulated vs experimental energies for two ligands was very good, the calculated energies for Ebt and Ebx were significantly different than the experiment. Are there any alternative methods for calculating binding energies from the MD simulations that could be readily compared to?

      See above. We did not use more sophisticated energy calculations because we already knew the answers. Our objective was to identify states, not to calculate energies.

      (2) It would be nice to see control simulations of an apo site to ensure that the conformational changes during the MD are due to the ligands and not an artifact of the way the system is set up. I am primarily asking about this as the simulation of the isolated ECDs for the binding site interface seems like it may be unhappy without the neighboring domains that would normally surround it. On that note, was the protein constrained in any way during the MD?

      Apo simulation results are presented in Figure 2-figure supplement 2. The dimer interface seems to be adequate (stable).

      (3) Figure 4A-B: Should the colors for m1 and m3 be reversed?

      Colors have been changed and a bar chart has been added.

      Reviewer #4 (Recommendations For The Authors):

      (1) Although simulations are commendably run in triplicate, it is difficult in some places to discern their consistency.

      (1a) Table S1 provides important quantification of deviations in different replicates and with different agonists. Please confirm that the reported values are accurate. All values reported for the epibatidine system are identical to those reported for carbamylcholine, which seems statistically improbable. Similarly, runs 1 and 3 with epiboxidine seem identical to one another, and runs 1 and 2 with acetylcholine are nearly the same.

      Figure 2-Source Data 1 has been corrected.

      (1b) In reference to Figure S3, the authors comment that the simulated system (one replicate with carbamylcholine) converges within 0.5 Å RMSD of a desensitized experimental structure. This seems amazing; please specify over what atoms this deviation was calculated and with reference to what alignment. It would be interesting to know the reproducibility of this remarkable convergence in additional replicates or with other ligands; for example, Figure 5 indicates that loop C transitions to a lesser extent in the context of epibatidine than other agonists.

      The comparison was for the entire dimer ECD; 0.5 Å is the result. It may be worthwhile to pursue this remarkable convergence, but not in this paper. Here, we are concerned with identifying ACLA and ACHA. Similarity between ACHA and AD structures is for a different study.

      (1c) For principal-component and subsequent analyses, it appears that only one trajectory was considered for each system. Please clarify whether this is the case; if so, a rationale for the selection would be helpful, and some indication of how reproducible other replicates are expected to be.

      We have added new PCA results (Results, Figure 3-figure supplement 1) that show comparable principal components in other replicates.

      (2) Figure 3 shows free energy landscapes defined by principal components of fluctuation in Cα positions.

      (2a) Do experimental structures (e.g. PDB IDs 6UWZ, 7QL6) project onto any of these landscapes in informative ways?

      6UWZ.pdb matches well with the apo (7QKO.pdb), comparable to m1, and 7QL6.pdb with the m3.

      (2b) Please indicate the meaning of colored regions in the righthand panels.

      The color key in the top left panel also applies to the colored regions in the righthand panels, which indicate the direction and magnitude of changes with PC1 and PC2.

      (2c) Please also check the legend; do the porcupine plots really "indicate the direction and magnitude of changes between PC1 and PC2," or rather between negative and positive values of each principal component?

      It indicates the direction and magnitude of changes with PC1 and PC2.

      (3) It would be helpful to clarify how trajectory segments were assigned to specific minima, particularly m2 and m3.

      (3a) Please verify the timeframes associated with the m2 minima, reported as "20-50 ns [with acetylcholine], 50-60 ns [with carbamylcholine], 60-100 ns [with epibatidine, and] 100-120 ns [with epiboxidine]." It seems improbable that these intervals would interleave so precisely in independent systems. Furthermore, the intervals associated with acetylcholine and epiboxidine do not appear to correspond to the m2 regions indicated in Figure S8.

      Times are given in Figure 4-Source Data 1 and Figure 3-figure supplement 2. The m2 classification is based on loop displacement as well as agonist orientation. For all agonists, the selection was strictly from PCA and cluster analysis.

      (3b) The text (and legend to Figure 3) indicate that 180+ ns of each trajectory was assigned to m3, which seems surprisingly consistent. However, Figure S5 indicates this minimum is more variable, appearing at 160 ns with acetylcholine but at 186 ns with carbamylcholine. Please clarify.

      See above: the selection was from PCA and cluster analysis. Times are in Figure 3-figure supplement 2 and also in Figure 4-Source Data 1 (none in Fig. 3 legend).

      (3c) Figures 5, 6, S6, and S7 illustrate structural features of free-energy minima in each ligand system. Please clarify what is shown, e.g. a representative snapshot, centroid, or average structure from a particular prominent cluster associated with a given minimum.

      They are all representative snapshots (now in Methods). Snapshots and distance measurements (Figure 6-table supplement 1) were extracted from m1, m2 and m3 plateau regions of trajectories.

      (4) Figure S4 helpfully shows the behavior of a pentameric control system; however, some elements are unclear.

      (4a) The 2.5-6.5 Å jump in RMSD at ~40 ns seems abrupt; can it be clarified whether this corresponds to a transition to either m2 or m3 poses, or to another feature of e.g. alignment?

      Figure 2-figure supplement 4 left bottom is just the ligand. The jump is the flip, m1 to m2.

      (4b) It seems difficult to reconcile the apparently bimodal distribution of states with the proposed 3-state model. Into which RMSD peak would the m2 intermediate fall?

      The simulations run only to 100 ns, where we found a complete flip of the agonist represented in the histograms. This confirmed that the dimer showed a pattern similar to that of the pentamer. In-depth analysis was done only on dimers.

      (4c) The top panel is labeled "Com" with a graphical legend indicating "ACh." Does this indicate the ligand or, as described in the text legend, "the pentamer" (i.e. the receptor)? For both panels, please verify whether they are calculated on the basis of center-of-mass, heavy atoms, Cα, etc.

      "Com" (for complex) has been changed to system (protein+ligand).

      (5) Minor concerns:

      (5a) In Figures 1 and S3, correct the PDB references (6UWX and 7QL7 are not nAChRs).

      They are now corrected.

      (5b) In Figure 4, do all panels represent mean ± standard deviation calculated across all cluster-frames reported in Table 1?

      Yes.

      Also check the graphical legend in panel A: presumably the red bars correspond to m1/LA, and the blue to m3/HA?

      Corrected

      (5c) In the legend to Figure S1, please clarify that panel B is reproduced from Indurthi & Auerbach 2023.

      This figure has been deleted.

      (5d) As indicated in Figure S2, it seems surprising that the RMSF is so apparently low at the periphery, where the subunits should contact neighbors in the extracellular domain; how might the authors account for this? Specify whether these results apply to all replicates of each system.

      The redness in the periphery for all four systems indicates the magnitude of fluctuation. As we focus on the orthosteric site, we highlighted the loops around the agonist binding pocket and kept other regions 75% transparent. We now include Apo simulations, and the dimer appears to be stable even without an agonist present.

      (5e) Within each minimum in Figure S5, three "prominent" clusters appear to be colored (by heteroatom) with carbons in cyan, pink, and yellow respectively. If this is correct, note these colors in the text legend.

      Colors have been added to the legend.

      (5f) In Figure S6, note in the legend that key receptor sidechains are shown as spheres, with the ligand as balls-and-sticks, and that ligand conformations in both low- and high-affinity complexes are shown in both receptor states for comparison.

      This is now added in the legend.

      (5g) The legend to Figure S6 also notes "The agonists are as in Fig S4," but that figure contains a single replicate of a different system; please check this reference.

      This has been updated to Figure 5.

      (5h) In Figure S8, the colors in the epibatidine system appear different from the others.

      The colors are the same for m1, m2 and m3 in all systems including epibatidine.

      (5i) In Table 1, does "n clusters" indicate the number of simulation frames included in the three prominent clusters chosen for MM-PBSA analysis? Perhaps "n frames" would be more clear.

      It was a good suggestion. It has now been changed to ‘n frames’.

      (5j) Pg 24-ln 453 presumably should read "...that separate it from m1 and m3..."

      This sentence is now changed in the discussion.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Thank you and the two reviewers for the thorough review of our manuscript. We thank you very much for the positive evaluation of our manuscript and your encouragement to continue working on this fascinating topic. In this version we made minor changes in the text to address the comments and suggestions of the second reviewer and to increase the clarity of the text.

      Reviewer #2 Recommendation to the authors

      We thank the reviewer for the sharp comments that helped us improve the clarity of the paper. Below we list the changes we made to correct and revise the paper in accordance with the reviewer’s comments.

      (1) Line 90. Isn't the genus Paracentrotus?

      Yes it is, thank you. We corrected the typo.

      (2) Figure 1 and supplementary figure 2. To this reviewer supplementary Figure 2 doesn't really help the story as written in the paragraph from line 96-110. You want to report expression of ROCK in skeletogenic cells. You do that quite well in Figure 1. Since Fig. S2 reports whole embryo expression of ROCK when only 5% of the cells in the embryo are the subject of interest here, and the Axitinib is selective, presumably for skeletogenic cells, the relative lack of effect in Fig. S2 is not surprising and again, doesn't really help the theme you wish to establish by focusing on the role of ROCK in skeletogenic cells over time. If anything, the data reported in Fig. S2 shows that perturbation of VEGF signaling has very little effect embryo-wide, while Fig. 1 shows that perturbation of VEGF signaling has a noticeable effect on ROCK expression in skeletogenic cells. If you choose to keep Fig. S2, I recommend that you indicate that embryo-wide vs skeletogenic cell difference more succinctly than given at present. It will also strengthen your paragraph in lines 110-127.

      The importance of the western blot presented in Fig. S2 is to validate that the antibody recognizes a protein of the expected size. This strengthens the credibility of this commercial antibody to detect the sea urchin ROCK protein. We agree with the reviewer that the fact that the skeletogenic cells are less than 5% of the embryonic cells is important to explain why we didn’t see an effect of VEGFR inhibition in the western blot, and we changed the text to express it (lines 108-111): “Yet, this measurement was done on proteins extracted from whole embryos, of which the skeletogenic cells, where VEGFR is active, are less than 5% of the total cell mass (42). We therefore wanted to study the spatial expression of ROCK and specifically, its regulation in the skeletogenic cells.”

      (3) Comparison of Fig. 2 and Fig. S3. To me the reader is confused when Fig. S3 is 33hpf as reported in the text (but not in the figure legend), and Fig. 2 shows 2 day old embryos - on the figure and figure legend but not in the text. So, the reader sees the text indicating 33hpf and looks around and the figure 2 says 2dpf. Does that mean 33hpf = 2dpf, the reader is thinking. To clarify, I suggest including the 2dpf in the text or simply drop the time in the text and report it in the two figures. Further, in the middle of the paragraph 130-143 you switch from reporting on Fig.S3 to Fig. 2, yet the reader doesn't know that. The reader is still looking at Fig. S3. The problem here is that at 33hpf the skeleton doesn't yet show the reduction or abnormalities that are shown later at 2dpf in Fig. 2. In clarifying this paragraph both the reduction in ROCK expression and the subsequent alterations in growth and patterning of the skeleton will be clear to the reader.

      Thank you for raising this point. We added in the caption of Fig. S3 that the measurements were done at 33hpf. We also added in the text that the observations of the skeletogenic phenotypes were done at 2dpf (48hpf). We made a break between the first paragraph discussing Fig. S3 and the paragraph discussing Fig. 2.

      (4) The experiment with Y27632, an inhibitor of ROCK, is significantly improved in this revision. The concern earlier was the possibility that at the concentration used there might be off-target effects since other kinases are affected by higher concentrations of this selective inhibitor. The authors have modified this component of the paper and performed experiments at lower concentrations where other reports indicate the inhibitor is highly selective for ROCK, and they still demonstrate an inhibition of skeletal production. This, plus the added citations greatly increases confidence that this inhibition is selective for ROCK, thus enabling a stronger conclusion that ROCK has a role in skeletal growth and patterning.

      Thank you for asking us to test this lower concentration which improved the credibility of our findings.

      Line 239 - should be: indicating instead of indicting

      We corrected that.

      (5) Line 402-403."The first step in generating the sea urchin spicules is the construction of the spicule cavity, a membrane filled with calcium carbonate and coated with F-actin (Fig. 8A)". I suggest more precise language. The way this now reads (above) is that somehow the spicule cavity is a membrane and that membrane is filled with CaCO3. And further the membrane is coated with F-actin. Isn't the spicule cavity what is filled with CaCO3? And isn't that cavity surrounded by a membrane? And the F-actin must be in the cortex of the cell since there is very little cytoplasm associated with the pseudopodial extensions that surround the spicule.

      We changed this sentence to: “The first step in generating the sea urchin spicules is the construction of the spicule cavity where the mineral is engulfed in a membrane coated with F-actin” (lines 403-404). Our observations show that F-actin is enriched around the spicule cavity. It could be an extension of the cell cortex, but we did not prove it, so we prefer to simply describe what we saw.

      Line 405-408. Thank you for putting in this unknown. It is important to point out that while you've shown that ROCK contributes to regulation of actomyosin, it is not clear whether this is direct or indirect. You have also shown that ROCK somehow contributes to regulation of the GRN that leads to skeletogenesis. Thus, your data are consistent in showing that ROCK perturbation cripples normal skeletogenesis both via morpholino and with a selective inhibitor. Your last part of the discussion then offers speculation as to what might be affected specifically. That discussion sets the stage for digging even deeper to identify specific targets of ROCK activity.

      Thank you, we agree with you that there is an exciting road ahead of us!

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      The manuscript needs proper editing and is not complete. Some wordings lack precision and make it difficult to follow (e.g. line 98 "we assembled a chromosome-scale genome of ..." should read instead "we assembled a chromosome-scale genome sequence of ..."). Also, panel Figure 2E is missing.

      We will make the suggested change of adding “sequence”. Concerning additional changes, we have carefully edited our manuscript and looked for any incomplete sections. Unfortunately, it is difficult to see what other issues are being raised here without any further information. And the example given is not helpful to ascertain what other changes may be necessary, since we cannot see any problem with the sentence “we assembled a chromosome-scale genome of” as this phrase is widely used in many similar publications.

      As for panel E of Figure 2, it is not missing. The panel is located to the right, just below “Target Cells”.

      The shortcomings of the manuscripts are not limited to the writing style, and important technical and technological information is missing or not clear enough, thereby preventing a proper evaluation of the resolution of the genomic resources provided:

      • Several RNASeq libraries from different tissues have been built to help annotate the genome and identify transcribed regions. This is fine. But all along the manuscript, gene expression changes are summarized into a single panel where it is not clear at all which tissue this comes from (whole embryo or a specific tissue ?), or whether it is a cumulative expression level computed across several tissues (and how it was computed) etc. This is essential information needed for data interpretation.

      No fertilised eggs or embryos were sequenced; individual tissues derived from juvenile fish were used for the genome annotation, and whole larval fish for the developmental analysis. We will specify in the figures and text that the results shown are from whole larvae, and add more detail to the Material and Methods section about which type of sample was analysed in which way.

      • The bioinformatic processing, especially of the assembly and annotation, is very poorly described. This is also a sensitive topic, as illustrated by the numerous "assemblathon" and "annotathon" initiatives to evaluate tools and workflows. Importantly, providing configuration files and in-depth description of workflows and parameter settings is highly recommended. This can be made available through data store services and documents even benefit from DOIs. This provides others with more information to evaluate the resolution of this work. No doubt that it is well done, but especially in the field of genome assembly and annotation, high resolution is VERY cost and time-intensive. Not surprisingly, most projects are conditioned by trade-offs between cost, time, and labor. The authors should provide others with the information needed to evaluate this.

      We will upload the code used to assemble and annotate this genome to a public repository or add it to the supplementary material.

      The genome assembly did not use a specific workflow (e.g., nextflow), but was done with a simple command and standard parameters in IPA. Scaffolding was carried out by Phase Genomics using their standardised proprietary workflow, of which a detailed description provided by Phase Genomics can be found in the supplementary material. The annotation workflow has been described in a previous publication already, but an in-depth description can also be found in the Material and Methods section, including parameters used for specific steps. The RNA-seq mapping and analysis part has also been described in the Material and Methods section, including parameters and models for DESeq2.

      • Quantifications of T3 and T4 levels look fairly low and not so convincing. The work would clearly benefit from a discussion about why the signal is so low and what are the current technological limitations of these quantifications. This would really help (general) readers.

      We will add a comment on this in the manuscript as suggested. Basically, the T3/T4 levels are consistent with other published work in fish. In the present manuscript for grouper we have a peak level of 1.2 ng/g (1,200 pg/g) of T4 and 0.06 ng/g (60 pg/g) of T3. This is a higher level of T4 and comparable level of T3 to what was found in convict tang (Holzer et al. 2017; Figure 2) with 30 pg/g of T4 and 100 pg/g of T3. Of course, there are also examples with higher levels, such as clownfish (Roux et al. 2023; Figure 1), with 10 ng/g (10,000 pg/g) of T4 and 2 ng/g (2,000 pg/g) of T3.
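
      As an illustration only (not part of the analysis in the manuscript), the unit conversion and cross-species comparison above can be sketched in a few lines of Python. The function and variable names are hypothetical; the numbers are the peak values quoted in this response.

```python
PG_PER_NG = 1000.0  # 1 ng = 1000 pg


def ng_to_pg(value_ng_per_g):
    """Convert a concentration from ng/g to pg/g."""
    return value_ng_per_g * PG_PER_NG


# Peak whole-body levels quoted in the response, all expressed in pg/g.
peak_pg_per_g = {
    "grouper": {"T4": ng_to_pg(1.2), "T3": ng_to_pg(0.06)},
    "convict tang": {"T4": 30.0, "T3": 100.0},
    "clownfish": {"T4": ng_to_pg(10.0), "T3": ng_to_pg(2.0)},
}


def ratio_to_grouper(species, hormone):
    """Ratio of the grouper peak level to another species' peak level."""
    return peak_pg_per_g["grouper"][hormone] / peak_pg_per_g[species][hormone]


for species in ("convict tang", "clownfish"):
    for hormone in ("T4", "T3"):
        value = ratio_to_grouper(species, hormone)
        print(f"grouper / {species} {hormone}: {value:.2f}")
```

      Expressing every value in the same unit makes it easier to see where the grouper peaks sit relative to the two published examples.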

      The differences could be due to different structure of fish tissues and therefore different hormone extraction efficiency, different hormone measurement protocols, different fish physiology, different fish size (e.g., the weighing of tiny grouper larvae is difficult and less precise than in convict tang). What is important is not the absolute level but the relative level, which shows the change within different larval stages of a species with identical extraction and measurement protocols. This means our data are internally consistent and coherent with what the grouper literature says.

      Holzer, Guillaume, et al. "Fish larval recruitment to reefs is a thyroid hormone-mediated metamorphosis sensitive to the pesticide chlorpyrifos." Elife 6 (2017): e27595.

      Roux, Natacha, et al. "The multi-level regulation of clownfish metamorphosis by thyroid hormones." Cell Reports 42.7 (2023).

      • Differential analysis highlights up to ~ 15,000 differentially expressed genes (DEG), out of a predicted 26k genes. This corresponds to more than half of all genes. ANOVA-based differential analysis relies on the simple fact that only a minority of genes are DEG. Having >50% DEG is well beyond the validity of the method. This should be addressed, or at least discussed.

      As the reviewer notes, there is a large number of differentially expressed genes because the data come from a larval developmental transcriptome spanning one-day-old larvae to fully metamorphosed juveniles at around day 60.

      While DESeq2 indeed works on an assumption that most genes are not differentially expressed, this affects normalization but not hypothesis testing (Wald test, LRT or ANOVA). Normalisation in DESeq2 is fairly robust to this assumption. According to the author of DESeq2, Michael Love, DESeq2 uses the median ratio for normalisation, and as long as the number of up- and down-regulated genes is relatively even, DESeq2 will be able to handle the data. As part of our general quality control for this project we consulted the MA plots, which do not show any overrepresented up or down expression patterns. Additionally, see Michael Love's comment on comparing different tissues, which is also applicable here when comparing vastly different larval stages (https://support.bioconductor.org/p/63630/): “For experiments where all genes increase in expression across conditions, the median ratio method will not be able to capture this difference, but this is typically not the case for a tissue comparison, as there are many "housekeeping" genes with relatively similar expression pattern across tissues.”
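
      To make the normalisation argument concrete, here is a minimal sketch of the median-of-ratios idea in plain Python. It is an illustration under simplifying assumptions, not DESeq2's actual implementation: the function name `size_factors` and the toy count matrix are hypothetical. In the toy data, sample B is sequenced at twice the depth of sample A, one gene is up-regulated and one is down-regulated, and the median still recovers the two-fold depth difference.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples would be zero.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log geometric mean of each gene across samples (the pseudo-reference).
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [
            math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)
        ]
        # The median is insensitive to a balanced set of up/down outliers.
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy data (hypothetical): columns are sample A and sample B.
# B has twice the depth of A; gene 3 is up and gene 4 is down in B.
counts = [
    [100, 200],
    [200, 400],
    [50, 400],
    [400, 200],
    [80, 160],
]

sf = size_factors(counts)
print("size factor ratio B/A:", sf[1] / sf[0])
```

      If every gene shifted in the same direction, the median ratio would absorb that shift into the size factor, which is the limitation described in the quoted comment.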

      Reviewer #3 (Public Review):

      Weaknesses:

      However, the authors make substantial considerations that are not proven by experimental or functional data. In fact, this is a descriptive study that does not provide any functional evidence to support the claims made.

      We agree with the reviewer that our paper lacks functional experiments but despite that, the transcriptomic data clearly show the activation of TH and corticoid pathways during two distinct periods; an early activation between D1 and D10, and a second one between D32 and juvenile stage. These data are interesting as they call for further examination of 1) the possible interaction of corticoids and TH during metamorphosis, a question that is certainly not settled yet in teleost fishes, and 2) the existence of an early larval developmental step also involving TH and corticosteroids.

      Point 2) is of particular interest and importance, since this early activation (unique to our knowledge in any teleost fish studied so far) raises a lot of new questions and once again will certainly be scrutinised by other groups in the years to come, therefore ensuring a good citation impact of our study. We hope that the reviewer, while disagreeing with some of our statements, will recognize that our study will be stimulating at that level and that this is what scientific studies should do.

      The consideration that cortisol is involved in metamorphosis in teleosts has never been shown, and the only example cited by the authors (REF 20) clearly states that cortisol alone does not induce flatfish metamorphosis. In that work, the authors clearly state that in vivo cortisol treatment had no synergistic effect with TH in inducing metamorphosis. Moreover, in Senegalensis, the sole pre-otic CRH neuron number decreases during metamorphosis, further arguing that, at least in flatfish, cortisol is not involved in flatfish metamorphosis (PMID: 25575457).

      We will do our best to improve the clarity of the revised manuscript to avoid any misunderstanding about our claims. However, we would like to point out the semantic shift in the reviewer's first sentence: indeed, “being involved” is not the same as “cortisol alone does not induce”. In ref 20 the authors explicitly wrote that “Cortisol further enhanced the effects of both T4 and T3, but was ineffective in the absence of thyroid hormones” and in our view this indeed corresponds to “being involved in metamorphosis”.

      We are not claiming that cortisol alone is involved in metamorphosis as the reviewer suggests, but simply that there is a possible involvement of cortisol together with TH in metamorphosis. We stand by this claim as we indeed observed an activation of corticoid pathway genes around D32, which is sufficient to say it is involved. We do agree that functional experiments will be needed to properly demonstrate the involvement of corticoids in grouper metamorphosis, but this was not possible in the current study as it would require establishing a full grouper life cycle under lab conditions, which is beyond the scope of this manuscript.

      We also mentioned in the discussion that the role of corticoids in fish larval development is still debated, and we agree that this remains a contentious issue.

      We wrote that “there is contrasting evidence of communication between these two pathways [TH and corticosteroids] in teleost fish with some data suggesting a synergic and other an antagonistic relationship. In terms of synergy, an increase in cortisol level concomitantly with an increase in TH levels has been observed in flatfish (ref 19), golden sea bream (ref 100) and silver sea bream (ref 101). Cortisol was also shown to enhance in vitro the action of TH on fin ray resorption (phenomenon occurring during flatfish metamorphosis) in flounder (ref 20). TH exposure increases MR and GR genes expression in zebrafish embryo (ref 55). It has also been shown that cortisol regulates local T3 bioavailability in the juvenile sole via regulation of deiodinase 2 in an organ-specific manner (ref 56) On the antagonistic side, it has been shown that experimentally induced hyperthyroidism in common carp, decreasing cortisol levels (ref 57), whereas cortisol exposure decreases TH levels in European eel (ref 58). Given this scattered evidence, the existence of a crosstalk active during teleost metamorphosis has never been formally demonstrated. The results we obtained in grouper are clearly indicating that HPI axis and cortisol synthesis are activated (i) during early development and (ii) during metamorphosis. This may suggest that in some aspect cortisol synthesis can work in concert with TH, as has been shown in several different contexts in amphibians (ref 17).” In the revised manuscript, we will also add the interesting case of the Senegal sole mentioned by the reviewer.

      In the last revision, we had also added that our results “brought a first insight into the potential role of corticoids in the metamorphosis of E. malabaricus and call for functional experiments directly testing a possible synergy”, meaning that we clearly acknowledge that we are only revealing a hypothesis that remains to be tested. We later follow up with a discussion about the most novel observation and focus of our study, the increase in THs and cortisol during early development, which was unexpected and very intriguing. Again, these results suggest that there might be a link between the two, as has been shown in amphibians. This is typically the kind of result that should encourage more investigations into other fish species. Indeed, this has been pointed out by other authors and in particular by Bob Denver (probably the foremost expert on this topic) in Crespi and Denver 2012: “Elevation in HPA/I axis activity has been described prior to Metamorphosis in amphibians and fish, birth in mammals (reviewed in Crespi & Denver 2005a; Wada 2008)”. B. Denver also adds that: “Experiments in which GCs were elevated prior to metamorphosis or prior to hatching or birth (e.g. Weiss, Johnston & Moore 2007) or inhibited by treatments with GC synthesis blockers (e.g. metyrapone) or receptor antagonists (e.g. RU486, Glennemeir & Denver 2002) demonstrate that GCs play a causal role in precipitating these life-history transitions (also reviewed in Crespi & Denver 2005a; Wada 2008).” We believe the reviewer will be convinced by these elements coming from a colleague unanimously respected in the field.

      Furthermore, the authors need to recognise that the transcriptomic analysis is whole-body and that HPA axis genes are upregulated, which does not mean they are involved in regulating the HPT axis. The authors do not show that in thyrotrophs, any CRH receptor is expressed or in any other HPT axis-relevant cells and that changes in these genes correlate with changes in TSH expression. An in-situ hybridisation experiment showing co-expression on thyrotrophs of HPA genes and TSH could be a good start. However, the best scenario would be conducting cortisol treatment experiments to see if this hormone affects grouper metamorphosis.

      We agree that functional experiments are needed to validate our hypothesis. As the early peaks of expression levels observed for many genes were very intriguing for us, we did carry out thyroid hormone and goitrogen treatments on young grouper larvae to test their effect on the morphological changes. Unfortunately, such experiments, already tricky on metamorphosing larvae, are even more risky on such tiny individuals just after hatching and we encountered high mortality rates. We must add that because we cannot establish a full grouper life cycle under lab conditions, we have done these experiments in the context of a commercial husbandry system in Japan, which, while excellent, limits the scope of possible experiments. We were thus not able to provide functional validation of our hypothesis. Such experiments will be a full project in itself, requiring setting up a rearing system suitable for both larval survival and economic constraints related to drug treatments. We were further limited by the spawning times of the grouper in the operational aquaculture farm, which are limited to a short time during each year. So even if we strongly agree with the necessity of conducting such experiments, we think that this is not in the scope of the present paper, but something future research can explore.

      High TSH and Tg levels usually parallel whole-body TH levels during teleost metamorphosis. However, in this study, high Tg expression levels are only achieved at the juvenile stage, whereas high TSH is achieved at D32, and at the juvenile stage, they are already at their lowest levels.

      This is exactly our point. We observe two peaks in TSH expression, one at D3 and one at D32. The peak at D3 coincides with high thyroid hormone levels on the same day, and while we have not measured TH at D32, existing literature shows that there is a peak in TH during that time (e.g., de Jesus et al., 1998). Similarly, there is a small peak of Tg at D3. Our manuscript focused more on the upregulation of these genes at D3, which has not been reported before in the literature and raised the question of the role of TH so early in the larval development, outside of the metamorphosis period.

      Regarding the respective levels of TSH and Tg, we first would like to add that their respective order of appearance before metamorphosis (TSH at D32, Tg after) is consistent with what we would expect. We agree however that the strong increase of Tg and TPO expression is later than expected. We will make this clear in the revised manuscript.

      It is very difficult to conclude anything with the TH and cortisol levels measurements. The authors only measured up until D10, whereas they argue that metamorphosis occurs at D32. In this way, these measurements could be more helpful if they focus on the correct developmental time. The data is irrelevant to their hypothesis.

      We respectfully disagree with the reviewer, considering that 1) TH levels have already been investigated in groupers coinciding with pigmentation changes and fin rays resorption, 2) that there is also evidence in numerous fish species that TH level increase is concomitant with increase of TH related genes, and 3) that we observed in our data an increase in the expression of TH related genes as well as pigmentation changes and fin rays resorption. Based on our experience in fish metamorphosis and the literature we can say confidently that those observations indicate that metamorphosis is occurring between D32 and the juvenile stage. To reinforce our point, we plan to add a figure to the revised manuscript, which puts our data in the context of earlier studies done in grouper. This will clearly show that our inference is correct. Additionally, we would like to point out that from our experience in several fish species transcriptomic data are more robust and precise than hormone measurements.

      However, as we were surprised by the activation of TH and corticoid pathway genes very early in the larval development (at D3), which is clearly outside of the metamorphosis period, we decided to measure TH and cortisol levels during this period of time to determine whether this surprising early activation indeed corresponded to an increase in both TH and cortisol. As such an observation has never been made in other teleost species (to our knowledge), and as we were wondering if gene activation was accompanied by hormonal increase, the measurements we did for TH and cortisol between D1 and D10 are relevant. We will make sure to improve the clarity of the revised version of the manuscript to avoid any confusion between the two periods we are studying: early larval development (between D1 and D10) and metamorphosis (between D32 and juvenile stage).

      Moreover, as stated in the previous review, a classical sign of teleost metamorphosis is the upregulation of TSHb and Tg, which does not occur at D32; therefore, it is very hard for me to accept that this is the metamorphic stage. With the lack of TH measurements, I cannot agree with the authors. I think this has to be toned down and made clear in the manuscript that D32 might be a putative metamorphic climax but that several aspects of biology work against it. Moreover, in D10, the authors show the highest cortisol level and lowest T4 and T3 levels. These observations are irreconcilable with cortisol enhancing or participating in TH-driven metamorphosis.

      We thank the reviewer for this comment, but we think that there might be a misunderstanding here.

      (1) We clearly observed an increase of TSHb (that occurs between D18 and juvenile stage) and an increase of tg from D32 which coincide with the activation of other genes involved in TH pathway (dio2, dio3, and also a strong increase of TRb). All this, put in the context of what we know from previous grouper studies, clearly supports our conclusion that TH-regulated metamorphosis is starting at around D32 in grouper. We also observed morphological changes such as fin rays resorption and pigmentation changes between D32 and juvenile stage. Such morphological changes have already been associated with metamorphosis in groupers (De Jesus et al 1998) as they occur during TH level increase, and they also happen to be under the control of TH in grouper (De Jesus et al 1998). Based on this study but also on studies (conducted on many other teleost species) showing that the increase of TH levels is always associated with an activation of TH pathway genes and morphological and pigmentation changes, we concluded that metamorphosis of E. malabaricus occurs between D32 and juvenile stage. We will improve the clarity of the manuscript to make sure that our conclusion is based on our transcriptomic and morphological data plus the available literature.

      (2) We clearly observed another activation of TH related genes earlier in the development (between D1 and D10, with a surge of trhrs, tg and tpo at D3). As this activation was very unexpected for us, we decided to focus the analysis of TH levels between D1 and D10 and very interestingly we observed high level of T4 at D3, indicating that THs are instrumental very precociously in the larval development of the malabar grouper, which has never been shown before. We stated at line 195 that our “data reinforce the existence of two distinct periods of TH signalling activity, one early on at D3 and one late corresponding to classic metamorphosis at D32”. However, we agree that we could have been clearer and clearly explained that this early activation was very intriguing for us and that we wanted to investigate hormonal levels around that period. However, we never claimed anywhere in the manuscript that this early developmental period corresponds to metamorphosis. Something else is occurring and both TH and cortisol seem to be involved but further experiments need to be conducted to understand their role and their possible interaction.

      (3) Finally, regarding the comment about cortisol enhancing or participating in TH-driven metamorphosis, our data clearly showed an activation of the corticoid pathway genes around metamorphosis (between D32 and the juvenile stage), suggesting a potential involvement of corticoids in metamorphosis, but we agree with the reviewer that further experiments are needed to test this. We never claimed that cortisol was enhancing or participating in metamorphosis; on the contrary, we are “suggesting a possible interaction between TH and corticoid pathway during metamorphosis”. We also say that our “results brought a first insight into the potential role of corticoids in the metamorphosis of E. malabaricus and call for functional experiments directly testing a possible synergy.” Nonetheless, we agree that some parts of our manuscript can be confusing with regard to cortisol synthesis during metamorphosis, as we did not measure cortisol levels between D32 and the juvenile stage. We will correct this in the revised version.

      Given this, the authors should quantify whole-body TH levels throughout the entire developmental window considered to determine where the peak is observed and how it correlates with the other hormonal genes/systems in the analysis.

      We did not measure TH levels at later stages because they have already been measured during Epinephelus coioides metamorphosis, and the morphological changes observed in that species around the TH peak correspond to what we observed in Epinephelus malabaricus around the peak of expression of TH pathway genes (see De Jesus et al., 1998 General and Comparative Endocrinology, 112:10-16). We are planning to add a figure reconciling all these data. However, the main focus of this manuscript is the novel observation of an early activation period at D3, for which we needed TH levels to determine whether they were involved in another early developmental process (not related to metamorphosis). Our hypothesis is that this early activation might be related to the growth of fin rays necessary to enhance floatability during the oceanic larval dispersal. As we may have arrived at this hypothesis too rapidly, without setting up the context well enough, we will take care to improve that part too.

      Even though this is a solid technical paper and the data obtained is excellent, the conclusions drawn by the authors are not supported by their data, and at least hormonal levels should be present in parallel to the transcriptomic data. Furthermore, toning down some affirmations or even considering the different hypotheses available that are different from the ones suggested would be very positive.

      We thank the reviewer for acknowledging the soundness of our methods and the quality of the results. We agree that there were several parts where our message was unclear, which we will address in the revised version of the manuscript to make sure there is no more confusion between the two distinct periods we studied in this paper (early larval development and metamorphosis). We will also make sure that our claims about TH/corticoid interaction during both periods remain hypothetical, as we cannot yet, despite trials, support them with functional experiments.